Fully automatic extraction of morphological traits from the web: Utopia or reality?



Abstract

PREMISE: Plant morphological traits, their observable characteristics, are fundamental to understanding the role played by each species within its ecosystem; however, compiling trait information for even a moderate number of species is a demanding task that may take experts years to accomplish. At the same time, online species descriptions contain massive amounts of information about morphological traits, but the lack of structure makes this source of data impossible to use at scale.

METHODS: To overcome this, we propose to leverage recent advances in large language models and devise a mechanism for gathering and processing plant trait information in the form of unstructured textual descriptions, without manual curation.

RESULTS: We evaluate our approach by automatically replicating three manually created species-trait matrices. Our method found values for over half of all species-trait pairs, with an F1 score of over 75%.

DISCUSSION: Our results suggest that large-scale creation of structured trait databases from unstructured online text is now feasible due to the information extraction capabilities of large language models. However, the process is currently limited by the availability of textual descriptions that cover all traits of interest.
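The two figures reported in the results (the share of species-trait pairs for which a value was found, and the F1 score) can be illustrated with a small sketch. The abstract does not state how either number was computed, so the definitions below are assumptions: coverage is taken as the fraction of reference pairs with at least one extracted value, and F1 is micro-averaged over trait values on the covered pairs only. The species names, traits, and values are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]          # (species, trait)
Matrix = Dict[Pair, Set[str]]   # trait values recorded for each pair


def coverage(reference: Matrix, predicted: Matrix) -> float:
    """Fraction of reference pairs for which at least one value was extracted."""
    if not reference:
        return 0.0
    found = sum(1 for pair in reference if predicted.get(pair))
    return found / len(reference)


def micro_f1(reference: Matrix, predicted: Matrix) -> float:
    """Micro-averaged F1 over trait values, scored on covered pairs only (assumption)."""
    tp = fp = fn = 0
    for pair, pred_values in predicted.items():
        if not pred_values or pair not in reference:
            continue  # pairs without an extracted value count against coverage, not F1
        ref_values = reference[pair]
        tp += len(pred_values & ref_values)   # values that match the reference
        fp += len(pred_values - ref_values)   # extracted but not in the reference
        fn += len(ref_values - pred_values)   # in the reference but not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical manually curated matrix and automatically extracted matrix.
reference: Matrix = {
    ("Species A", "leaf shape"): {"ovate"},
    ("Species A", "flower colour"): {"white", "pink"},
    ("Species B", "leaf shape"): {"lanceolate"},
    ("Species B", "flower colour"): {"yellow"},
}
predicted: Matrix = {
    ("Species A", "leaf shape"): {"ovate"},
    ("Species A", "flower colour"): {"white", "red"},
    ("Species B", "leaf shape"): set(),  # description found, but no value extracted
}

cov = coverage(reference, predicted)
f1 = micro_f1(reference, predicted)
print(f"coverage={cov:.2f}, F1={f1:.2f}")
```

Keeping coverage and F1 separate mirrors the distinction drawn in the discussion: a low coverage points to missing textual descriptions, while a low F1 points to extraction errors on the descriptions that do exist.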
