Novel text analytics approach to identify relevant literature for human health risk assessments: A pilot study with health effects of in utero exposures

Authors: Cawley, Michelle; Beardslee, Renee; Beverly, Brandy; Hotchkiss, Andrew; Kirrane, Ellen; Sams, Reeder, 2nd; Varghese, Arun; Wignall, Jessica; Cowden, John

Journal: Environment International; Impact factor: 9.700
Year: 2020; Volume/pages: 2020 Jan;134:105228
DOI: 10.1016/j.envint.2019.105228

Abstract

Background: Systematic reviews involve mining literature databases to identify relevant studies. Identifying potentially relevant studies can be informed by computational tools that compare text similarity between candidate studies and selected key (i.e., seed) references.

Challenge: Using computational approaches to identify relevant studies for risk assessments is challenging, as these assessments examine multiple chemical effects across lifestages (e.g., human health risk assessments) or specific effects of multiple chemicals (e.g., cumulative risk). The broad scope of potentially relevant literature can make selection of seed references difficult.

Approach: We developed a generalized computational scoping strategy to identify human health relevant studies for multiple chemicals and multiple effects. We used semi-supervised machine learning to prioritize studies to review manually, with training data derived from references cited in the hazard identification sections of several US EPA Integrated Risk Information System (IRIS) assessments. These generic training data, or seed studies, were clustered with the unclassified corpus to group studies based on text similarity. Clusters containing a high proportion of seed studies were prioritized for manual review. Chemical names were removed from seed studies prior to clustering, resulting in a generic, chemical-independent method for identifying potentially human health relevant studies. To test the recall of this novel literature searching strategy, we developed a case study that focused on identifying the array of chemicals that have been studied with respect to in utero exposure. We then evaluated the general strategy of using generic, chemical-independent training data with two previous IRIS assessments by comparing studies predicted relevant to those used in the assessments (i.e., total relevant).
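The seed-based prioritization described in the approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whole-word matching rule, and the 0.5 seed-fraction threshold are assumptions chosen for illustration, and the cluster labels are taken as given (they would come from whatever text-clustering algorithm is applied to the combined seed and candidate corpus).

```python
import re
from collections import Counter


def strip_chemical_names(text, chemical_names):
    """Remove listed chemical names (case-insensitive, whole words) from text.

    Hypothetical helper: the abstract only says chemical names were removed
    from seed studies, not how the removal was done.
    """
    for name in chemical_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


def prioritized_clusters(cluster_of, seed_ids, min_seed_fraction):
    """Return labels of clusters whose share of seed studies meets the threshold.

    cluster_of maps every document id (seeds and candidates) to a cluster label.
    min_seed_fraction is an assumed tuning parameter.
    """
    sizes = Counter(cluster_of.values())
    seed_counts = Counter(cluster_of[doc] for doc in seed_ids)
    return {c for c, n in sizes.items() if seed_counts[c] / n >= min_seed_fraction}


def candidates_for_review(cluster_of, seed_ids, min_seed_fraction):
    """Return non-seed documents that fall in prioritized clusters."""
    keep = prioritized_clusters(cluster_of, seed_ids, min_seed_fraction)
    return {d for d, c in cluster_of.items() if c in keep and d not in seed_ids}


# Toy data: three seed studies (s1..s3) and four candidates (c1..c4).
cluster_of = {"s1": 0, "s2": 0, "c1": 0, "s3": 1, "c2": 1, "c3": 1, "c4": 1}
seed_ids = {"s1", "s2", "s3"}

cleaned = strip_chemical_names("Prenatal benzene exposure and birth weight", ["benzene"])
to_review = candidates_for_review(cluster_of, seed_ids, min_seed_fraction=0.5)
```

In the toy data, cluster 0 is two-thirds seed studies and cluster 1 is one-quarter, so only the candidate in cluster 0 is passed on for manual review.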
Outcome: A keyword search designed to retrieve studies that examined the in utero effects of environmental chemicals identified over 54,000 candidate references. Clustering algorithms were applied using 1456 studies from multiple IRIS assessments, with chemical names removed, as training data or seeds (i.e., semi-supervised learning). Using a six-algorithm ensemble approach, 2602 articles, or approximately 5% of candidate references, were "voted" relevant by four or more clustering algorithms, and manual review confirmed nearly 50% of these studies were relevant. Further evaluations on two IRIS assessments, using a nine-algorithm ensemble approach and a set of generic, chemical-independent, externally-derived seed studies, correctly identified 77-83% of hazard identification studies published in the assessments and eliminated the need to manually screen more than 75% of search results on average.

Limitations: The chemical-independent approach used to build the training literature set provides a broad and unbiased picture across a variety of endpoints and environmental exposures but does not systematically identify all available data. Variance between actual and predicted relevant studies will be greater because of the external and non-random origin of seed study selection. This approach depends on access to readily available generic training data that can be used to locate relevant references in an unclassified corpus.

Impact: A generic approach to identifying human health relevant studies could be an important first step in literature evaluation for risk assessments. This initial scoping approach could facilitate faster literature evaluation by focusing reviewer efforts, as well as potentially minimize reviewer bias in selection of key studies. Using externally-derived training data has applicability particularly for databases with very low search precision where identifying training data may be cost-prohibitive.
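The ensemble "voting" step and the reported evaluation measures can be sketched as follows. This is an illustrative reading of the abstract, not the published code: the function names are hypothetical, and each algorithm's output is assumed to be a set of document ids it flagged as relevant. The four-of-six vote threshold is the one stated in the abstract.

```python
from collections import Counter


def ensemble_vote(flags_by_algorithm, min_votes):
    """Return documents flagged relevant by at least min_votes algorithms.

    flags_by_algorithm is a list with one set of document ids per algorithm.
    """
    votes = Counter()
    for flagged in flags_by_algorithm:
        votes.update(flagged)
    return {doc for doc, n in votes.items() if n >= min_votes}


def recall(predicted, relevant):
    """Fraction of truly relevant documents that were predicted relevant."""
    return len(predicted & relevant) / len(relevant)


def workload_saved(predicted, corpus_size):
    """Fraction of the corpus that would not need manual screening."""
    return 1 - len(predicted) / corpus_size


# Toy example: six algorithms voting over a ten-document corpus.
flags = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "d"},
    {"b"},
    {"a"},
]
predicted = ensemble_vote(flags, min_votes=4)  # "a" has 5 votes, "b" has 4
toy_recall = recall(predicted, relevant={"a", "c"})
toy_saved = workload_saved(predicted, corpus_size=10)
```

Applied to the figures in the abstract, 2602 voted-relevant articles out of roughly 54,000 candidates is about 4.8%, consistent with the reported "approximately 5%".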
