Single-cell omics arena: evaluation of large language models for automatic cell-type annotations on single-cell omics data via RNA-seq bridging


Authors: Liu, Junhao; Xu, Siwei; Wu, Yongxian; Zhang, Jing

Journal: Briefings in Bioinformatics
Impact factor: 7.700
Year: 2025
Volume/issue: 2025 Nov 1;26(6)
DOI: 10.1093/bib/bbaf622
Research area: Cell biology

Abstract

The single-cell sequencing revolution enables simultaneous molecular profiling of multiple modalities across thousands of individual cells, allowing scientists to investigate the diverse functions of complex tissues. Among all analysis steps, assigning individual cells to specific types is fundamental to understanding cellular heterogeneity. However, this process is labor-intensive and requires extensive expert knowledge. Recent advances have shown that large language models (LLMs) can automatically extract biological knowledge, such as marker genes, facilitating efficient and automated cell-type annotation. To evaluate the capability of modern LLMs to automate cell-type identification, we first introduce an automated cell-type annotation method with a comprehensive benchmark: Single-cell Omics Arena. Specifically, we compiled 11 publicly available single-cell RNA sequencing (scRNA-seq) datasets and evaluated eight LLMs across 1226 cell-type annotation-related tasks. This effort established a foundation for automated cell-type annotation from scRNA-seq data using interpretable features such as gene names. Building on this benchmark, we introduced domain-specific chain-of-thought prompting techniques to improve the accuracy of cell-type annotation and facilitate the extraction of relevant biological insights. Finally, to accommodate non-interpretable features, we propose leveraging a pretrained variational autoencoder (VAE)-based cross-modality translation module to convert features such as epigenetic marks into interpretable representations, which seamlessly extends LLM-based cell-type annotation to non-RNA-based sequencing technologies. In summary, our benchmark provides key insights into automated cell-type annotation from scRNA-seq data and demonstrates the potential of cross-modality translation for handling non-interpretable features.
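The abstract does not spell out how marker genes are turned into an LLM query. As a rough illustration of marker-gene-based annotation with a chain-of-thought style prompt, the sketch below ranks genes per cluster by a simple mean-difference score and formats them into a plain-text prompt. The scoring rule, the function names `top_marker_genes` and `build_prompt`, the prompt wording, and the toy data are all hypothetical. This is not the authors' pipeline, and the actual call to an LLM is left out.

```python
from statistics import mean

# Hypothetical toy data: 6 cells x 4 genes (rows = cells), assumed already normalized.
GENES = ["CD3E", "MS4A1", "LYZ", "ACTB"]
LABELS = ["0", "0", "1", "1", "2", "2"]  # cluster assignment of each cell
EXPR = [
    [5, 0, 0, 9],
    [4, 0, 1, 9],
    [0, 6, 0, 9],
    [0, 5, 0, 9],
    [0, 0, 7, 9],
    [1, 0, 8, 9],
]


def top_marker_genes(expr, labels, genes, cluster, n=5):
    """Rank genes by (mean expression inside `cluster`) minus (mean expression outside it)."""
    inside = [row for row, lab in zip(expr, labels) if lab == cluster]
    outside = [row for row, lab in zip(expr, labels) if lab != cluster]
    if not inside:
        raise ValueError(f"no cells assigned to cluster {cluster!r}")
    scored = []
    for g, gene in enumerate(genes):
        in_mean = mean(row[g] for row in inside)
        out_mean = mean(row[g] for row in outside) if outside else 0.0
        scored.append((in_mean - out_mean, gene))
    # Highest score first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:n]]


def build_prompt(tissue, markers_by_cluster, chain_of_thought=True):
    """Format per-cluster marker genes into a plain-text annotation prompt."""
    lines = [f"Identify the cell type of each cluster from a {tissue} sample using its marker genes."]
    if chain_of_thought:
        # Hypothetical domain-specific reasoning instructions (chain-of-thought style).
        lines.append(
            "For each cluster, first state which cell types each marker gene is "
            "known to indicate, then reason step by step before giving a final answer."
        )
    for cluster, markers in markers_by_cluster.items():
        lines.append(f"Cluster {cluster}: {', '.join(markers)}")
    lines.append("End with one line per cluster in the form 'Cluster <id>: <cell type>'.")
    return "\n".join(lines)


if __name__ == "__main__":
    markers = {c: top_marker_genes(EXPR, LABELS, GENES, c, n=1) for c in sorted(set(LABELS))}
    print(build_prompt("human PBMC", markers))
```

On the toy matrix, the sketch selects CD3E, MS4A1 and LYZ as the top marker for clusters 0, 1 and 2, and the printed prompt would then be sent to an LLM of choice. The raw mean difference is used only to keep the example dependency-free; a real workflow would more likely rank markers with a proper differential-expression test.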
