Accurate gene annotations are fundamental for interpreting genetic variation, cellular function, and disease mechanisms. However, current human gene annotations are largely derived from transcriptomic data of individuals with European ancestry, introducing potential biases that remain uncharacterized. Here, we generate over 800 million full-length reads with long-read RNA-seq in 43 lymphoblastoid cell line samples from eight genetically-diverse human populations and build a cross-ancestry gene annotation. We show that transcripts from non-European samples are underrepresented in reference gene annotations, leading to systematic biases in allele-specific transcript usage analyses. Furthermore, we show that personal genome assemblies enhance transcript discovery compared to the generic GRCh38 reference assembly, even though genomic regions unique to each individual are heavily depleted of genes. These findings underscore the urgent need for a more inclusive gene annotation framework that accurately represents global transcriptome diversity.
Long-read transcriptomics of a diverse human cohort reveals widespread ancestry bias in gene annotation.
阅读:9
作者:Clavell-Revelles Pau, Reese Fairlie, Carbonell-Sala SÃlvia, Degalez Fabien, Oliveros Winona, Arnan Carme, Guigó Roderic, Melé Marta
| 期刊: | bioRxiv | 影响因子: | 0.000 |
| 时间: | 2025 | 起止号: | 2025 Mar 17 |
| doi: | 10.1101/2025.03.14.643250 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
