To join or not to join: handling biological replicates in long-read RNA sequencing data

Abstract

Long-read RNA sequencing (lrRNA-seq) has revolutionized transcriptomics, facilitating the study of alternative splicing and leading to the identification of thousands of novel transcripts. While isoform identification has received significant attention, the handling of biologically replicated lrRNA-seq datasets remains less explored, even though the way multiple samples are combined in an lrRNA-seq study may strongly affect transcript identification. This study defines and evaluates two strategies for obtaining consensus transcriptomes from multi-sample lrRNA-seq data: "Join & Call", where reads from all samples are combined before transcript identification, and "Call & Join", where transcript identification is performed on individual samples before the resulting annotations are combined. We applied these strategies to a highly replicated dataset of mouse brain and kidney tissues, sequenced with both PacBio and ONT technologies, across six widely used transcript reconstruction tools. Our results indicate that the optimal strategy depends on the chosen computational tool and the research objective. Join & Call is generally more suitable for discovering rarely occurring novel isoforms, because pooling evidence increases confidence in calling lowly expressed transcripts. Conversely, Call & Join is computationally more efficient and often preferable for highly replicated datasets when the investigation of rare novel transcripts is not the primary objective. Our findings provide a conceptual and practical framework for multi-sample transcriptome reconstruction, guiding best practices for increasingly large-scale lrRNA-seq studies.
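
The difference between the two strategies can be illustrated with a toy sketch. It is not the method of any of the six tools evaluated in the study. It assumes a hypothetical caller that reports an isoform once a minimum number of reads support it, represents each read by the label of the isoform it matches, and takes the union of per-sample calls as the "join" step; real tools use far richer evidence and merging rules. The function names, the `min_reads` threshold, and the sample data are all invented for illustration.

```python
from collections import Counter


def call_transcripts(reads, min_reads=2):
    """Hypothetical caller: keep an isoform if at least min_reads reads support it."""
    counts = Counter(reads)
    return {isoform for isoform, n in counts.items() if n >= min_reads}


def join_and_call(samples, min_reads=2):
    """Join & Call: pool reads from all samples, then call transcripts once."""
    pooled = [read for sample in samples for read in sample]
    return call_transcripts(pooled, min_reads)


def call_and_join(samples, min_reads=2):
    """Call & Join: call transcripts per sample, then merge the annotations.

    Assumption: the merge is a plain union of the per-sample calls.
    """
    consensus = set()
    for sample in samples:
        consensus |= call_transcripts(sample, min_reads)
    return consensus


# Invented data: three replicates, each with a single read from a rare isoform.
samples = [
    ["known_A", "known_A", "known_A", "rare_B"],
    ["known_A", "known_A", "rare_B"],
    ["known_A", "known_A", "rare_B"],
]

jc = join_and_call(samples)
cj = call_and_join(samples)
print("Join & Call:", sorted(jc))
print("Call & Join:", sorted(cj))
```

In this toy setting the rare isoform passes the support threshold only when reads are pooled, which mirrors the abstract's point that pooling evidence helps with lowly expressed transcripts. Call & Join, in turn, processes each sample independently, so the per-sample calls could be run in parallel or on smaller inputs.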
