Abstract
Long-read RNA sequencing (lrRNA-seq) has revolutionized transcriptomics, facilitating the study of alternative splicing and leading to the identification of thousands of novel transcripts. While isoform identification has received significant attention, the handling of biologically replicated lrRNA-seq datasets remains less explored, even though the way multiple samples are combined in an lrRNA-seq study may strongly affect transcript identification. This study defines and evaluates two strategies for obtaining consensus transcriptomes from multi-sample lrRNA-seq data: "Join & Call", in which reads from all samples are combined before transcript identification, and "Call & Join", in which transcript identification is performed on each sample individually before the resulting annotations are combined. We applied these strategies to a highly replicated dataset of mouse brain and kidney tissues, sequenced with both PacBio and ONT technologies, and analyzed it with six widely used transcript reconstruction tools. Our results indicate that the optimal strategy depends on the chosen computational tool and the research objective. We found that Join & Call is generally more suitable for discovering rare novel isoforms, as pooling evidence across samples increases confidence in calling lowly expressed transcripts. Conversely, Call & Join is computationally more efficient and is often preferable for highly replicated datasets when the investigation of rare novel transcripts is not the primary objective. Our findings provide a conceptual and practical framework for multi-sample transcriptome reconstruction and guide best practices for increasingly large-scale lrRNA-seq studies.
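The contrast between the two strategies can be illustrated with a deliberately simplified toy model. This is not the algorithm of any of the evaluated tools. It assumes that each read has already been assigned to a transcript identifier (the IDs such as `txRare` are hypothetical), that a hypothetical caller reports a transcript only when it has at least `MIN_READS` supporting reads, and that Call & Join merges per-sample annotations by a simple union. Real tools use far richer evidence (splice junctions, read ends, coverage) and may apply additional filters when merging.

```python
from collections import Counter

# Hypothetical threshold: minimum number of supporting reads needed to call
# a transcript. Real transcript reconstruction tools use richer evidence.
MIN_READS = 2


def call_transcripts(reads, min_reads=MIN_READS):
    """Toy caller: return transcript IDs supported by >= min_reads reads."""
    counts = Counter(reads)
    return {tx for tx, n in counts.items() if n >= min_reads}


def join_and_call(samples, min_reads=MIN_READS):
    """Join & Call: pool reads from all samples, then call transcripts once."""
    pooled = [read for sample in samples for read in sample]
    return call_transcripts(pooled, min_reads)


def call_and_join(samples, min_reads=MIN_READS):
    """Call & Join: call transcripts per sample, then union the calls
    (the union is an assumed merge rule for illustration)."""
    merged = set()
    for sample in samples:
        merged |= call_transcripts(sample, min_reads)
    return merged


# Three hypothetical replicates: txA is abundant, txRare has one read per sample.
samples = [
    ["txA", "txA", "txA", "txRare"],
    ["txA", "txA", "txRare"],
    ["txA", "txA", "txA", "txRare"],
]

print(sorted(join_and_call(samples)))  # ['txA', 'txRare']
print(sorted(call_and_join(samples)))  # ['txA']
```

In this toy setting, `txRare` falls below the threshold in every individual sample but passes once the three reads are pooled, which mirrors the reasoning that pooling evidence helps recover lowly expressed transcripts. The per-sample calls in `call_and_join` are independent of one another, so they could be run in parallel.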