Abstract
Transcriptome sequencing data offer a valuable resource for inferring genetic variants, yet their application in forensic individual identification and kinship analysis remains insufficiently explored. This study analyzed an open-access transcriptome sequencing dataset comprising 731 individuals from five continental populations. We obtained a total of 5,863,540 transcript SNPs (tSNPs) across these individuals. Comparison with SNP genotypes obtained from whole-genome sequencing data showed that transcriptome-derived genotypes were highly reliable, achieving up to 99% concordance with paired genomic SNP genotypes. On this basis, we selected 735 polymorphic tSNPs characterized by high heterozygosity (Het ≥ 0.4) and low allele frequency variation across populations (Fst < 0.06). The global mean match probability of these tSNPs was calculated to be 10^-305, making them a promising candidate set for individual identification. Validation in an independent population demonstrated stable detection of this locus set, with an average detection rate of 98.34%. Forensic genetic parameters were highly consistent with those of the original screening set, confirming its robust portability across populations. Furthermore, we evaluated the system power of the 735 tSNPs for identifying various kinship relationships using the likelihood ratio method. At cutoff thresholds of t1 = 4 and t2 = -4, these tSNPs effectively distinguished parent-offspring, full sibling, and second-degree kinship pairs from unrelated individual pairs, with system powers of 1, 1, and 0.9863, respectively. These results suggest that transcriptome data hold significant potential for forensic individual identification and kinship analysis.
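The quantities named above (heterozygosity, match probability, and the two-threshold likelihood ratio decision) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, independence between loci, and that t1 and t2 are applied to log10(LR); all function names and the "inconclusive" label are hypothetical.

```python
import math


def expected_heterozygosity(p):
    """Expected heterozygosity 2pq of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def match_probability(p):
    """Chance that two unrelated individuals share a genotype at one locus.

    Assumes Hardy-Weinberg proportions: the genotype frequencies are
    p^2, 2pq and q^2, and the match probability is the sum of their squares.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def log10_cumulative_match_probability(freqs):
    """log10 of the product of per-locus match probabilities.

    Assumes independent loci; working in log space avoids floating-point
    underflow when hundreds of loci are combined.
    """
    return sum(math.log10(match_probability(p)) for p in freqs)


def select_high_het(freqs, min_het=0.4):
    """Keep allele frequencies whose expected heterozygosity is >= min_het."""
    return [p for p in freqs if expected_heterozygosity(p) >= min_het]


def classify_pair(log10_lr, t1=4.0, t2=-4.0):
    """Two-threshold decision on a pairwise log10 likelihood ratio.

    Assumption: t1 and t2 are cutoffs on log10(LR); values between them
    are left undecided.
    """
    if log10_lr >= t1:
        return "related"
    if log10_lr <= t2:
        return "unrelated"
    return "inconclusive"


if __name__ == "__main__":
    toy_freqs = [0.5, 0.45, 0.3, 0.2]  # hypothetical allele frequencies
    kept = select_high_het(toy_freqs)
    print(kept, log10_cumulative_match_probability(kept))
    print(classify_pair(5.2), classify_pair(-6.0), classify_pair(1.3))
```

For scale, under these assumptions a biallelic locus reaches its lowest match probability, 0.375, at p = 0.5, so 735 independent loci cannot go below roughly 10^-313.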
In addition, because it concurrently captures gene expression levels and nucleotide sequences, transcriptome profiling could be employed for diverse forensic genetic applications, including the simultaneous identification of body fluids and donors. Analysis of five forensically relevant body fluids/tissues identified 196 stably detectable core tSNPs. By integrating these core loci with tSNPs located on body fluid-specific RNA markers, we characterized a set of loci with the potential for both body fluid and individual identification. tSNPs obtained from transcriptome data also show promise for phenotypic prediction, biogeographical ancestry inference, and forensic genetic genealogy. In conclusion, our study provides evidence supporting the utility of transcriptomics in forensic genetics and establishes a foundation for the increased integration of RNA-based evidence in future forensic methodologies.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-026-12782-z.