Abstract
Stop codons dictate translation termination, and variants that abolish them (stop-loss variants) produce C-terminal extensions that can have significant functional consequences. Despite their clinical relevance, existing prediction tools, which were developed primarily for missense variants, lack sufficient accuracy for stop-loss variants, mainly because they do not adequately account for the sequence features of the extended peptide. To address this gap, we developed TAILVAR (Terminal extension Analysis for Improved prediction of Lengthened VARiants), a machine-learning classifier that integrates multi-omics features spanning transcript- and protein-level properties with variant effect annotations. Our analyses showed that transcripts lacking a downstream stop codon in the 3' untranslated region are under weaker evolutionary constraint. Additionally, deleterious variants exhibit greater C-terminal hydrophobicity, which is associated with reduced protein stability and increased degradation, as well as higher aggregation propensity. TAILVAR outperformed existing tools, showing the highest correlation with functional experiments and enabling reliable classification of variants as benign or pathogenic (mostly via a loss-of-function mechanism), whereas discrimination between gain- and loss-of-function effects requires further investigation. This work presents a systematic framework for interpreting stop-loss variants, offering precise predictions of elongated protein effects that may aid genetic diagnosis and facilitate the discovery of novel disease-associated genes.
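To make the notion of a C-terminal extension concrete, the following minimal Python sketch translates a mutated stop codon together with the 3' untranslated region until the next in-frame stop codon, then scores the resulting peptide for hydrophobicity. It is an illustration only and not the TAILVAR implementation: the function names `c_terminal_extension` and `mean_hydropathy` are hypothetical, the standard genetic code is assumed, and the Kyte-Doolittle hydropathy scale is used here as one common choice of hydrophobicity measure rather than the one necessarily used in this work.

```python
# Illustrative sketch only; names and the choice of scale are assumptions.

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def c_terminal_extension(mutated_stop_codon, utr3):
    """Translate the mutated stop codon plus the 3' UTR, in frame.

    Returns (peptide, found_stop). found_stop is False when the 3' UTR
    contains no in-frame stop codon downstream of the original one.
    Assumes DNA input containing only A, C, G and T.
    """
    seq = (mutated_stop_codon + utr3).upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            return "".join(peptide), True
        peptide.append(residue)
    return "".join(peptide), False


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy; None for an empty peptide."""
    if not peptide:
        return None
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


if __name__ == "__main__":
    # Toy example: a TAA stop codon mutated to CAA, followed by a short UTR.
    extension, found_stop = c_terminal_extension("CAA", "CTGATTTAGGCC")
    print(extension, found_stop, mean_hydropathy(extension))
```

In this toy input the extension ends at the first in-frame `TAG`; a 3' UTR without any in-frame stop would instead return `found_stop` as `False`, which corresponds to the class of transcripts lacking a downstream stop codon discussed above.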