Abstract
Intrinsically disordered regions (IDRs) in proteins drive phase separation (PS) to form biomolecular condensates, which organize cellular matter. Although IDRs are recognized as critical drivers of PS, systematically identifying the sequence motifs that govern this phenomenon, along with their compositional determinants, remains a key challenge. Here we develop PhaSeMotif, a deep learning framework for interpretable and precise prediction of essential phase-separating motifs within IDRs. We experimentally validate PhaSeMotif, demonstrating that mutations of predicted motifs significantly reduce or eliminate the PS capabilities of IDRs. The identified motifs possess diverse amino acid compositional features that are critical for determining PS propensities and condensate partitioning. Furthermore, PhaSeMotif integrates generative models to create validation-ready motifs that preserve these critical compositional features, enabling direct experimental verification and deeper mechanistic investigation of PS-driving IDR motifs. Overall, by combining motif prediction, generation, and validation, PhaSeMotif provides an open-access toolkit that facilitates more efficient investigation of IDR motifs and offers insights into the molecular determinants of PS.