Abstract
Functional classification of intrinsically disordered proteins (IDPs) is challenging because of their low sequence homology and lack of stable tertiary structure. We address this challenge by classifying a model system of two IDPs, NCBD and CID, that have coevolved and for which both ancestral and extant sequences are available, along with quantitative binding data. One of these proteins, NCBD, exhibits partial secondary structure, while the other, CID, remains highly disordered and is highly charged. We classify these sequences using sequence-dependent interaction maps derived from statistical physics, which predict distance maps (ensemble-averaged distances between arbitrary residue pairs). We also use sequence-specific dynamical profiles for further comparison. Our findings show that CID proteins can be classified into two major groups based on two distinct types of patterns in their electrostatic interaction maps. Classification of CIDs using nonelectrostatic patterning yields diverging predictions, illustrating the importance of accurately modeling long-range electrostatic interactions. Conversely, the classification of NCBD sequences generally reaches a consensus when physics-based noncharge patterning metrics are applied, along with the dynamical profiles. Furthermore, we use these sequence-dependent metrics and dynamical profiles to quantitatively model the binding affinities between the two IDPs. Surprisingly, we find that multiple physics-based sequence metrics quantitatively recapitulate the binding affinities between CID and NCBD variants, linking sequence composition and patterning to their emergent function. This integrated framework provides a generalizable strategy for classifying IDPs and predicting complexation behavior, offering new avenues for probing sequence-function relationships in disordered protein systems.