Abstract
Protein-nucleic acid interactions play a crucial role in biological processes, including gene regulation and editing. Accurately identifying nucleic acid-binding domains in proteins is essential for understanding these interactions, yet experimental methods such as X-ray crystallography remain costly and time-intensive. Computational approaches have therefore emerged as indispensable complements to wet-lab techniques. Here, we introduce a framework for nucleic acid-binding domain prediction that integrates cross-modal protein language models with a multiscale computational architecture. The method is trained and evaluated on a structurally annotated benchmark dataset that quantifies binding likelihood through hierarchical, proximity-based labels derived from experimentally determined protein-nucleic acid complexes. In our evaluations, the approach achieves state-of-the-art performance. This work offers new insight into the design of multimodal learning systems for protein-nucleic acid interaction analysis and provides an open resource to accelerate discoveries in functional genomics and drug design.