Abstract
Protein structures are fundamental to understanding biological function, yet many detailed similarities remain hidden from conventional alignment-based or 3D superposition methods. The Triangular Spatial Relationship (TSR) method offers an alignment-free encoding of backbone geometry; however, classical TSR ignores the context of secondary structure elements (SSEs), such as helices, strands, and coils. To address this, we introduce SSE-TSR, which enriches each TSR key by assigning it one of 18 helix-strand-coil combination labels derived from DSSP-style annotations in PDB HELIX/SHEET records. By mapping the SSE-TSR keys of each protein into a sparse tensor, SSE-TSR compactly captures both tertiary geometry and local secondary-structure motifs. We evaluated SSE-TSR on four datasets, two structural (CATH-based, 9.2 K; SCOP-based, 7.0 K) and two functional (published, 7.8 K; new, 7.2 K), using a 3D convolutional neural network. On the structure-based tasks, SSE-TSR raises accuracy from 96.00% to 98.33% (CATH-based) and from 95.46% to 99.00% (SCOP-based). On the functional tasks, it yields modest yet consistent gains (e.g., from 99.41% to 99.50% and from 95.83% to 98.83%). Comparisons with Foldseek show competitive accuracy across diverse tasks. Additionally, the sparse tensor representation enables memory-efficient handling of large-scale datasets, making SSE-TSR practical for extensive bioinformatics analyses. These results show that SSE-TSR is a scalable, interpretable, and robust method for protein classification and structural bioinformatics.