Abstract
Sequence-only surveillance of rapidly evolving pathogens must extract clinically meaningful structure from protein sequences without labels, central data pooling, or strong assumptions about data homogeneity. Most existing sequence autoencoders either assume centralized, IID data or rely on heavy cryptographic protocols. In federated deployments they can leak geometric information through latents or gradients, suffer from client-specific rotations and sign flips of the latent basis, and ignore the curvature of the latent manifold; together these problems degrade clustering quality and make privacy guarantees opaque. We introduce a relativistic triangle-curvature computing framework for unsupervised embeddings of full-length HIV-1 proteins under federated training. The method combines three linear-algebraic components: (i) radii attenuation, a controlled contraction [Formula: see text] that lowers [Formula: see text]-sensitivity and provides an explicit information-retained ledger; (ii) triangle-curvature decoding, which estimates a batch-level scalar K from the (squared) Menger curvature of random latent triples and rescales [Formula: see text] to preserve inter-cluster geometry in curved regions; and (iii) align-then-average aggregation via orthogonal Procrustes on a small public reference set, followed by distillation of a central encoder on the aligned latent mean so that no private sequences are shared. Applied to 173,750 Los Alamos National Laboratory HIV-1 amino-acid sequences spanning nine proteins (Env, Gag, Pol, Nef, Rev, Tat, Vif, Vpr, Vpu), our curvature-aware model achieves the strongest global separation (silhouette 0.826) with low reconstruction error, while a simple radii schedule attains the tightest clusters (Davies-Bouldin 0.373, Calinski-Harabasz [Formula: see text]). Eight proteins form near-perfect clusters; only the short accessory pair Tat/Vpr exhibits recurring overlap, which we flag for targeted downstream classifiers. Communication overhead is minimal because only public-set latents and one scalar K per batch are shared, making the approach suitable for privacy-preserving, federated sequence surveillance.
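As an illustration of components (ii) and (iii), the following is a minimal NumPy sketch, not the implementation used in this work. It computes the squared Menger curvature of latent triples, c^2 = 16 A^2 / (a^2 b^2 c^2) for a triangle with area A and side lengths a, b, c, and aligns client latents of a shared public reference set with an orthogonal Procrustes rotation before averaging. All function names are hypothetical. Taking K as the mean over uniformly sampled triples, and using a fixed anchor matrix as the alignment target, are assumptions made here for concreteness; the abstract does not specify either choice.

```python
import numpy as np


def squared_menger_curvature(x, y, z, eps=1e-12):
    """Squared Menger curvature of the triangle (x, y, z) in latent space."""
    u, v, w = y - x, z - x, z - y
    uu, vv, ww = u @ u, v @ v, w @ w
    # Gram determinant: equals (2 * area)^2 of the triangle.
    area_term = max(uu * vv - (u @ v) ** 2, 0.0)
    # c^2 = 16 * area^2 / (|u|^2 |v|^2 |w|^2); eps guards degenerate triples.
    return 4.0 * area_term / (uu * vv * ww + eps)


def batch_curvature(latents, n_triples=256, seed=0):
    """Assumed aggregation: mean squared curvature over random triples."""
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    values = []
    for _ in range(n_triples):
        i, j, k = rng.choice(n, size=3, replace=False)
        values.append(squared_menger_curvature(latents[i], latents[j], latents[k]))
    return float(np.mean(values))


def procrustes_rotation(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F (may include reflections)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def align_then_average(client_latents, anchor):
    """Rotate each client's public-set latents onto the anchor, then average."""
    aligned = [z @ procrustes_rotation(z, anchor) for z in client_latents]
    return np.mean(aligned, axis=0)
```

Two sanity checks follow from the definitions: points sampled on a circle of radius r give a squared curvature of 1/r^2 for every triple, and latents that differ from the anchor only by an orthogonal transform (including a sign flip) are mapped back onto it exactly, up to floating-point error.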