Relativistic triangle-curvature computing for federated HIV-1 protein-sequence monitoring

Abstract

Sequence-only surveillance of rapidly evolving pathogens must extract clinically meaningful structure from protein sequences without labels, central data pooling, or strong assumptions about data homogeneity. Most existing sequence autoencoders either assume centralized, IID data or rely on heavy cryptographic protocols; in federated deployments they can leak geometric information through latents or gradients, suffer from client-specific rotations and sign flips of the latent basis, and ignore curvature of the latent manifold, which together degrade clustering quality and make privacy guarantees opaque. We introduce a relativistic triangle-curvature computing framework for unsupervised embeddings of full-length HIV-1 proteins under federated training. The method combines three linear-algebraic components: (i) radii attenuation, a controlled contraction [Formula: see text] that lowers [Formula: see text]-sensitivity and provides an explicit information-retained ledger; (ii) triangle-curvature decoding, which estimates a batch-level scalar K from the (squared) Menger curvature of random latent triples and rescales [Formula: see text] to preserve inter-cluster geometry in curved regions; and (iii) align-then-average aggregation via orthogonal Procrustes on a small public reference set, followed by distillation of a central encoder on the aligned latent mean so that no private sequences are shared. Applied to 173,750 Los Alamos National Laboratory HIV-1 amino-acid sequences spanning nine proteins (Env, Gag, Pol, Nef, Rev, Tat, Vif, Vpr, Vpu), our curvature-aware model achieves the strongest global separation (silhouette 0.826) with low reconstruction error, while a simple radii schedule attains the tightest clusters (Davies-Bouldin 0.373, Calinski-Harabasz [Formula: see text]). Eight proteins form near-perfect clusters; only the short accessory pair Tat/Vpr exhibits recurring overlap, which we flag for targeted downstream classifiers. 
Communication overhead is minimal because only public-set latents and one scalar K per batch are shared, making the approach suitable for privacy-preserving, federated sequence surveillance.
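
As an illustration of the triangle-curvature step, the sketch below computes the squared Menger curvature of a latent triple and averages it over random triples to obtain a batch-level scalar. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the number of sampled triples, the use of a plain mean as the batch summary, and the handling of degenerate triples are all hypothetical choices, since the abstract does not specify them. The Menger curvature of three points is 4A / (abc), where A is the area of the triangle they span and a, b, c are its side lengths.

```python
import random
from typing import Optional, Sequence

Vector = Sequence[float]


def squared_menger_curvature(x: Vector, y: Vector, z: Vector,
                             eps: float = 1e-12) -> float:
    """Squared Menger curvature 16*A^2 / (a^2 * b^2 * c^2) of a triple.

    Works in any dimension: the squared area comes from the Gram
    determinant of the two edge vectors that leave x.
    """
    u = [b - a for a, b in zip(x, y)]  # edge x -> y
    v = [b - a for a, b in zip(x, z)]  # edge x -> z
    w = [b - a for a, b in zip(y, z)]  # edge y -> z
    uu = sum(c * c for c in u)
    vv = sum(c * c for c in v)
    ww = sum(c * c for c in w)
    uv = sum(a * b for a, b in zip(u, v))
    # |u|^2 |v|^2 - (u.v)^2 equals 4 * A^2; clamp tiny negative round-off.
    gram = max(uu * vv - uv * uv, 0.0)
    denom = uu * vv * ww
    if denom <= eps:
        # Assumption: (near-)coincident points are treated as flat.
        return 0.0
    return 4.0 * gram / denom


def estimate_batch_curvature(latents: Sequence[Vector],
                             n_triples: int = 256,
                             seed: Optional[int] = None) -> float:
    """Mean squared Menger curvature over random latent triples.

    Assumption: the batch scalar is a plain mean; the source only says
    it is estimated from random triples.
    """
    if len(latents) < 3:
        raise ValueError("need at least three latent vectors")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_triples):
        i, j, k = rng.sample(range(len(latents)), 3)
        total += squared_menger_curvature(latents[i], latents[j], latents[k])
    return total / n_triples
```

For points lying on a circle of radius r, every triple has Menger curvature 1/r, so the estimate equals 1/r² regardless of which triples are drawn. That makes a circle a convenient sanity check for any implementation of this step.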
