Abstract
Ocean archaea, key members of the ocean ecosystem, thrive under extreme conditions. However, studying them remains challenging because these species are difficult to isolate and culture. This study explores whether the predicted proteomes of ocean archaea exhibit distinct properties compared with those of model organisms. Using AlphaFold2-predicted protein structures, we analyze features including protein length, salt bridges, hydrogen bonds, relative surface area, and secondary structure composition. Our results reveal that ocean archaea proteins are generally smaller and more stable in structural composition than those of eukaryotic organisms. By vectorizing proteins using deep learning techniques and analyzing their relationship with UniProt annotations, we visualize the connections between protein sequences, structures, families, and functions. We also cluster similar proteins and calculate the lowest common ancestor of the proteins in each cluster to relate the proteins' evolutionary level to their conformational changes. Furthermore, after comparing the structural domains of ocean archaea proteomes with existing databases and performing hierarchical clustering, we identify 255 protein domain clusters not defined in any domain database. Our proteome-wide analysis uncovers previously unrecognized characteristics and domain folds in ocean archaea proteins, which may be related to adaptive strategies in the extreme marine environment. We also provide a website for academic use at http://www.csbio.sjtu.edu.cn/bioinf/Ocean_archaea/.