Abstract
The amino acid composition of proteins depends on many factors. It varies between taxonomically distant organisms, and it also depends on the localization of a protein in cells and tissues and on the protein's structure. This raises two questions. Is it possible to separate different proteomes using only the amino acid composition of their proteins? And is it possible to determine, from amino acid composition alone, the structural class to which a given protein belongs? We have developed a method and a measure that maximally separate two sets of proteins. The method assigns each protein an R-value: positive values are more characteristic of the first set, and negative values of the second. By studying the distribution of R in the two sets, we can determine how much the sets differ in composition. When examining a new protein, we can also determine whether it is more similar to the first set or to the second. In this paper, we show that amino acid composition alone is sufficient to separate sets of proteins belonging to different organisms, as well as sets of proteins that differ in function or structure. In every case, each protein is assigned a measure R that maximally separates the sets under study. This approach can be further used to annotate proteins of unknown function.
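The idea of a signed, composition-based score can be illustrated with a minimal sketch. The abstract does not state how R is computed, so the code below is not the authors' measure: it assumes a simple stand-in, a linear score built from the difference between the mean compositions of the two sets and centred on their midpoint. The function names (`composition`, `mean_composition`, `make_score`) and the toy sequences are hypothetical and serve only as illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def mean_composition(seqs):
    """Average the composition vectors of a set of sequences."""
    comps = [composition(s) for s in seqs]
    return [sum(col) / len(comps) for col in zip(*comps)]


def make_score(set1, set2):
    """Build a signed score: positive leans to set1, negative to set2.

    ASSUMPTION: a difference-of-means linear score is used here as a
    stand-in; it is not the measure R defined in the paper.
    """
    mu1 = mean_composition(set1)
    mu2 = mean_composition(set2)
    weights = [a - b for a, b in zip(mu1, mu2)]
    midpoint = [(a + b) / 2 for a, b in zip(mu1, mu2)]

    def score(seq):
        f = composition(seq)
        return sum(w * (x - m) for w, x, m in zip(weights, f, midpoint))

    return score


# Toy, made-up sequences: set1 is Ala/Gly-rich, set2 is Lys/Glu-rich.
set1 = ["AAAAGGAALA", "AAGAAAVAAG", "GAAAALAAGA"]
set2 = ["KKEKKDKEKK", "EKKKEKDKKE", "KDKKEKKEKK"]

score = make_score(set1, set2)
r_new = score("AAGAAKAAGA")  # score for a new, unlabelled sequence
```

With this stand-in, the sign of the score indicates which set's mean composition the new sequence lies closer to, and the distributions of the score over the two sets show how well composition alone separates them. A measure that maximally separates the sets would generally need more than a difference of means, for example a discriminant that also accounts for the variance within each set.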