Exploring the variable space of shallow machine learning models for reversed-phase retention time prediction.

Peptide retention time (RT) prediction algorithms are tools to study and identify the physicochemical properties that drive the peptide-sorbent interaction. Traditional RT algorithms use multiple linear regression with manually curated parameters to determine the degree of direct contribution for each parameter and improvements to RT prediction accuracies relied on superior feature engineering. Deep learning led to a significant increase in RT prediction accuracy and automated feature engineering via chaining multiple learning modules. However, the significance and the identity of these extracted variables are not well understood due to the inherent complexity when interpreting "relationships-of-relationships" found in deep learning variables. To achieve both accuracy and interpretability simultaneously, we isolated individual modules used in deep learning and the isolated modules are the shallow learners employed for RT prediction in this work. Using a shallow convolutional neural network (CNN) and gated recurrent unit (GRU), we find that the spatial features obtained via the CNN correlate with real-world physicochemical properties namely cross-collisional sections (CCS) and variations of assessable surface area (ASA). Furthermore, we determined that the discovered parameters are "micro-coefficients" that contribute to the "macro-coefficient" - hydrophobicity. Manually embedding CCS and the variations of ASA to the GRU model yielded an R2Â =Â 0.981 using only 525 variables and can represent 88% of the â¼110,000 tryptic peptides used in our dataset. This work highlights the feature discovery process of our shallow learners can achieve beyond traditional RT models in performance and have better interpretability when compared with the deep learning RT algorithms found in the literature.

期刊：	Computational and Structural Biotechnology Journal	影响因子：	4.100
时间：	2023	起止号：	2023 Feb 27; 21:2446-2453
doi：	10.1016/j.csbj.2023.02.047

Exploring the variable space of shallow machine learning models for reversed-phase retention time prediction.

探索用于反相保留时间预测的浅层机器学习模型的变量空间

特别声明