Abstract
A recently developed quantitative structure-activity relationship (QSAR) prediction approach uses machine learning with analytical signals from full-scan mass spectra as input, and therefore does not require exhaustive structural determination to assess unknown compounds. This approach assumes that the mass spectral pattern of a target chemical reflects its structure. However, the relationship between spectrum and structure is complex, and the need to interpret it could restrict further development of QSAR prediction methods based on analytical signals. In this study, we determined whether gas chromatography-electron-impact ionization-mass spectrometry (GC-EI-MS) data contain meaningful structural information that assists QSAR prediction by comparing them with traditional molecular descriptors used in QSAR prediction. Four molecular descriptors were used: ECFP6, the CDK topological descriptors, MACCS keys, and the PubChem fingerprint. The predictive performance of QSAR models based on analytical and on molecular descriptors was evaluated for seven endpoints: molecular weight, log Kow, boiling point, melting point, water solubility, and oral toxicity in rats and in mice. Influential variables were further investigated by comparing the analytical-descriptor-based models with linear regression models built on simple indicators of the mass spectrum. This investigation indicated that analytical and molecular descriptors retain structural information differently, although their predictive performance was comparable. Moreover, the analytical-descriptor-based approach predicted the physicochemical properties and toxicities of structurally unknown chemicals, which is beyond the scope of molecular-descriptor-based approaches because those descriptors can only be computed from a known structure. QSAR prediction based on analytical signals is therefore valuable for evaluating unknown chemicals in many scenarios.
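The abstract does not state how a full-scan spectrum becomes a model input or which "simple indicators" were used. The following is a minimal sketch, assuming centroided EI spectra binned to unit (nominal) m/z and scaled so the base peak equals 1.0, a common convention but not necessarily the one used in this study. The function names, the m/z range, the 1% relative-intensity threshold, and the three indicators (base-peak m/z, highest retained m/z, peak count) are hypothetical choices for illustration, and a single-predictor least-squares fit stands in for the linear regression models.

```python
from typing import Dict, List, Sequence, Tuple

# A centroided spectrum as (m/z, intensity) pairs.
Spectrum = Sequence[Tuple[float, float]]


def spectrum_to_vector(spectrum: Spectrum, mz_min: int = 1, mz_max: int = 500) -> List[float]:
    """Bin to nominal m/z and scale so the base peak is 1.0.

    Assumption: the m/z range and base-peak scaling are illustrative only.
    """
    vec = [0.0] * (mz_max - mz_min + 1)
    for mz, intensity in spectrum:
        idx = int(mz + 0.5) - mz_min  # round half up to the nominal mass
        if 0 <= idx < len(vec):
            vec[idx] += intensity  # sum signals sharing a nominal-mass bin
    base = max(vec)
    return [v / base for v in vec] if base > 0 else vec


def simple_indicators(spectrum: Spectrum, rel_threshold: float = 0.01) -> Dict[str, float]:
    """Hypothetical simple indicators of a spectrum.

    Peaks below rel_threshold * base-peak intensity are ignored.
    """
    base_intensity = max(i for _, i in spectrum)
    kept = [(mz, i) for mz, i in spectrum if i >= rel_threshold * base_intensity]
    base_mz = max(spectrum, key=lambda p: p[1])[0]
    return {
        "base_peak_mz": base_mz,
        "max_mz": max(mz for mz, _ in kept),  # crude proxy for the molecular ion
        "n_peaks": float(len(kept)),
    }


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Hypothetical toy spectra and target values, not real reference data.
    spectra = [
        [(43.0, 100.0), (58.0, 25.0), (72.1, 5.0)],
        [(57.0, 100.0), (85.0, 30.0), (114.0, 2.0)],
        [(91.0, 100.0), (92.0, 60.0), (120.2, 10.0)],
    ]
    targets = [72.0, 114.0, 120.0]
    vectors = [spectrum_to_vector(s) for s in spectra]  # analytical descriptors
    xs = [simple_indicators(s)["max_mz"] for s in spectra]
    intercept, slope = fit_line(xs, targets)
    print(len(vectors[0]), xs, round(intercept, 3), round(slope, 3))
```

In this sketch the binned vector plays the role of the analytical descriptor that a machine-learning regressor would consume, while the indicators feed the much simpler linear model; comparing the two is one way to ask which spectral features drive a prediction.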