Abstract
EncoderMap is a dimensionality reduction method tailored to the analysis of molecular simulation data. It relies on a neural network autoencoder architecture augmented with an additional multidimensional scaling (MDS)-like loss term. Because of this additional cost function between the high-dimensional input and the latent space, EncoderMap has advantages over both other dimensionality reduction methods and straightforward autoencoders. In particular, the low-dimensional projections created by EncoderMap are guided by the MDS-like cost function adopted from the sketch-map algorithm, which yields projections in which high- and low-dimensional similarities and dissimilarities are better correlated. EncoderMap has been successfully applied to a wide range of molecular dynamics data. Here, we present a new version of the EncoderMap package. Porting EncoderMap to version 2 of the TensorFlow library not only ensures that it can be used on modern computers but, more importantly, has enabled the introduction of new features and a wide range of customization options (e.g., user-defined custom loss functions). We introduce new visualization capabilities and encoding/decoding features, and we have made EncoderMap modular so that researchers can better understand the training process. Most of these additions can be used on general high-dimensional data, while some specifically aid in working with biomolecular simulation data. Furthermore, EncoderMap's autoencoder now accepts sparse inputs, which makes it possible to train the same network on topologically different proteins. We demonstrate these new features with three different molecular dynamics data sets from the ubiquitin system.
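To make the idea of an MDS-like cost term concrete, the following minimal NumPy sketch combines an autoencoder reconstruction error with a sketch-map style distance cost. It assumes the sketch-map sigmoid s(r) = 1 - (1 + (2^(a/b) - 1)(r/sigma)^a)^(-b/a), applied separately to pairwise distances in the input space and in the latent space, and a mean squared difference over all pairs. The function names, the `dist_weight` parameter, and the parameter values are hypothetical and are not EncoderMap's API or defaults.

```python
import numpy as np


def sketchmap_sigmoid(r, sigma, a, b):
    """Sketch-map style sigmoid: 0 at r = 0, 0.5 at r = sigma, approaching 1 for large r."""
    r = np.asarray(r, dtype=float)
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


def pairwise_distances(x):
    """Euclidean distance matrix for the rows of x (shape: n_samples x n_features)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sketchmap_cost(high, low, high_params, low_params):
    """Hypothetical MDS-like cost: mean squared difference of sigmoid-transformed distances."""
    s_high = sketchmap_sigmoid(pairwise_distances(high), *high_params)
    s_low = sketchmap_sigmoid(pairwise_distances(low), *low_params)
    return float(np.mean((s_high - s_low) ** 2))


def total_loss(x, x_reconstructed, z, high_params, low_params, dist_weight=1.0):
    """Hypothetical combined objective: reconstruction error plus weighted distance cost."""
    reconstruction = float(np.mean((x - x_reconstructed) ** 2))
    return reconstruction + dist_weight * sketchmap_cost(x, z, high_params, low_params)


# Illustrative usage with random data; a trained autoencoder would supply z and x_reconstructed.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 10))   # stand-in for high-dimensional input
z = x[:, :2]                    # stand-in for a 2D latent projection
high_params = (3.0, 6, 3)       # (sigma, a, b), illustrative values only
low_params = (1.0, 2, 3)        # (sigma, a, b), illustrative values only
print(total_loss(x, x, z, high_params, low_params))
```

Because the sigmoid saturates for large distances, the cost mainly penalizes mismatches at intermediate distances around sigma rather than forcing all large distances to be reproduced exactly, which is the motivation sketch-map gives for this transform.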