Abstract
SUMMARY: Performing highly parallelized preprocessing of methylation array data using Python can accelerate data preparation for downstream methylation analyses, including large scale production-ready machine learning pipelines. We present a highly reproducible, scalable pipeline (PyMethylProcess) that can be quickly set-up and deployed through Docker and PIP. AVAILABILITY AND IMPLEMENTATION: Project Home Page: https://github.com/Christensen-Lab-Dartmouth/PyMethylProcess. Available on PyPI (pymethylprocess), Docker (joshualevy44/pymethylprocess). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.