Abstract
The Immune Epitope Database (IEDB, iedb.org) has manually curated epitope data from over 26,000 publications across two decades. With PubMed adding ~5,000 articles daily, traditional curation methods face scalability challenges. Given the multimodal nature of the data contained in scientific papers, we have sought to build an open-source vision-language model (VLM)-based tool that human curators can use to speed up and automate biological data curation. Here we present a multimodal document-ingestion and question-answering (QnA) pipeline that couples traditional Optical Character Recognition (OCR) and text matching with VLM capabilities. The system, which we call EPITOME, implements three-stage processing: regex-based identification of epitopes and MHC molecules, extraction of visual elements from PDFs, and contextual indexing that links peptide sequences, MHC molecules, and assays to their locations across text, tables, and figures. This index then supplies context for VLM-based QnA. Preliminary results from EPITOME show promising zero-shot performance of open-source VLMs, indicating that a curator-in-the-loop process can accelerate biocuration; our evaluation identifies strategic points where curator intervention can most improve overall system accuracy.