Abstract
OBJECTIVES: Case reports and case series make up a significant portion of the biomedical literature, yet although the National Library of Medicine indexes case reports as a Publication Type, it does not index case series. This limits the ability of clinicians and researchers to retrieve, identify and analyze evidence from this type of study. MATERIALS AND METHODS: PubMed articles mentioning "case series" in the title or abstract were characterized to learn what the authors themselves consider to be case series. We then set aside articles better indexed under other standard publication types (case reports, cohort studies, reviews and clinical trials), as well as articles that discuss case series studies rather than report their results, to create a corpus of typical case series articles. Two annotators evaluated a random sample of these articles and confirmed that the great majority satisfy a formal definition of "case series". RESULTS: The corpus was used to develop an automated transformer-based machine learning indexing model. On hold-out data, the model performed excellently on case series (precision = 0.887, recall = 0.952, F1 = 0.918, PR-AUC = 0.941), and manual evaluation of 100 articles tagged as "case series" showed that 88% satisfied a formal definition of case series. DISCUSSION AND CONCLUSION: This study demonstrates the feasibility of automatically indexing case series articles. Such indexing should enhance their discoverability, and hence their medical value, for evidence synthesis groups as well as for general users of the biomedical literature.
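As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the two values given above. The function name `f1_score` is illustrative only and is not taken from the authors' code. PR-AUC cannot be recomputed this way, because it depends on the full precision-recall curve and not on a single operating point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        # Guard against division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the case series class on hold-out data.
p, r = 0.887, 0.952
print(f"F1 = {f1_score(p, r):.3f}")  # prints "F1 = 0.918"
```

The recomputed value agrees with the reported F1 of 0.918.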