Abstract
The goal of this work was to establish a reasonable baseline for research in patient cohort retrieval from clinical free text. Much recent work has used Lucene for this purpose. Our approach was to use MetaMap alone. We found that although many TREC 2011 Electronic Medical Records track participants found it difficult to beat a Lucene baseline, our MetaMap-based baseline did outperform a number of Lucene runs. We propose that MetaMap is a more valid baseline than Lucene, providing essential concept extraction, and that failure to make use of this industry-standard tool results in an unfairly low baseline for evaluation of system outputs.