Abstract
OBJECTIVE: To describe a system that determines the assertion status of medical problems mentioned in clinical reports. The system was entered in the 2010 i2b2/VA community evaluation 'Challenges in natural language processing for clinical data' in the task of classifying assertions associated with problem concepts extracted from patient records.
MATERIALS AND METHODS: A combination of machine learning (conditional random field and maximum entropy) and rule-based (pattern matching) techniques was used to detect negation, speculation, and hypothetical and conditional information, as well as information associated with persons other than the patient.
RESULTS: The best submission obtained an overall micro-averaged F-score of 0.9343.
CONCLUSIONS: Using semantic attributes of concepts and information about document structure as features for statistical classification of assertions is an effective way to combine rule-based and statistical techniques. In this task, the choice of features may matter more than the choice of classifier algorithm.
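The conclusion describes feeding rule-based cue matches and document-structure information (such as the section a sentence appears in) into a statistical classifier. Below is a minimal Python sketch of that idea, not the authors' implementation. It assumes hypothetical regex cue lists, a toy training set invented for illustration, and a plain unregularized multinomial logistic regression (maximum entropy) trained by stochastic gradient ascent. The conditional random field component is not shown. The label names follow the six 2010 i2b2 assertion categories.

```python
import math
import re
from collections import defaultdict

# Hypothetical cue patterns for illustration only; these are NOT the
# rules used by the system described in the abstract.
CUES = {
    "absent": re.compile(r"\b(no|denies|without|negative for)\b"),
    "possible": re.compile(r"\b(possible|probable|may|suspect(ed)?|rule out)\b"),
    "hypothetical": re.compile(r"\b(if|should)\b"),
    "conditional": re.compile(r"\b(when|with exertion|upon)\b"),
    "family": re.compile(r"\b(mother|father|sister|brother|family history)\b"),
}

LABELS = ["present", "absent", "possible", "conditional",
          "hypothetical", "associated_with_someone_else"]


def features(sentence, section):
    """Turn rule-based cue matches and the section header into binary features."""
    text = sentence.lower()
    feats = {"bias", "section=" + section.lower()}
    for name, pattern in CUES.items():
        if pattern.search(text):
            feats.add("cue=" + name)
    return feats


class MaxEnt:
    """Multinomial logistic regression (maximum entropy), unregularized,
    trained by per-example stochastic gradient ascent on the log-likelihood."""

    def __init__(self, labels, lr=0.5, epochs=100):
        self.labels = list(labels)
        self.lr = lr
        self.epochs = epochs
        self.w = defaultdict(float)  # (feature, label) -> weight

    def _probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, data):
        for _ in range(self.epochs):
            for feats, gold in data:
                probs = self._probs(feats)
                for y, p in zip(self.labels, probs):
                    # d log P(gold | feats) / d w[(f, y)] = 1[y == gold] - p_y
                    step = self.lr * ((1.0 if y == gold else 0.0) - p)
                    for f in feats:
                        self.w[(f, y)] += step
        return self

    def predict(self, feats):
        probs = self._probs(feats)
        return self.labels[probs.index(max(probs))]


# Toy training data (invented for illustration, not from the i2b2 corpus).
TRAIN = [
    ("Patient denies chest pain.", "HPI", "absent"),
    ("No fever.", "Review of systems", "absent"),
    ("Possible pneumonia.", "Assessment", "possible"),
    ("Suspected pneumonia.", "Assessment", "possible"),
    ("Return if you develop fever.", "Discharge instructions", "hypothetical"),
    ("Chest pain with exertion.", "HPI", "conditional"),
    ("Mother had breast cancer.", "Family history", "associated_with_someone_else"),
    ("Patient has hypertension.", "HPI", "present"),
]

model = MaxEnt(LABELS).fit([(features(s, sec), y) for s, sec, y in TRAIN])
print(model.predict(features("The patient denies nausea.", "HPI")))
```

Because the rules only emit features rather than final decisions, the classifier can learn how much to trust each cue in each section. This is one way to read the abstract's point that feature choice may outweigh the choice of learning algorithm.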