Automated Information Extraction from Pathology Reports

 

People: Naveen Ashish (Calit2), Charles Boicey and Lisa Dahm (UCI Medical Center)

 

Introduction: The goal of this project is to develop a system for the automated extraction and synthesis of information from semi-structured and unstructured medical and clinical reports, such as pathology reports. This project is part of the overall UCI Medical Center QUEST data warehousing project. The resulting system will populate the data warehouse with structured information extracted and synthesized from the text of the reports.

 

We are leveraging several key open-source technologies for this task. We are using the UIMA framework as the environment for developing our extraction system, together with the OHNLP MedKAT/P system, which is a UIMA pipeline built specifically for the medical domain. At a later stage we will also investigate the use of XAR, an open-source information extraction system that provides capabilities for handling uncertainty in the extracted data.

 

Currently we are developing UIMA Analysis Engines and an extraction pipeline motivated by our set of medical data, but applicable more generally.
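
To illustrate the general idea of an extraction pipeline, the following is a minimal sketch in plain Java. It does not use the UIMA or MedKAT/P APIs: the `Annotator` interface, the `fieldAnnotator` helper, the field names, and the regular expressions are all hypothetical stand-ins chosen for illustration, and the sample report text is invented. Each annotator plays the role an Analysis Engine would play in a real pipeline, reading the report text and adding structured fields that could later be loaded into a data warehouse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathologyPipelineSketch {

    /** Hypothetical stand-in for an analysis engine: reads the text, adds fields. */
    interface Annotator {
        void process(String reportText, Map<String, String> fields);
    }

    /** Builds an annotator that stores the first capture group of the first match. */
    static Annotator fieldAnnotator(String fieldName, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        return (text, fields) -> {
            Matcher m = pattern.matcher(text);
            if (m.find()) {
                fields.put(fieldName, m.group(1).trim());
            }
        };
    }

    /** Runs every annotator in order over one report and returns the extracted fields. */
    static Map<String, String> run(String reportText) {
        List<Annotator> pipeline = new ArrayList<>();
        // Assumed report layout: "Label: value" lines plus free text (illustrative only).
        pipeline.add(fieldAnnotator("specimen", "^Specimen:\\s*(.+)$"));
        pipeline.add(fieldAnnotator("diagnosis", "^Diagnosis:\\s*(.+)$"));
        pipeline.add(fieldAnnotator("tumor_size_cm", "(\\d+(?:\\.\\d+)?)\\s*cm"));

        Map<String, String> fields = new LinkedHashMap<>();
        for (Annotator annotator : pipeline) {
            annotator.process(reportText, fields);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Invented sample text, not taken from any real report.
        String report = "Specimen: Left breast, core biopsy\n"
                + "Diagnosis: Invasive ductal carcinoma\n"
                + "Tumor measures 1.8 cm in greatest dimension.\n";
        System.out.println(run(report));
    }
}
```

In a real UIMA pipeline the annotations would be written to the framework's shared analysis structure rather than to a map, and a medical-domain pipeline would rely on much richer linguistic and terminological analysis than fixed regular expressions; the sketch only shows how independent extraction steps can be chained over one document.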