|
Automated Information Extraction from Pathology Reports |
|
People: Naveen Ashish (Calit2), Charles Boicey and Lisa Dahm (UCI Medical Center) |

|
Introduction: The goal of this project is to
develop a system for the automated extraction and synthesis of information from
semi-structured and unstructured medical and clinical
reports such as pathology reports. This project is part of the overall UCI
Medical Center QUEST data warehousing project. The resulting system will
populate the data warehouse with structured
information extracted and synthesized from the text in the reports. We
build on two open-source technologies: the UIMA framework, which serves as
the environment for developing our extraction system, and the OHNLP
MedKAT/P system, a UIMA pipeline tailored to the medical domain.
At a later stage we will also investigate the XAR open-source
information extraction system, which provides capabilities for handling
uncertainty in the (extracted) data. Currently
we are developing UIMA Analysis Engines
and an extraction pipeline that are motivated by our set of medical data but
applicable more generally. |
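
As a rough illustration of the pipeline idea described above, the sketch below chains a few small annotators over a shared document object and then flattens the annotations into one structured row, the kind of record that could be loaded into a data warehouse. It is plain Python and does not use the UIMA or MedKAT/P APIs. The field names, the regular expressions, and the sample report text are all invented for illustration and are not taken from the project.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str    # hypothetical field name, e.g. "diagnosis"
    begin: int   # character offset where the matched value starts
    end: int     # character offset where the matched value ends
    value: str   # matched text, stripped of surrounding whitespace


@dataclass
class Document:
    # Shared object that every annotator reads from and writes to,
    # loosely analogous to the shared analysis structure in UIMA.
    text: str
    annotations: list = field(default_factory=list)


def annotate_field(kind, pattern):
    """Build an annotator that records regex group 1 as an annotation."""
    regex = re.compile(pattern, re.IGNORECASE)

    def annotator(doc):
        for m in regex.finditer(doc.text):
            doc.annotations.append(
                Annotation(kind, m.start(1), m.end(1), m.group(1).strip())
            )

    return annotator


# Illustrative pipeline: each stage adds annotations of one kind.
PIPELINE = [
    annotate_field("specimen_site", r"specimen:\s*([^\n.]+)"),
    annotate_field("diagnosis", r"diagnosis:\s*([^\n.]+)"),
    annotate_field("tumor_size_cm", r"(\d+(?:\.\d+)?)\s*cm"),
]


def run_pipeline(text):
    """Run every annotator in order over a fresh Document."""
    doc = Document(text)
    for annotator in PIPELINE:
        annotator(doc)
    return doc


def to_row(doc):
    """Flatten annotations into one row, keeping the first value per kind."""
    row = {}
    for ann in doc.annotations:
        row.setdefault(ann.kind, ann.value)
    return row


# Made-up report text, used only to exercise the sketch.
report = (
    "Specimen: left breast, core biopsy\n"
    "Diagnosis: invasive ductal carcinoma\n"
    "Tumor measures 1.8 cm in greatest dimension."
)
row = to_row(run_pipeline(report))
print(row)
```

In a real UIMA system each stage would be an Analysis Engine with a declared type system, and the regular expressions here would give way to dictionary lookup, parsing, and domain-specific rules. The sketch only shows the overall shape: independent stages, a shared annotated document, and a final structured record.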