Joel Martin
Phone: 613-993-0266
Fax: 613-990-3908
Email: Joel.Martin@nrc-cnrc.gc.ca
Randall Milburn
Phone: 613-993-0221
Fax: 613-952-0215
Email: Randall.Milburn@nrc-cnrc.gc.ca
Information found on this page has been archived and is for reference, research or recordkeeping purposes. Please visit NRC's new site for the most recent information.
Medical records contain text and will always do so because text is expressive, succinct and fast to create or read. These characteristics are appreciated by doctors but less so by researchers who want to perform large studies across many medical records. Many of the records’ essential medical facts are encoded in text that computers cannot yet understand.
In 2008, NRC conducted research in knowledge discovery through a project called Text Mining of Medical Records (TMMR). The project aims to establish text mining tools that are flexible and adaptable, so that end-users can apply them to electronic health records and other medical text even when the scope changes or other forms of uncertainty arise. The TMMR project leverages the Interactive Information Group’s expertise in language processing and machine learning techniques, and builds upon the group's long experience with biomedical text. The tool development is motivated by the goal of improving health and wellness by increasing the efficiency and effectiveness of medical researchers and other health professionals.
For processing the text from medical records, this project addresses two principal scenarios: producing alerts and extracting medical facts.
In the first scenario, producing alerts, health professionals require alerts of unusual health events based on data aggregated from multiple records. For instance, the texts from emergency room records are collected in real time by a central computer, which monitors them to track the reasons for visits. When the frequency of any of those reasons exceeds an expected level, a meaningful alert is issued for a suspected medical outbreak. This allows the public health officials who receive the alert to respond immediately. An adequate response is likely to require additional information from a variety of data sources, many of which are, again, expected to be in text format.
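The alerting rule described above (count the reasons for visits, then flag any reason whose frequency exceeds an expected level) can be illustrated with a minimal Python sketch. This is not the TMMR implementation. The function name `find_alerts`, the `expected_counts` baseline, the `tolerance` multiplier and the default baseline of 1.0 for unseen reasons are all assumptions made for illustration, and the sketch assumes the visit reasons have already been reduced from free text to short labels.

```python
from collections import Counter


def find_alerts(visit_reasons, expected_counts, tolerance=2.0):
    """Return {reason: observed_count} for reasons seen unusually often.

    visit_reasons   -- iterable of short reason labels for one time window
    expected_counts -- hypothetical baseline: reason -> expected count per window
    tolerance       -- assumed multiplier; alert when observed > tolerance * expected
    """
    # Normalise labels so "Fever " and "fever" are counted together.
    observed = Counter(reason.strip().lower() for reason in visit_reasons)

    alerts = {}
    for reason, count in observed.items():
        # Assumption: a reason with no baseline is expected about once per window.
        expected = expected_counts.get(reason, 1.0)
        if count > tolerance * expected:
            alerts[reason] = count
    return alerts


window = ["fever", "fever", "Fever ", "fever", "fracture"]
baseline = {"fever": 1.5, "fracture": 1.0}
alerts = find_alerts(window, baseline)
print(alerts)
```

A real surveillance system would need a statistically grounded baseline and a text-processing step to derive the reason labels; the fixed multiplier here only shows where the comparison against an expected level takes place.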
The second scenario, extracting medical facts, involves scanning journal articles that describe clinical studies, particularly randomized controlled trials (RCTs). Reports of RCTs are very formulaic, but it is still difficult to extract facts from them. Extracting such facts is necessary both for meta-analyses and for ethical checks on those conducting the studies. An automatic information extraction technique developed in the TMMR project showed very good results and, when combined with a tightly integrated user interface, allowed an expert to quickly enter verified data into a structured knowledge base.
Techniques under this scenario have also been applied to patient record text, where an extracted medical fact (e.g., hypertension status) often has to be combined with its surrounding context (e.g., ‘suspected’, or ‘family history’).
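To illustrate what combining a fact with its context can mean in practice, the following minimal Python sketch labels each sentence that mentions a term according to simple cue phrases found in the same sentence. It is an assumption-laden illustration, not the technique used in TMMR: the cue lists in `CONTEXT_CUES`, the function name `classify_mentions`, the "asserted" default label and the naive sentence splitting are all hypothetical.

```python
import re

# Hypothetical cue phrases; a real system would use a much richer resource.
CONTEXT_CUES = {
    "family history": ["family history"],
    "suspected": ["suspected", "possible"],
}


def classify_mentions(text, term):
    """Return (sentence, label) pairs for each sentence that mentions term."""
    results = []
    # Naive sentence split on whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if term.lower() not in lowered:
            continue
        label = "asserted"  # assumed default when no cue phrase is present
        for context, cues in CONTEXT_CUES.items():
            if any(cue in lowered for cue in cues):
                label = context
                break
        results.append((sentence.strip(), label))
    return results


note = (
    "Patient has hypertension. Family history of hypertension. "
    "Suspected hypertension, follow up needed. No chest pain."
)
for sentence, label in classify_mentions(note, "hypertension"):
    print(label, "|", sentence)
```

Matching cues anywhere in the sentence is deliberately crude; it shows only that the same extracted term can carry different meanings depending on the words around it.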
NRC is currently progressing well in building automated TMMR technologies and in transferring them to Canadian companies. This technology transfer will help Canadian companies market new products based on NRC technology. The project is expected to end in 2011.