Details view: Open Health Natural Language Processing

Citations (4)


Link[1] Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model

Citing from: Anni Coden, Guergana Savova, Igor Sominsky, Michael Tanenblatt, James Masanz, Karin Schuler - other authors: James Cooper, Wei Guan, Piet C. de Groen
Publication info: 2009 Journal of Biomedical Informatics Volume 42, Issue 5, October 2009
Cited by: Jack Park 1:55 AM 6 February 2013 GMT
URL:

http://www.sciencedirect.com/science/article/pii/S1532046408001585

Fragment-

We introduce an extensible and modifiable knowledge representation model to represent cancer disease characteristics in a comparable and consistent fashion. We describe a system, MedTAS/P, which automatically instantiates the knowledge representation model from free-text pathology reports. MedTAS/P is based on an open-source framework and its components use natural language processing principles, machine learning and rules to discover and populate elements of the model. To validate the model and measure the accuracy of MedTAS/P, we developed a gold-standard corpus of manually annotated colon cancer pathology reports. MedTAS/P achieves F1-scores of 0.97–1.0 for instantiating classes in the knowledge representation model such as histologies or anatomical sites, and F1-scores of 0.82–0.93 for primary tumors or lymph nodes, which require the extraction of relations. An F1-score of 0.65 is reported for metastatic tumors, a lower score predominantly due to a very small number of instances in the training and test sets.

Link[2] Word sense disambiguation across two domains: Biomedical literature and clinical notes

Citing from: Guergana K. Savova, Anni R. Coden, Igor L. Sominsky, Rie Johnson, Philip V. Ogren - other authors: Piet C. de Groen, Christopher G. Chute
Publication info: 2008 Journal of Biomedical Informatics Volume 41, Issue 6, December 2008
Cited by: Jack Park 2:03 AM 6 February 2013 GMT
URL:

http://www.sciencedirect.com/science/article/pii/S1532046408000245

Fragment-

The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains—biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.

Link[3] Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

Citing from: Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng - other authors: Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute
Publication info: 2010
Cited by: Jack Park 2:07 AM 6 February 2013 GMT
URL:

http://jamia.bmj.com/content/17/5/507.full.pdf+html

Fragment-

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
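
The difference between the "exact" and "overlapping" span scores quoted above can be illustrated with a minimal sketch. The spans, the helper names and the matching rule (any shared character counts as an overlap) are assumptions made for illustration; this is not the evaluation code used by the cTAKES authors.

```python
def exact(a, b):
    """Two spans match exactly when start and end offsets are identical."""
    return a == b


def overlaps(a, b):
    """Two half-open character spans match when they share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold, predicted, match):
    """F-score of predicted spans against gold spans under a given matching rule."""
    if not gold or not predicted:
        return 0.0
    # Precision: fraction of predicted spans that match some gold span.
    precision = sum(any(match(p, g) for g in gold) for p in predicted) / len(predicted)
    # Recall: fraction of gold spans that are matched by some predicted span.
    recall = sum(any(match(g, p) for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (start, end) character offsets, not data from the paper.
gold = [(0, 5), (10, 18), (25, 30)]
predicted = [(0, 5), (11, 18), (40, 44)]

exact_f = f_score(gold, predicted, exact)       # only (0, 5) counts
overlap_f = f_score(gold, predicted, overlaps)  # (11, 18) also counts
print(f"exact F-score: {exact_f:.3f}, overlapping F-score: {overlap_f:.3f}")
```

In this toy case the overlapping score is higher because a prediction with a slightly wrong boundary still counts as a match, which is the same direction of difference as the 0.715 versus 0.824 figures in the excerpt.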

Link[4] Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease

Citing from: Iftikhar J Kullo, Jin Fan, Jyotishman Pathak, Guergana K Savova - other authors: Zeenat Ali, Christopher G Chute
Publication info: 2010
Cited by: Jack Park 2:12 AM 6 February 2013 GMT
URL:

http://jamia.bmj.com/content/17/5/568.full.pdf+html

Fragment-

Background
There is significant interest in leveraging the electronic medical record (EMR) to conduct genome-wide association studies (GWAS).
Methods
A biorepository of DNA and plasma was created by recruiting patients referred for non-invasive lower extremity arterial evaluation or stress ECG. Peripheral arterial disease (PAD) was defined as a resting/post-exercise ankle-brachial index (ABI) less than or equal to 0.9, a history of lower extremity revascularization, or having poorly compressible leg arteries. Controls were patients without evidence of PAD. Demographic data and laboratory values were extracted from the EMR. Medication use and smoking status were established by natural language processing of clinical notes. Other risk factors and comorbidities were ascertained based on ICD-9-CM codes, medication use and laboratory data.
Results
Of 1802 patients with an abnormal ABI, 115 had non-atherosclerotic vascular disease such as vasculitis, Buerger’s disease, trauma and embolism (phenocopies) based on ICD-9-CM diagnosis codes and were excluded. The PAD cases (66±11 years, 64% men) were older than controls (61±8 years, 60% men) but had similar geographical distribution and ethnic composition. Among PAD cases, 1444 (85.6%) had an abnormal ABI, 233 (13.8%) had poorly compressible arteries and 10 (0.6%) had a history of lower extremity revascularization. In a random sample of 95 cases and 100 controls, risk factors and comorbidities ascertained from EMR-based algorithms had good concordance compared with manual record review; the precision ranged from 67% to 100% and recall from 84% to 100%.
Conclusion
This study demonstrates use of the EMR to ascertain phenocopies, phenotype heterogeneity and relevant covariates to enable a GWAS of PAD. Biorepositories linked to EMR may provide a relatively efficient means of conducting GWAS.
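
The PAD case definition and the case counts in this excerpt can be sketched in a few lines. The `Patient` record layout and the function name are hypothetical, and reading "resting/post-exercise" as "either measurement at or below 0.9" is an assumption; the thresholds and counts are taken from the excerpt itself.

```python
from dataclasses import dataclass
from typing import Optional

ABI_THRESHOLD = 0.9  # from the excerpt: ABI less than or equal to 0.9


@dataclass
class Patient:
    """Hypothetical record layout; field names are not from the paper."""
    resting_abi: Optional[float]
    post_exercise_abi: Optional[float]
    prior_revascularization: bool
    poorly_compressible_arteries: bool


def is_pad_case(p: Patient) -> bool:
    """Apply the three criteria from the Methods paragraph (any one suffices)."""
    abis = [v for v in (p.resting_abi, p.post_exercise_abi) if v is not None]
    # Assumption: either the resting or the post-exercise ABI may qualify.
    low_abi = any(v <= ABI_THRESHOLD for v in abis)
    return low_abi or p.prior_revascularization or p.poorly_compressible_arteries


# Counts reported in the Results paragraph.
abnormal_abi_total = 1802
phenocopies_excluded = 115
case_counts = {
    "abnormal ABI": 1444,
    "poorly compressible arteries": 233,
    "prior revascularization": 10,
}

total_cases = sum(case_counts.values())
shares = {k: round(100 * v / total_cases, 1) for k, v in case_counts.items()}

print(total_cases, abnormal_abi_total - phenocopies_excluded)
print(shares)
```

The three subgroup counts sum to the same figure as 1802 minus the 115 excluded phenocopies, and the rounded shares reproduce the percentages given in the excerpt.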
