Apache cTAKES
Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.

cTAKES processes clinical notes, identifying types of clinical named entities from various dictionaries including the Unified Medical Language System (UMLS): medications, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, subject (patient, family member, etc.) and context (negated/not negated, conditional, generic, degree of certainty). Some of the attributes are expressed as relations, for example the location of a clinical condition (locationOf relation) or the severity of a clinical condition (degreeOf relation).
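To make the attribute list above concrete, here is a minimal Python sketch of how one annotated sentence might be represented. It is an illustration only, not the cTAKES type system: the class names `Span`, `NamedEntity` and `Relation`, the field names, and the ontology code placeholder are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Character offsets into the note text (end is exclusive)."""
    begin: int
    end: int

    def covered_text(self, document: str) -> str:
        return document[self.begin:self.end]


@dataclass
class NamedEntity:
    """Hypothetical record holding the attributes described in the text."""
    span: Span
    semantic_type: str          # e.g. "sign/symptom", "anatomical site"
    ontology_code: str          # placeholder string, not a real UMLS code
    subject: str = "patient"    # or "family member", etc.
    negated: bool = False
    conditional: bool = False
    generic: bool = False
    uncertain: bool = False


@dataclass
class Relation:
    """A typed link from a named entity to another span of text."""
    kind: str                   # "locationOf" or "degreeOf"
    source: NamedEntity
    target: Span


def find_span(document: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase; raises ValueError if absent."""
    begin = document.index(phrase)
    return Span(begin, begin + len(phrase))


note = "No severe pain in the left knee."

pain = NamedEntity(
    span=find_span(note, "pain"),
    semantic_type="sign/symptom",
    ontology_code="CODE-PLACEHOLDER",  # assumption: stands in for a dictionary code
    negated=True,                      # the sentence begins with "No"
)

relations = [
    Relation("locationOf", pain, find_span(note, "left knee")),
    Relation("degreeOf", pain, find_span(note, "severe")),
]

for rel in relations:
    print(rel.kind, "->", rel.target.covered_text(note))
```

Keeping the span as offsets rather than a copied string means every annotation can be traced back to its exact position in the original note.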

Apache cTAKES was built using the Apache UIMA (Unstructured Information Management Architecture) engineering framework and the Apache OpenNLP natural language processing toolkit. Its components are specifically trained for the clinical domain on diverse manually annotated datasets, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. cTAKES has been used in a variety of use cases in the domain of biomedicine such as phenotype discovery, translational science, pharmacogenomics and pharmacogenetics.

Apache cTAKES employs a number of rule-based and machine learning methods. Apache cTAKES components include:

  1. Sentence boundary detection
  2. Tokenization (rule-based)
  3. Morphologic normalization
  4. POS tagging
  5. Shallow parsing
  6. Named Entity Recognition
    • Dictionary mapping
    • Semantic typing is based on these UMLS semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
  7. Assertion module
  8. Dependency parser
  9. Constituency parser
  10. Semantic Role Labeler
  11. Coreference resolver
  12. Relation extractor
  13. Drug Profile module
  14. Smoking status classifier
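
As a rough illustration of how a few of the components listed above chain together, the following Python sketch runs sentence boundary detection, rule-based tokenization, dictionary mapping and a very simple negation check in sequence. It is a toy under stated assumptions, not cTAKES code: cTAKES itself is built on UIMA and uses trained models, while the dictionary entries, negation trigger words, three-token look-back window and function names here are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: token sequence -> semantic type.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("chest", "pain"): "sign/symptom",
    ("aspirin",): "medication",
    ("pneumonia",): "disease/disorder",
}

# Hypothetical trigger words for a crude negation check.
NEGATION_TRIGGERS = {"no", "denies", "without"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence boundary detection: split after ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Rule-based tokenization: runs of word characters or single punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def find_entities(tokens: List[str]) -> List[dict]:
    """Longest-match dictionary lookup with a look-back negation window."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in DICTIONARY)
    found = []
    i = 0
    while i < len(lowered):
        matched = 0
        for n in range(max_len, 0, -1):      # try the longest phrase first
            key = tuple(lowered[i:i + n])
            if len(key) == n and key in DICTIONARY:
                matched = n
                break
        if matched:
            window = lowered[max(0, i - 3):i]  # up to three preceding tokens
            found.append({
                "text": " ".join(tokens[i:i + matched]),
                "type": DICTIONARY[tuple(lowered[i:i + matched])],
                "negated": any(t in NEGATION_TRIGGERS for t in window),
            })
            i += matched
        else:
            i += 1
    return found


def run_pipeline(note: str) -> List[dict]:
    """Apply the steps sentence by sentence and collect the entities."""
    entities = []
    for sentence in split_sentences(note):
        entities.extend(find_entities(tokenize(sentence)))
    return entities


note = "Patient denies chest pain. Started aspirin for pneumonia."
entities = run_pipeline(note)
for entity in entities:
    print(entity)
```

Processing each sentence separately keeps the negation window from reaching across a sentence boundary, which is one reason boundary detection comes first in the list.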

The goal of cTAKES is to be a world-class natural language processing system in the healthcare domain. cTAKES can be used in a great variety of retrievals and use cases. It is intended to be modular and expandable at the information model and method level. The cTAKES community is committed to best practices and R&D (research and development) by using cutting-edge technologies and novel research. The idea is to quickly translate the best-performing methods into cTAKES code.
