ClearTK
ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA. It is developed by the Center for Computational Language and Education Research (CLEAR) at the University of Colorado at Boulder. Please see the conceptual overview for a broad introduction to ClearTK.
Features
- A common interface and wrappers for popular machine learning libraries such as SVMlight, LIBSVM, OpenNLP MaxEnt, and Mallet.
- A rich feature extraction library that can be used with any of the machine learning classifiers. Under the hood, ClearTK translates your features into the input format that each native machine learning library expects, so the same feature extraction code works with whichever model you use.
- Infrastructure for creating NLP components for specific tasks such as part-of-speech tagging, BIO-style chunking, named entity recognition, semantic role labeling, and temporal relation tagging.
- Wrappers for common NLP tools such as the Snowball stemmer, the OpenNLP tools, the MaltParser dependency parser, and the Stanford CoreNLP tools.
- Corpus readers for collections like the Penn Treebank, ACE 2005, CoNLL 2003, Genia, TimeBank, and TempEval.
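The feature translation described in the list above can be illustrated with a minimal, self-contained sketch. It does not use ClearTK's actual classes: `Feature`, `Encoder`, `SparseIndexEncoder`, and `NameValueEncoder` are hypothetical names chosen for illustration, and the two output formats only approximate the style of SVMlight/LIBSVM input files and of text-based maximum entropy trainers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FeatureEncodingSketch {

    /** Library-neutral feature: a name plus a string or numeric value (hypothetical type). */
    public static final class Feature {
        final String name;
        final Object value;

        public Feature(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Turns one labeled instance into a line of some learner's training format. */
    public interface Encoder {
        String encode(String outcome, List<Feature> features);
    }

    /** Sparse "index:value" lines, in the style of SVMlight/LIBSVM input files. */
    public static final class SparseIndexEncoder implements Encoder {
        // Feature key -> 1-based column index, assigned the first time the key is seen.
        private final Map<String, Integer> indexOf = new LinkedHashMap<>();

        @Override
        public String encode(String outcome, List<Feature> features) {
            // TreeMap keeps the column indices in increasing order.
            Map<Integer, Double> vector = new TreeMap<>();
            for (Feature f : features) {
                boolean numeric = f.value instanceof Number;
                // String-valued features become binary indicators keyed by "name=value".
                String key = numeric ? f.name : f.name + "=" + f.value;
                double weight = numeric ? ((Number) f.value).doubleValue() : 1.0;
                Integer index = indexOf.get(key);
                if (index == null) {
                    index = indexOf.size() + 1;
                    indexOf.put(key, index);
                }
                vector.merge(index, weight, Double::sum);
            }
            StringBuilder line = new StringBuilder(outcome);
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                line.append(' ').append(e.getKey()).append(':').append(e.getValue());
            }
            return line.toString();
        }
    }

    /** Plain "name=value" tokens, in the style of text-based maximum entropy trainers. */
    public static final class NameValueEncoder implements Encoder {
        @Override
        public String encode(String outcome, List<Feature> features) {
            StringBuilder line = new StringBuilder(outcome);
            for (Feature f : features) {
                line.append(' ').append(f.name).append('=').append(f.value);
            }
            return line.toString();
        }
    }

    public static void main(String[] args) {
        // The same extracted features are handed to two different encoders.
        List<Feature> features = Arrays.asList(
            new Feature("word", "dog"),
            new Feature("length", 3));
        System.out.println(new SparseIndexEncoder().encode("+1", features)); // +1 1:1.0 2:3.0
        System.out.println(new NameValueEncoder().encode("NN", features));   // NN word=dog length=3
    }
}
```

The point of the design is that feature extraction code only ever produces `Feature` objects; choosing a different learner means swapping the encoder, not rewriting the extractors.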
User Documentation
The following resources are intended for people who want to use ClearTK in their own projects:
- User Setup - step-by-step instructions for downloading ClearTK and adding it to your classpath
- Tutorial (POS tagger) - a tutorial on building an analysis engine that uses a classifier for part-of-speech tagging
- Tutorial (BIO Chunking) - a tutorial on building an analysis engine that uses a BIO chunking classifier for named entity recognition
- Modules - a list and short descriptions of the available ClearTK modules
- User FAQ - a list of frequently asked questions for users of ClearTK
- ClearTK 1.2.0 API - API documentation (Javadoc) for the latest release
If you have more questions about using ClearTK, please post them to cleartk-users@googlegroups.com.
Developer Documentation
The following resources are intended for people, especially committers, who work with ClearTK directly from the repository:
- Developer Setup - step-by-step instructions for building and running ClearTK from the repository
- Developer FAQ - a list of frequently asked questions for working with the ClearTK repository, including how to make ClearTK releases
- Mailing Lists - a description of how each of the ClearTK developer mailing lists is used