Wikipedia Collection Reader
The JULIE Lab Wikipedia Collection Reader reads Wikipedia articles from a database, parses the raw wikitext, composes a cleansed document text, and retains the original document structure as UIMA annotations (the required annotation types are defined in our UIMA type system, version 2.6.8 or higher). To parse wikitext, the reader uses the Java Wikipedia Library (JWPL) parser developed by the Ubiquitous Knowledge Processing Lab (TU Darmstadt).
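
The idea of composing a cleansed text while keeping the document structure as offset-based annotations can be illustrated with a minimal, self-contained sketch. It is not the reader's implementation and uses neither JWPL nor UIMA: the class `WikitextSketch`, the `Span` and `Result` types, and the type names "Title" and "Paragraph" are hypothetical, and only headings and bold/italic quote markup are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikitextSketch {

    /** Hypothetical stand-in for a UIMA annotation: a typed span over the cleansed text. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Cleansed text plus the structure spans that point into it. */
    public static final class Result {
        public final String text;
        public final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    // Matches wikitext headings such as "== History ==" (levels 2 to 6).
    private static final Pattern HEADING = Pattern.compile("^(={2,6})\\s*(.+?)\\s*\\1$");

    public static Result cleanse(String wikitext) {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        for (String line : wikitext.split("\n")) {
            Matcher m = HEADING.matcher(line);
            String type;
            String content;
            if (m.matches()) {
                type = "Title";
                content = m.group(2);
            } else {
                type = "Paragraph";
                // Strip bold (''') and italic ('') quote markup only.
                content = line.replace("'''", "").replace("''", "");
            }
            if (content.isEmpty()) {
                continue; // skip blank lines
            }
            int begin = text.length();
            text.append(content);
            // Offsets refer to the cleansed text, not to the raw wikitext.
            spans.add(new Span(type, begin, text.length()));
            text.append('\n');
        }
        return new Result(text.toString(), spans);
    }
}
```

In this sketch, each span's offsets index into the cleansed text, which mirrors how annotations in a UIMA CAS refer to positions in the document text.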