The JULIE PUBMED Reader (a UIMA Collection Reader) reads PUBMED abstracts in XML format (PUBMED is the major bibliographic database for the biomedical domain). Besides the abstract text itself, these records contain meta-information such as the title, the authors, and publication details. Manually curated abstracts additionally carry a list of keywords, MeSH headings, and chemicals.
This meta-information is stored as instances of the corresponding types of our UIMA type system (see our UIMA type system), while the abstract text is set as the document text of the CAS for further NLP processing.
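The split between document text and meta-information can be illustrated with a small, stdlib-only Python sketch. It is not the reader's actual (Java/UIMA) implementation: the `PubmedDocument` container and the `read_pubmed_xml` / `read_citation` functions are hypothetical stand-ins for the CAS and the reader, the element paths (`MedlineCitation`, `ArticleTitle`, `AbstractText`, `MeshHeadingList`, `ChemicalList`, `KeywordList`, ...) are assumed to follow the public PubMed/MEDLINE XML format, and the sample record is invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


# Hypothetical container loosely mirroring what ends up in the CAS:
# the abstract becomes the document text, everything else is meta-information.
@dataclass
class PubmedDocument:
    pmid: str
    document_text: str
    title: str
    authors: list = field(default_factory=list)
    journal: str = ""
    year: str = ""
    keywords: list = field(default_factory=list)
    mesh_headings: list = field(default_factory=list)
    chemicals: list = field(default_factory=list)


def _texts(parent, path):
    """Full text (including inline markup) of every element matching path."""
    return ["".join(e.itertext()).strip() for e in parent.findall(path)]


def read_citation(citation):
    """Map one <MedlineCitation> element to a PubmedDocument (assumed paths)."""
    article = citation.find("Article")
    # Structured abstracts may consist of several <AbstractText> sections.
    abstract = " ".join(_texts(article, "Abstract/AbstractText"))
    authors = [
        f"{a.findtext('LastName', '')}, {a.findtext('ForeName', '')}".strip(", ")
        for a in article.findall("AuthorList/Author")
    ]
    return PubmedDocument(
        pmid=citation.findtext("PMID", ""),
        document_text=abstract,
        title=" ".join(_texts(article, "ArticleTitle")),
        authors=authors,
        journal=article.findtext("Journal/Title", ""),
        year=article.findtext("Journal/JournalIssue/PubDate/Year", ""),
        # These lists are typically only present in manually curated records.
        keywords=_texts(citation, "KeywordList/Keyword"),
        mesh_headings=_texts(citation, "MeshHeadingList/MeshHeading/DescriptorName"),
        chemicals=_texts(citation, "ChemicalList/Chemical/NameOfSubstance"),
    )


def read_pubmed_xml(xml_string):
    """Yield one PubmedDocument per <MedlineCitation> in the XML string."""
    root = ET.fromstring(xml_string)
    for citation in root.iter("MedlineCitation"):
        yield read_citation(citation)


# Invented sample record, for illustration only.
SAMPLE = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>12345</PMID>
<Article>
<Journal><JournalIssue><PubDate><Year>2005</Year></PubDate></JournalIssue><Title>Some Journal</Title></Journal>
<ArticleTitle>A study of p53.</ArticleTitle>
<Abstract><AbstractText>p53 is a tumor suppressor.</AbstractText></Abstract>
<AuthorList><Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author></AuthorList>
</Article>
<ChemicalList><Chemical><NameOfSubstance>Tumor Suppressor Protein p53</NameOfSubstance></Chemical></ChemicalList>
<MeshHeadingList><MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading></MeshHeadingList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""

for doc in read_pubmed_xml(SAMPLE):
    print(doc.document_text)  # p53 is a tumor suppressor.
    print(doc.title, doc.authors, doc.mesh_headings, doc.chemicals)
```

In an actual UIMA collection reader, the same values would be written into the CAS passed to `getNext(CAS)` (the document text via `setDocumentText`, the meta-information as feature structures of the type system) rather than into a Python object.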