Details view: Annotating text in HTML with UIMA and Jericho

comments

Respond
Edit
- Edit article
- Delete article
Share
View
- Graph
  - Explorer
    
    Focus
    Down
    
    Load 1 level
    Load 2 levels
    Load 3 levels
    Load 4 levels
    Load all levels
    
    All
  - Dagre
    
    Focus
    Down
    
    Load 1 level
    Load 2 levels
    Load 3 levels
    Load 4 level
    Load all levels
    
    All
- Tree
  - SpaceTree
    
    Focus
    Expanding
    
    Load 1 level
    Load 2 levels
    Load 3 levels
    
    Down
    All
    Down
  - Radial
    
    Focus
    Expanding
    
    Load 1 level
    Load 2 levels
    Load 3 levels
    
    Down
    All
    Down
  - Box
    
    Focus
    Expanding
    Down
    Up
    All
    Down
- Article ✓
- Outline
- Document
  - Down
  - All
- Page
- Canvas
- Time
  - Timeline
  - Calendar
Updates
Contact us

Annotating text in HTML with UIMA and Jericho

http://sujitpal.blogspot.com/2011/04/annotating-text-in-html-with-uima-and.html

Some time back, I wrote about an UIMA Sentence Annotator component that identified and annotated sentences in a chunk of text. This works well for plain text input, but in the application I am planning to build, I need to be able to annotate HTML and plain text.

The annotator that I ended up building is a two pass annotator. In the first pass, it iterates through the document text by node, applies the include and skip tag and attribute rules. In the second pass, it iterates through the (pre-processed) document text line by line, filtering by density as described here. The annotator annotates the text with the original character positions of the text blocks in the document.

Annotating text in HTML with UIMA and Jericho

Enter task details