Thesis 1: Couple SolrSherlock to a Topic Map
Solr can maintain a topic map; UIMA can access that map.


In my Knowledge Gardening work, a Knowledge Federation server provides a group memory for activities in the garden. Solr is the database, index, and topic map platform for that federation server.

A primary activity inside a topic map system is maintaining the appearance of one location for all that is knowable about a given topic. Just as a given city in a given county in a given state is located at exactly one set of coordinates on a map of that territory, any individual topic is represented by exactly one proxy in a topic map, regardless of how many other topics support that representation. Maintaining that feature requires a merge engine.

The work of that merge engine can be as simple as noticing that two records for a person under different names, e.g. "Joe Smith" and "JSmith", share the same email address. But it can be as complex as noticing that two people answered a question in different ways while each saying the same thing. Merge decisions that complex can easily mean that the merge engine needs the capabilities of a Watson-like platform.
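
The simple end of that range can be illustrated with a minimal sketch. It assumes each topic proxy is a plain dictionary with a list of names and one email address, and it treats a case-insensitive email match as sufficient evidence for a merge. The function name `merge_by_email` and the record layout are hypothetical, chosen only for illustration; they are not the federation server's actual data model or merge rules.

```python
def merge_by_email(proxies):
    """Collapse proxies that share an email address into one proxy each.

    Assumption: an identical email (ignoring case and surrounding
    whitespace) means the proxies describe the same person.
    """
    merged = {}  # normalized email -> merged proxy
    for proxy in proxies:
        key = proxy["email"].strip().lower()
        target = merged.setdefault(key, {"email": key, "names": []})
        for name in proxy["names"]:
            # Keep every distinct name as a label on the single proxy.
            if name not in target["names"]:
                target["names"].append(name)
    return list(merged.values())


# Hypothetical sample records, not data from the project.
proxies = [
    {"names": ["Joe Smith"], "email": "joe@example.org"},
    {"names": ["JSmith"], "email": "Joe@Example.org"},
    {"names": ["Ann Lee"], "email": "ann@example.org"},
]
topics = merge_by_email(proxies)
for topic in topics:
    print(topic["email"], topic["names"])
```

Here "Joe Smith" and "JSmith" end up as two names on one proxy, while "Ann Lee" stays separate. The complex end of the range, recognizing that two differently worded answers say the same thing, has no such simple key to match on, which is why it calls for Watson-like capabilities.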


This thesis is based on the premise that a well-groomed topic map can facilitate or augment many of the tasks known to occur during Watson's activities. One way to think about this thesis is to imagine letting the topic map read the text being harvested.

What does it mean for a topic map to read text? Consider the processes an NLP system must perform, two of which are:
  • Identify named entities, which include people, places, events, dates, and so forth
  • Identify verbs and verb phrases
It's reasonable to imagine that a good topic map will already recognize named entities and some verbs and verb phrases. What it doesn't know, it can learn. Thus, in this thesis, we couple a UIMA-based NLP platform with a topic map, and ask the two to work together to harvest text into the topic map for later question answering and other applications of the system.
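
As a rough illustration of "what it doesn't know, it can learn", the toy sketch below lets a topic map look up capitalized phrases in a text, report the ones it already recognizes, and add the rest as new topics. Everything in it is an assumption made for illustration: the `ToyTopicMap` class, the topic identifiers, and the naive capitalization pattern, which merely stands in for a real UIMA-based entity annotator. It does not use Solr or UIMA.

```python
import re

# Naive stand-in for an NLP entity annotator: runs of capitalized words.
CANDIDATE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


class ToyTopicMap:
    """Hypothetical in-memory topic map keyed by lowercase label."""

    def __init__(self):
        self.labels = {}  # lowercase label -> topic id

    def add_topic(self, topic_id, *labels):
        for label in labels:
            self.labels[label.lower()] = topic_id

    def read(self, text):
        """Split candidate phrases into known topics and unknown phrases."""
        known, unknown = [], []
        for phrase in CANDIDATE.findall(text):
            topic_id = self.labels.get(phrase.lower())
            if topic_id is not None:
                known.append((phrase, topic_id))
            else:
                unknown.append(phrase)
        return known, unknown

    def learn(self, phrases):
        """Create a new topic for each phrase the map did not recognize."""
        for phrase in phrases:
            self.add_topic("topic:" + phrase.lower().replace(" ", "-"), phrase)


topic_map = ToyTopicMap()
topic_map.add_topic("person:joe-smith", "Joe Smith", "JSmith")
topic_map.add_topic("place:palo-alto", "Palo Alto")

text = "Joe Smith visited Palo Alto with Ann Lee."
known, unknown = topic_map.read(text)
print(known)    # phrases the map already recognizes
print(unknown)  # phrases it has not seen before

topic_map.learn(unknown)
known_after, unknown_after = topic_map.read(text)
```

On the second pass the map recognizes "Ann Lee" as well. In the coupled design proposed here, the NLP platform would supply far better candidates than a capitalization pattern, and a merge engine would decide whether a newly learned phrase is really a new topic or another name for an existing one.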

Details of this approach will be developed in responses to this node.