Thesis 1: Couple SolrSherlock to a Topic Map
Solr can maintain a topic map; UIMA can access that map.
Background
In my Knowledge Gardening work, there is a Knowledge Federation server which provides a group memory for activities in the garden. Solr is the database, index, and topic map platform for that federation server.
A primary activity inside a topic map system is that of maintaining the appearance of one location for all that is knowable about a given topic. That is, in the same sense that a given city in a given county in a given state will be located at just one set of coordinates in a map of that territory, any individual topic will be represented with just one proxy in a topic map, regardless of how many other topics support that representation. To maintain that feature, a merge engine is required.
The work of that merge engine can be as simple as noticing that one person appears under two names, e.g. "Joe Smith" and "JSmith", which share the same email address. But it can be as complex as noticing that two people entered answers to a question in different ways, yet were each saying the same thing. Merge decisions that complex can easily mean that the merge engine needs the capabilities of a Watson-like platform.
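The simple end of that range can be sketched in a few lines. This is only an illustration under stated assumptions, not the federation server's merge engine: the proxy layout (a dict with "name" and "email" keys) and the function name merge_by_email are hypothetical, and a real topic map would weigh many more properties than an email address.

```python
from collections import defaultdict

def merge_by_email(proxies):
    """Merge proxies that share an email address (hypothetical merge rule)."""
    # Group proxies on a normalized email so "Joe@..." and "joe@..." match.
    by_email = defaultdict(list)
    for proxy in proxies:
        by_email[proxy["email"].strip().lower()].append(proxy)

    # Emit one merged proxy per email, keeping every name seen for it.
    merged = []
    for email, group in by_email.items():
        merged.append({
            "email": email,
            "names": sorted({p["name"] for p in group}),
        })
    return merged

# Hypothetical sample data: two names for one person, plus a second person.
proxies = [
    {"name": "Joe Smith", "email": "joe@example.org"},
    {"name": "JSmith", "email": "Joe@example.org"},
    {"name": "Ann Lee", "email": "ann@example.org"},
]
merged = merge_by_email(proxies)
```

Here "Joe Smith" and "JSmith" collapse into a single proxy carrying both names, while "Ann Lee" stays separate. The harder case, two differently worded answers that mean the same thing, cannot be reduced to a key comparison like this one.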
Approach
This thesis is based on the premise that a well-groomed topic map can facilitate or augment many of the tasks known to occur during Watson's activities. One way to think about this thesis is to imagine that we propose to let the topic map read text being harvested.
What does it mean that a topic map would read text? Consider the processes an NLP system must engage in, two of which are:
- Identify named entities, including people, places, events, dates, and so forth
- Identify verbs and verb phrases
It's reasonable to imagine that a good topic map will already recognize named entities and some verbs and verb phrases. What it doesn't know, it can learn. Thus, in this thesis, we couple a UIMA-based NLP platform with a topic map, and ask the two to work together to harvest text into the topic map for later question answering and other applications of the system.
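To make "the topic map reads text" a little more concrete, here is a minimal sketch, assuming the map can be queried as a plain set of known names. It uses neither UIMA nor Solr; the function read_text, the known_names set, and the capitalized-word pattern are all hypothetical stand-ins for a real entity annotator and a real topic map lookup.

```python
import re

def read_text(text, known_names):
    """Split mentions into those the map already knows and those it could learn."""
    # Crude stand-in for named-entity detection: runs of capitalized words.
    candidates = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

    known, unknown = [], []
    for mention in candidates:
        if mention in known_names:
            known.append(mention)    # the map already has a proxy for this
        else:
            unknown.append(mention)  # a candidate for the map to learn
    return known, unknown

# Hypothetical map contents and input sentence.
known_names = {"Joe Smith", "Boston"}
known, unknown = read_text("Joe Smith met Ann Lee in Boston.", known_names)
```

In this toy run the map recognizes "Joe Smith" and "Boston" and flags "Ann Lee" as something new. In the coupled system proposed here, the NLP platform would presumably do the detection and the topic map would supply, and later absorb, the names.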
Details of this approach will be developed in responses to this node.