We first sketch how the merge platform operates. We implement a virtual merge platform that detects the need to merge two topics under a variety of scenarios, finds an existing virtual proxy or creates one, and creates the merge associations. The full process is somewhat more involved: every reference in the topic map that points to a proxy that has just been merged must be surgically modified so that it links instead to the virtual proxy that stands for the merged proxies.
At the same time, the virtual proxy becomes the set-union of the attributes of all of the merged proxies, and so represents a full view of that topic. A set-union simply combines all of the attributes and their values, without duplication.
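To make the set-union concrete, here is a minimal Java sketch. It assumes, purely for illustration, that a proxy's attributes can be modeled as a map from property key to a set of values; the class and method names are hypothetical and are not part of the platform's API.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the property-map model is an assumption for illustration.
class PropertyUnion {

    // Copies every value of every property in 'from' into 'into', skipping
    // values that are already present. Returns true if any value was added.
    static boolean setUnion(Map<String, Set<String>> into,
                            Map<String, Set<String>> from) {
        boolean changed = false;
        for (Map.Entry<String, Set<String>> e : from.entrySet()) {
            Set<String> values =
                into.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>());
            changed |= values.addAll(e.getValue());
        }
        return changed;
    }
}
```

Because each property holds a set, applying the union a second time with the same proxy adds nothing, which is the "without duplication" property described above.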
MergeBean Algorithm
The primary method signature is:
IResult assertMerge(INode sourceNode,
                    String targetNodeLocator,
                    Map<String, Double> mergeData,
                    double mergeConfidence,
                    String userLocator)
The inputs are:
· A sourceNode, which is a new node just sent to Solr and indexed
· A targetNodeLocator, which is the identity of a candidate node to which the sourceNode should be merged; the decision to make that merge has already been made by this time
· A map of mergeData, which represents the reasons for making the merge
· A mergeConfidence value, which is not presently used
· A userLocator, which is the identity of the agent suggesting this merge
The process, as suggested above, is to bring into existence a virtual node that represents the same topic as both of the nodes in this merge. That virtual node might already exist, so part of the process is to check whether that is the case. A pathological case is one in which targetNodeLocator itself identifies the virtual node for this merge. Once a virtual node is established, we create the necessary MergeProxies and wire them into the graph. When a virtual node is newly created, we then explore the entire topic map, rewiring every reference to a merged node so that it links to the new virtual node instead.
Along the way, surgical changes are performed on the topic map. The coding conventions for what Solr calls Atomic Updates[2] are well documented, and they are crucial to the behavior and performance of the merge operation.
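As an illustration of the kind of surgical change involved, the sketch below builds the body of a Solr atomic update that replaces a single field of one document and leaves the rest untouched. The "set" modifier is Solr's own; the helper class and the field names used in the usage note (locator, subject) are hypothetical, since the platform's schema is not shown here. Plain Java maps are used instead of SolrJ so that the example stays self-contained.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the helper and any field names passed to it are hypothetical.
class AtomicUpdateSketch {

    // Builds the body of an atomic update that replaces one field of the
    // document identified by idValue; fields not mentioned are left as they are.
    static Map<String, Object> setField(String idField, String idValue,
                                        String field, Object newValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(idField, idValue);  // identifies the document to modify
        // A single-entry map is the atomic-update modifier: "set" replaces the value.
        doc.put(field, Collections.singletonMap("set", newValue));
        return doc;
    }
}
```

Calling setField("locator", "tuple-1", "subject", "virtual-9") yields a structure that, serialized as JSON, reads {"locator":"tuple-1","subject":{"set":"virtual-9"}}. With SolrJ, the same single-entry modifier map would be supplied as the field value of a SolrInputDocument.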
Pseudocode for the main routines follows:
assertMerge (sourceNode, targetNodeLocator, mergeData, ...)
    Set virtualNode = null
    Set targetNode = fetch the node identified by targetNodeLocator, if it exists
    If targetNode is a virtualNode
        set virtualNode = targetNode // we now have a virtual node
    If virtualNode == null // we don't have a virtual node for this merge
        virtualNode = create a new node
        set virtualNode type and superClasses to that of sourceNode
        setUnion properties from sourceNode into virtualNode
        setUnion properties from targetNode into virtualNode
        save the virtual node to the database
        // we made the virtualNode, now wire the MergeProxy objects
        wireMerge with virtualNode and sourceNode
        wireMerge with virtualNode and targetNode
        // the merge has been made on both nodes; now rewire the topic map
        reWireNodeGraph with targetNode and virtualNode
    Else // virtual node exists
        surgically setUnion targetNode into virtualNode

wireMerge (virtualNode, mergedNode, mergeData, ...)
    Set mergeNode = create a new Tuple with the MergeAssertionType
    Set mergeNode's subject = virtualNode
    Set mergeNode's object = mergedNode
    Set mergeNode's reasons = mergeData
    save mergeNode to the database

reWireNodeGraph (targetNode, virtualNode)
    for each tuple in the topic map for which targetNode is the subject
        surgically substitute reference to virtualNode for targetNode
    for each tuple in the topic map for which targetNode is the object
        surgically substitute reference to virtualNode for targetNode
Note the assumption in reWireNodeGraph: all links to merged nodes occur only in tuples (relations). This is thought to hold in the present platform, but it may not always turn out to be the case, and the design assumption may have to be revisited eventually.
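The routines above can be sketched in Java as a small in-memory model. This is an illustration under stated assumptions, not the MergeBean implementation: Node, Tuple, and MergeSketch are hypothetical stand-ins for the platform's types, the "virtual:" locator scheme is invented, and persistence, node types and superClasses, mergeData, mergeConfidence, and userLocator are omitted. Two further choices are readings of the pseudocode rather than confirmed behavior: when the target is already a virtual node, the sketch folds the sourceNode into it and wires it; and rewiring skips merge tuples, so that each merge assertion keeps pointing at the node it merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the merge routines; not the platform's API.
class MergeSketch {
    static final String MERGE_TYPE = "MergeAssertionType";

    static class Node {
        final String locator;
        final boolean isVirtual;
        final Map<String, Set<String>> properties = new LinkedHashMap<>();
        Node(String locator, boolean isVirtual) {
            this.locator = locator;
            this.isVirtual = isVirtual;
        }
    }

    static class Tuple {
        final String type;
        String subject;
        String object;
        Tuple(String type, String subject, String object) {
            this.type = type;
            this.subject = subject;
            this.object = object;
        }
    }

    final Map<String, Node> nodes = new LinkedHashMap<>();
    final List<Tuple> tuples = new ArrayList<>();

    Node addNode(String locator, boolean isVirtual) {
        Node n = new Node(locator, isVirtual);
        nodes.put(locator, n);
        return n;
    }

    Node assertMerge(Node source, String targetLocator) {
        Node target = nodes.get(targetLocator);
        if (target == null) {
            throw new IllegalArgumentException("unknown target: " + targetLocator);
        }
        if (target.isVirtual) {
            // Assumed reading of the "virtual node exists" branch: fold in the source.
            setUnion(target, source);
            wireMerge(target, source);
            return target;
        }
        // No virtual node yet: create one, union both nodes into it, wire, rewire.
        Node virtual = addNode("virtual:" + targetLocator, true);
        setUnion(virtual, source);
        setUnion(virtual, target);
        wireMerge(virtual, source);
        wireMerge(virtual, target);
        reWireNodeGraph(target, virtual);
        return virtual;
    }

    // Adds every property value of 'from' to 'into' without duplication.
    void setUnion(Node into, Node from) {
        for (Map.Entry<String, Set<String>> e : from.properties.entrySet()) {
            into.properties.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                           .addAll(e.getValue());
        }
    }

    // Records the merge as a tuple: subject = virtual node, object = merged node.
    void wireMerge(Node virtual, Node merged) {
        tuples.add(new Tuple(MERGE_TYPE, virtual.locator, merged.locator));
    }

    // Redirects tuple references from the merged target to the virtual node.
    void reWireNodeGraph(Node target, Node virtual) {
        for (Tuple t : tuples) {
            if (MERGE_TYPE.equals(t.type)) continue; // assumption: leave merge tuples intact
            if (t.subject.equals(target.locator)) t.subject = virtual.locator;
            if (t.object.equals(target.locator)) t.object = virtual.locator;
        }
    }
}
```

Without the check on the tuple type, the rewiring loop would also rewrite the merge tuple just created for the target, leaving the virtual node asserted as merged with itself. How the platform itself avoids this is not stated in the pseudocode.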
References
Bleier, Arnim, Patrick Jahnichen, Uta Schulze, Lutz Maicher (2010). "The Praxis of Social Knowledge Federation". In Dino Karabeg and Jack Park (eds). Knowledge Federation 2010 Self-Organizing Collective Mind, Second International Workshop on Knowledge Federation Dubrovnik, Croatia, October 3-6, 2010. Paper online at http://ceur-ws.org/Vol-822/AB.pdf
Garshol, Lars Marius, and Graham Moore (Editors) (2008). “Topic Maps—Data Model.” Online at http://www.isotopicmaps.org/sam/sam-model/
Kivelä, Aki (2010). “Introduction to Layered Topic Maps”. Online documentation for the open source Wandora topic map platform. Online at http://www.wandora.org/wandora/wiki/index.php?title=Introduction_to_Layered_Topic_Maps
Park, Jack (2010a). "Boundary Infrastructures for IBIS Federation: Design Rationale, Implementation, and Evaluation". Thesis Proposal. Paper online at http://kmi.open.ac.uk/publications/techreport/kmi-10-01
Park, Jack (2010b). "Topic merge scenarios for knowledge federation". In: Maicher, Lutz and Garshol, Lars Marius eds. Information Wants to be a Topic Map: Revised Selected Papers. Leipzig: Universität Leipzig, pp. 143–154. Paper online at http://oro.open.ac.uk/23944/1/Park-TMRA2010.pdf
Schulz, Uta (2010). "Hatana - Virtual Topic Map Merging". TMRA 2010. Slides online at http://www.slideshare.net/tmra/hatana-virtual-topic-map-merging