Virtual Merge Platform
Explanation of the virtual merge process

Topic Map Merge Platform

Author: Jack Park

Latest edit: 20130315

What

We describe a body of code, a platform, which performs the processes of pattern recognition and topic merge in the context of a topic map[1]. Pattern recognition techniques are used to compare the identity of a new information resource introduced into the topic map against the identities of topics which already exist in the topic map. When a new topic is determined to be about a subject already represented in the topic map, the two representations must be merged such that, to a user seeking a view of a topic, all representations appear in that view as if they were one representation.

When two topics are found to be of the same identity, that is, about the same subject, they are merged. Garshol & Moore (2008, §6) say that merging is “a process applied to a topic map in order to eliminate redundant topic map constructs in that topic map”. As with any map, the objective is to co-locate all resources about and associated with a given subject. Garshol & Moore (2008) then proceed to outline steps typically taken to perform merge operations.

In recent literature (Park, 2010a&b; Kivelä, 2010; Bleier et al., 2010; Schulze, 2010), a different form of merge operation is described. To contrast previous and emerging techniques, consider that the previous system (cf. Garshol & Moore, 2008) performed the equivalent of a set-union aggregation of two resources into one. One topic serves as a base, and anything that is different in the other topic is copied into the base; the other proxy is then discarded. Methodologies described in (Park, 2010b) and (Kivelä, 2010) posit a new proxy for the subject, called a virtual proxy (Park, 2010b), which serves as a placeholder for the subject itself, while the two proxies to be merged remain intact, linked to the virtual proxy by way of associations (themselves subjects) which describe the reasons for the merge.


The figure above, adapted from (Park, 2010b), illustrates a virtual merge of two proxies. SubjectProxy #101 is linked to VirtualProxy #436 by way of a merge association identified as OriginalProxy. The merge is then performed on SubjectProxy #435 through its MergedProxy association.

Why

A topic map, like a good road map, is a source of well-organized information resources. A central promise is this:

There will be one and only one location in the map where all that is knowable about a given subject is located.

The advantages of virtual merging follow from two key requirements: merges should be contestable, and a merge that succumbs to a successful contest must be restorable to the individual state before the merge. In the case of the framework described in the figure above, a merge association, itself a subject in the topic map, can serve as an actor in a relationship which contests its validity. Indeed, it is possible to use the same structured issue-based conversation processes described here on the topic map itself. If a merge action between an original and another proxy is contested and shown to be invalid, un-merging becomes a matter of recording the reasons for invalidating the association, then marking it invalid; the merge is then no longer treated as valid by the topic map, but all the facts surrounding the merge remain recorded in the topic map.
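The un-merge behavior described above can be sketched in a few lines. This is a minimal illustration, not the platform's code: the class, field, and method names are hypothetical, and a merge association is reduced to two identifiers, a validity flag, and a list of recorded reasons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the platform itself.
final class ContestableMergeSketch {

    // Stand-in for a merge association linking a virtual proxy to a merged proxy.
    static final class MergeAssociation {
        final String virtualProxyId;
        final String mergedProxyId;
        final List<String> invalidationReasons = new ArrayList<>();
        boolean valid = true;

        MergeAssociation(String virtualProxyId, String mergedProxyId) {
            this.virtualProxyId = virtualProxyId;
            this.mergedProxyId = mergedProxyId;
        }

        // Un-merge: record the reason, then mark the association invalid.
        // Nothing is deleted, so the history of the merge stays in the map.
        void invalidate(String reason) {
            invalidationReasons.add(reason);
            valid = false;
        }
    }

    // Returns the proxies currently treated as merged into the given virtual proxy,
    // ignoring associations that have been marked invalid.
    static List<String> mergedMembers(String virtualProxyId, List<MergeAssociation> all) {
        List<String> members = new ArrayList<>();
        for (MergeAssociation m : all) {
            if (m.valid && m.virtualProxyId.equals(virtualProxyId)) {
                members.add(m.mergedProxyId);
            }
        }
        return members;
    }
}
```

The point of the design is that invalidation is an additive change: the association and its reasons remain available for later review, while views of the virtual proxy simply stop following it.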

How

 

Let us first sketch how the merge platform operates. We implement an instance of a virtual merge platform capable of detecting the need to merge two topics under a variety of scenarios, of finding an existing virtual proxy or creating one, and of creating the merge associations. One further step complicates the process: any reference in the topic map to a proxy that has just been merged must be surgically modified to link instead to the virtual proxy which stands for the merged proxy.

At the same time, the virtual proxy becomes a set-union combination of all of the attributes of the merged proxies; it represents a full view of that topic.  A set-union combination is simply a combining of all of the attributes and their values, without duplication.
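A set-union combination can be illustrated as follows. This is a minimal sketch that assumes a proxy's attributes can be modelled as a map from property key to a set of values; the class and method names are hypothetical, and the platform's actual property model is not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only.
final class SetUnionSketch {

    // Copies every key and value of 'from' into 'into'. Values are held in
    // sets, so a value that is already present is not duplicated.
    static void setUnion(Map<String, Set<String>> from, Map<String, Set<String>> into) {
        for (Map.Entry<String, Set<String>> e : from.entrySet()) {
            into.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                .addAll(e.getValue());
        }
    }

    // Builds the attribute view of a virtual proxy from two merged proxies,
    // leaving both inputs unmodified.
    static Map<String, Set<String>> virtualView(Map<String, Set<String>> a,
                                                Map<String, Set<String>> b) {
        Map<String, Set<String>> view = new LinkedHashMap<>();
        setUnion(a, view);
        setUnion(b, view);
        return view;
    }
}
```

Because the view is built in a fresh map, the merged proxies stay intact, which is what distinguishes this from the older approach of copying one topic into the other and discarding it.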

MergeBean Algorithm

The primary method signature is:

IResult assertMerge(INode sourceNode,
            String targetNodeLocator, Map<String, Double> mergeData,
            double mergeConfidence, String userLocator)

The inputs are:

· A sourceNode, which is a new node just sent to Solr and indexed

· A targetNodeLocator, which is the identity of a candidate node to which the sourceNode should be merged; the decision to make that merge has already been made by this time

· A map of mergeData, which represents the reasons for making the merge

· A mergeConfidence value, which is not presently used

· A userLocator, which is the identity of the agent suggesting this merge

The process, as suggested above, is to bring into existence a virtual node which represents the same topic as both of the nodes in this merge. That virtual node might already exist, so part of the process is to check whether it does. A pathological case is one in which targetNodeLocator itself identifies the virtual node in this merge. Once a virtual node is established, we create the necessary MergeProxies and wire them into the graph. When a virtual node is newly created, we then explore the entire topic map, rewiring any reference to a merged node to link instead to the new virtual node.

Along the way, there will be surgical changes performed on the topic map. Coding conventions for what Solr calls Atomic Updates[2] are well documented, and crucial to the behaviors and performance of the merge operation.
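As a rough illustration of what such a surgical change looks like, the sketch below builds the shape of a Solr atomic update as a plain Java map: the unique key identifies the document, and the changed field maps a modifier such as "set" to its new value. The class name, method name, and the field names used in the usage note are hypothetical. With SolrJ, the same inner map would typically be passed as the value of a field on a SolrInputDocument.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds the payload shape only and does not talk to Solr.
final class AtomicUpdateSketch {

    // Describes an update that replaces one field of one document.
    // keyField/keyValue identify the document; 'field' is mapped to
    // {"set": newValue}, which asks Solr to replace that field's value.
    static Map<String, Object> setField(String keyField, String keyValue,
                                        String field, Object newValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(keyField, keyValue);
        doc.put(field, Collections.singletonMap("set", newValue));
        return doc;
    }
}
```

For example, setField("locator", "tuple-17", "subject", "virtual-436") corresponds to the JSON update document {"locator":"tuple-17","subject":{"set":"virtual-436"}}, which changes one reference in a tuple without resending the rest of the document.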

Pseudocode for the main routines follows:

assertMerge (sourceNode, targetNodeLocator, mergeData...)
    Set virtualNode = null
    Set targetNode = fetch a virtual node based on targetNodeLocator if it exists.
    If targetNode is a virtualNode
        set virtualNode = targetNode // we now have a virtual node
    If virtualNode == null // we don't have a virtual node for this merge
        virtualNode = create a new node
        set virtualNode type and superClasses to that of sourceNode
        setUnion properties from sourceNode into virtualNode
        setUnion properties from targetNode into virtualNode
        save the virtual node to the database
        // we made the virtualNode, now wire the MergeProxy objects
        wireMerge with virtualNode and sourceNode
        wireMerge with virtualNode and targetNode
        // the merge has been made on both nodes; now rewire the topic map
        reWireNodeGraph with targetNode and virtualNode
    Else // virtual node exists
        surgically setUnion targetNode into virtualNode

wireMerge (virtualNode, mergedNode, mergeData,...)
    Set mergeNode = create a new Tuple with the MergeAssertionType
    Set mergeNode's subject = virtualNode
    Set mergeNode's object = mergedNode
    Set mergeNode's reasons = mergeData
    save mergeNode to the database

reWireNodeGraph (targetNode, virtualNode)
    for each tuple in the topic map for which targetNode is the subject
        surgically substitute reference to virtualNode for targetNode
    for each tuple in the topic map for which targetNode is the object
        surgically substitute reference to virtualNode for targetNode

Note the assumption in reWireNodeGraph: all links to merged nodes will occur only in tuples (relations). That may not turn out to be the case, but is thought to be the case in the present platform. That design assumption may have to be revisited eventually.
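The three routines can be exercised together in a small in-memory sketch. Everything here is a hypothetical stand-in: Node and Tuple replace the platform's INode and tuple types, a HashMap and a List replace the Solr-backed store, and mergeConfidence and userLocator are omitted. Two points are assumptions about intent rather than statements from the pseudocode: when the target is already a virtual node, the sketch folds the source node into it and wires a merge tuple for it; and reWireNodeGraph skips merge tuples, since they would otherwise be rewired to point at the virtual node itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory illustration only; all types are hypothetical stand-ins.
final class MergeSketch {

    static final class Node {
        final String locator;
        boolean isVirtual;
        String type;
        final Map<String, Set<String>> properties = new LinkedHashMap<>();
        Node(String locator) { this.locator = locator; }
    }

    static final class Tuple {
        final String type;
        String subject; // locator of the subject node
        String object;  // locator of the object node
        final Map<String, Double> reasons = new HashMap<>();
        Tuple(String type, String subject, String object) {
            this.type = type;
            this.subject = subject;
            this.object = object;
        }
    }

    static final String MERGE_ASSERTION_TYPE = "MergeAssertionType";

    final Map<String, Node> nodes = new HashMap<>();
    final List<Tuple> tuples = new ArrayList<>();
    private int nextId = 1;

    Node assertMerge(Node source, String targetLocator, Map<String, Double> mergeData) {
        Node target = nodes.get(targetLocator);
        if (target == null) {
            throw new IllegalArgumentException("unknown target: " + targetLocator);
        }
        nodes.put(source.locator, source);
        if (target.isVirtual) {
            // Assumed intent: fold the new source into the existing virtual node.
            setUnion(source, target);
            wireMerge(target, source, mergeData);
            return target;
        }
        Node virtual = new Node("virtual-" + nextId++);
        virtual.isVirtual = true;
        virtual.type = source.type;
        setUnion(source, virtual);
        setUnion(target, virtual);
        nodes.put(virtual.locator, virtual);
        wireMerge(virtual, source, mergeData);
        wireMerge(virtual, target, mergeData);
        reWireNodeGraph(target, virtual);
        return virtual;
    }

    void wireMerge(Node virtual, Node merged, Map<String, Double> mergeData) {
        Tuple t = new Tuple(MERGE_ASSERTION_TYPE, virtual.locator, merged.locator);
        t.reasons.putAll(mergeData);
        tuples.add(t);
    }

    void reWireNodeGraph(Node target, Node virtual) {
        for (Tuple t : tuples) {
            // Assumption: merge tuples must keep pointing at the merged nodes.
            if (MERGE_ASSERTION_TYPE.equals(t.type)) continue;
            if (target.locator.equals(t.subject)) t.subject = virtual.locator;
            if (target.locator.equals(t.object)) t.object = virtual.locator;
        }
    }

    private static void setUnion(Node from, Node into) {
        for (Map.Entry<String, Set<String>> e : from.properties.entrySet()) {
            into.properties.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                           .addAll(e.getValue());
        }
    }
}
```

The linear scan in reWireNodeGraph mirrors the "explore the entire topic map" step; a real store would more likely query for the tuples that mention the target and update each one in place.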

References

Bleier, Arnim, Patrick Jahnichen, Uta Schulze, Lutz Maicher (2010). “The Praxis of Social Knowledge Federation”. In Dino Karabeg and Jack Park (eds). Knowledge Federation 2010: Self-Organizing Collective Mind, Second International Workshop on Knowledge Federation, Dubrovnik, Croatia, October 3-6, 2010. Paper online at http://ceur-ws.org/Vol-822/AB.pdf

Garshol, Lars Marius, and Graham Moore (Editors) (2008). “Topic Maps—Data Model.” Online at http://www.isotopicmaps.org/sam/sam-model/

Kivelä, Aki (2010). “Introduction to Layered Topic Maps”. Online documentation for the open source Wandora topic map platform. Online at http://www.wandora.org/wandora/wiki/index.php?title=Introduction_to_Layered_Topic_Maps

Park, Jack (2010a). “Boundary Infrastructures for IBIS Federation: Design Rationale, Implementation, and Evaluation”. Thesis Proposal. Paper online at http://kmi.open.ac.uk/publications/techreport/kmi-10-01

Park, Jack (2010b). “Topic merge scenarios for knowledge federation”. In: Maicher, Lutz and Garshol, Lars Marius eds. Information Wants to be a Topic Map: Revised Selected Papers. Leipzig: Universität Leipzig, pp. 143–154. Paper online at http://oro.open.ac.uk/23944/1/Park-TMRA2010.pdf

Schulze, Uta (2010). “Hatana - Virtual Topic Map Merging”. TMRA 2010. Slides online at http://www.slideshare.net/tmra/hatana-virtual-topic-map-merging



[1] Topic map: http://en.wikipedia.org/wiki/Topic_Maps
[2] Atomic Update: http://wiki.apache.org/solr/Atomic_Updates
