Configuring Solr
Dealing with Solr's solrconfig.xml file to make it compatible with SolrSherlock

What

Solr has at least two configuration files with which we must deal in order for the entire SolrSherlock ecosystem to function properly. They are:

  • schema.xml, which defines the index fields (subject of another node here)
  • solrconfig.xml, which defines how Solr behaves and, more relevant to SolrSherlock, how SolrSherlock can plug into Solr.
This task is to make adjustments to solrconfig.xml in order to satisfy the needs of SolrSherlock and the topic map itself.

Specifically, we add the following elements to solrconfig.xml:
  • A modified update chain which injects our SolrInterceptor into Solr's update processing
  • A different modified update chain which does not inject our interceptor.
We will describe a specific way to design a Solr Client to work with these changes here.

Note: we will be updating this document as we add new features to the SolrSherlock ecosystem. For instance, we expect to create custom queries for Solr. These will include:
  • An isA query, one which asks if object A is a subclass or instance of B; this is a complex, recursive query which is better handled locally rather than remotely, much like a stored procedure.
  • A joined view query, one which assembles a more complete topic-centric view of a given topic; this is an iterative query, better handled locally.

 

Why

Solr is a configurable platform, one capable of being modified or extended. SolrSherlock extends Solr's capabilities. In the present context, two specific classes of configuration change are necessary:

  • Additional update handlers, one to inject our SolrInterceptor, and one which does not inject SolrInterceptor
  • Addition of an update log to the basic update handler.

Our Update Handlers

All new documents sent into Solr need to be indexed -- the usual Solr behavior -- but they also need to be examined by our agent platform. For that reason, we need an update handler which adds our agent platform's listener to the list of things Solr does with a new document.

Our agent platform might make changes to a new document according to the needs of the topic map. This means that the revised document will be sent back to Solr for re-indexing, but must not be handled by the agent platform again. Thus, we need a second class of update handler which does not add the agent platform to the list.

An Update Log

In the Solr ecosystem, there are two ways to change a document -- that is, a collection of information resources Solr calls fields and their values:

  • An overall re-write of the document, which is subject to version numbers. 
  • An atomic update where individual fields within a Solr document can be surgically changed. Another term for atomic update is partial update.
When an overall re-write is used, one sends the entire document back to Solr with the changes made. The document now includes a version number (Solr's _version_ field), assigned by Solr when the document was saved and indexed. If the version number sent matches the one Solr holds, the update succeeds. If not, an optimistic locking failure occurs: the copy just edited is stale, because someone else edited the document after you fetched your copy.
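
A full re-write with optimistic locking can be sketched as a Solr XML update message. This is an illustration only: it assumes the unique key field is named id, and the other field names and all values are hypothetical, since the real fields are defined in schema.xml.

```xml
<!-- Sketch only: field names other than _version_ are assumptions. -->
<add>
  <doc>
    <field name="id">topic-123</field>
    <!-- The version Solr reported when this copy was fetched -->
    <field name="_version_">1420000000000000000</field>
    <!-- Every field is resent, changed or not -->
    <field name="label">Revised label</field>
    <field name="details">Unchanged details text</field>
  </doc>
</add>
```

If the version held by Solr differs from the one sent, Solr rejects the request with an HTTP 409 Conflict response.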

Our task is to perform surgical changes, mostly of a very minor kind, to documents sent into Solr. A typical scenario is merging two documents that are about the same topic, for example when more than one document is about the same URL (web page). They belong together. Our merge strategy is to create a new document which represents the union of the two documents, then link both originals to that new document. The surgical change to each of the two merged documents is simply the addition of that connection. Solr's atomic updates depend on the update log, which is why we enable one.
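
The surgical change can be sketched as an atomic update in Solr's XML update format. The update attribute on a field is what makes the request partial. The field name mergedInto and the id values are hypothetical, and the sketch assumes the unique key field is named id.

```xml
<!-- Sketch only: mergedInto and the id values are assumptions. -->
<add>
  <doc>
    <!-- The unique key identifies which document to change -->
    <field name="id">topic-123</field>
    <!-- update="add" appends a value to a multi-valued field;
         update="set" would replace the field's value instead -->
    <field name="mergedInto" update="add">topic-999</field>
  </doc>
</add>
```

Only the named field changes. The same message, with the other document's id, would link the second merged document.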

How

Here is the code we added to solrconfig.xml.

The name values are important. We use those names when we configure our Solr Client, so that it knows which chain to use.

Our two update chains

<updateRequestProcessorChain name="harvest" default="true">
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
        <str name="inputField">hello</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>


<updateRequestProcessorChain name="partial">
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>
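
Solr selects a named chain through the update.chain request parameter. A client can pass it on each request (for example update.chain=partial), or a request handler can fix it in its defaults. A minimal sketch of the second option follows. The handler name /update/partial is hypothetical and is not part of the configuration described here.

```xml
<!-- Sketch only: the handler name is an assumption. -->
<requestHandler name="/update/partial" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <!-- Route every request on this handler through the "partial" chain -->
    <str name="update.chain">partial</str>
  </lst>
</requestHandler>
```

With a handler like this, revised documents sent back by the agent platform would bypass the interceptor without the client having to add the parameter itself.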

The Update Log

We modify the standard "/update" handler by adding an updateLog declaration.

  <requestHandler name="/update"
                  class="solr.XmlUpdateRequestHandler">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  </requestHandler>
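
For comparison, the example solrconfig.xml distributed with Solr 4 places the updateLog declaration inside the updateHandler element rather than inside a request handler. A sketch of that placement, reusing the dir value shown above. Whether it suits a given installation depends on the Solr version in use.

```xml
<!-- Sketch only: placement as in the stock Solr 4 example configuration. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>
```

The update log also relies on a _version_ field being declared in schema.xml.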

