Using an Adjacency Map to match Multi-word Phrases
http://sujitpal.blogspot.com/2011/09/using-adjacency-map-to-match-multi-word.html I recently run our entire taxonomy of approximately 1 million medical concepts through my UIMA Aggregate AE for taxonomy mapping described here, and it took 3 weeks. That's right, 3 weeks.
After I was done questioning my programming skills (or lack of it), I began wondering where all the time was being spent. Almost off the bat, I discovered that I had made the newbie mistake of not reusing cursors when reading from the database (its been a while since I've written straight JDBC code), resulting in the code opening and closing each cursor (some up to 20 times) for each of the 1M concepts. Still, that alone could not explain the long run time, so the next candidate was the UIMA AE itself.
Back in my CNET days, over one very late night, I learned to profile applications by inserting stopwatch calls into (my slow) code, and the lesson has stuck (thanks Adam :-)). I wanted to do the same thing here, ie, for a aggregate AE (consisting of a fixed flow of primitive AEs), I wanted to find the time taken within each primitive AE - then I could identify the AEs that needed improvement.