Last night I released an update to Seeker, my ColdFusion Lucene wrapper. Casey of Dealtree.com, a Seeker user, contacted me about a possible speed improvement to help with larger indexes. I tried his suggestion, and the search API should now be a bit quicker. I also added support for pagination (you can request results N through M) and result metadata (the total number of matches). Lastly, there is now an optimize custom tag as well.
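To make the pagination and match-count idea concrete, here is a rough sketch in plain Java. It is not Seeker's actual code or API: the `PageSketch` class, the `Page` holder, and the `slice` method are hypothetical names invented for illustration, and the 1-based, inclusive N-to-M range is an assumption about how such a request might be interpreted.

```java
import java.util.ArrayList;
import java.util.List;

public class PageSketch {

    /** Hypothetical holder: one page of hits plus the total number of matches. */
    public static final class Page<T> {
        public final List<T> items;
        public final int totalMatches;

        public Page(List<T> items, int totalMatches) {
            this.items = items;
            this.totalMatches = totalMatches;
        }
    }

    /**
     * Returns hits start..end (assumed 1-based and inclusive), clamped to the
     * available range, along with the total match count.
     */
    public static <T> Page<T> slice(List<T> hits, int start, int end) {
        int total = hits.size();
        int from = Math.max(1, start);   // clamp the lower bound to the first hit
        int to = Math.min(total, end);   // clamp the upper bound to the last hit
        List<T> items = (from > to)
                ? new ArrayList<T>()                        // range is empty or out of bounds
                : new ArrayList<T>(hits.subList(from - 1, to));
        return new Page<T>(items, total);
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<String>();
        for (int i = 1; i <= 5; i++) {
            hits.add("doc" + i);
        }
        Page<String> page = slice(hits, 2, 4);
        System.out.println(page.items + " of " + page.totalMatches + " matches");
    }
}
```

Returning the total alongside the slice means a caller can render "results 2 to 4 of 5" from a single search instead of running a second query just to count matches.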
What may interest folks, especially those who don't like the size limitation of the Verity engine bundled in ColdFusion, is the size of Casey's collection. Would you believe he had 25 million records? Is being able to support a 25 million record index something that may be useful to folks? Oh, and the speed is pretty darn nice too. Searching those 25,000,000 records takes approximately 250ms to 1s. Not bad, I'd say. (Although to be clear, all the credit goes to the kick-butt Lucene engine.)
Archived Comments
Wow... this is turning out to be quite nice!
I also found a couple of other things...
We are creating indexReader, but not using it.
You can cache the searcher, analyzer, and parser to avoid recreating them every search, but you have to recreate the hitcollector every search.
Good call on indexReader.
As for the others, I don't want to assume app scope since a person may not be using it within an application. Very unlikely, I know, but still. In my tests the object creation was -very- quick, like 0-5ms.
Another thought on the indexquery side: any reason why it nukes the index each time?
Should there be a flag to skip reinitializing the index so that you can just add records?
Oh, and then maybe use updateDocument instead of addDocument so your primary key stays unique...
Just a thought...
The ability to add, edit, and delete a key is something I need to add. Frankly, I just forgot, but that's really the last piece of functionality that I've left out. I'll try to get to it this week.
Hi Ray, does this wrapper work in a shared hosting environment? I don't have access to the CF 8 Administrator and cannot request anything that cannot be served to all clients on the shared server.
It depends on what's locked down. The latest version uses JavaLoader to load the Java files, so nothing needs to be copied to the classpath. The CF Admin stuff obviously won't work for you, but you don't need to have it.