Posted in ColdFusion | Posted on 06-20-2008 | 3,598 views
I just pushed up an update to Seeker, my ColdFusion Lucene project. I added support for MS Word documents and MS Excel files. This was incredibly easy using JavaLoader from Mark Mandel and the POI project.
Todd Sharp gets credit for pushing both these ideas to me. He also made a good suggestion for how to use JavaLoader within Seeker.
Seeker makes use of various "reader" CFCs. Each CFC is responsible for one or more file types. A CFC 'registers' itself using metadata. So here is what plaintext.cfc looks like:
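<cfcomponent extensions="txt">
<!--- extensions: comma-separated list of file types this reader handles (txt as an example) --->

<cffunction name="read" access="public" returnType="string" output="false">
	<cfargument name="file" type="string" required="true">
	<cfset var result = "">

	<!--- read the whole file and return its contents as the text to index --->
	<cffile action="read" file="#arguments.file#" variable="result">
	<cfreturn result>

</cffunction>

</cfcomponent>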
Note the extensions attribute. It tells Seeker to use this reader for all the plain text file types. Todd suggested using a similar approach for the Java classes. I'm not terribly happy with the names, but this is what I did.
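To make the registration idea concrete, here is a minimal sketch of how a core CFC could turn each reader's extensions attribute into a lookup table. It is not Seeker's actual code: the "readers" package path and the function names buildReaderMap and getReaderFor are hypothetical. It relies only on the fact that custom cfcomponent attributes show up in the struct returned by getMetaData().

```cfml
<!--- Hypothetical sketch, not Seeker's real code: map file extensions to reader CFCs --->
<cffunction name="buildReaderMap" access="public" returnType="struct" output="false">
	<!--- e.g. "plaintext,word"; assumes the reader CFCs live in a "readers" package --->
	<cfargument name="readerNames" type="string" required="true">
	<cfset var map = structNew()>
	<cfset var name = "">
	<cfset var reader = "">
	<cfset var meta = "">
	<cfset var ext = "">

	<cfloop list="#arguments.readerNames#" index="name">
		<cfset reader = createObject("component", "readers." & trim(name))>
		<!--- custom cfcomponent attributes such as extensions appear as metadata keys --->
		<cfset meta = getMetaData(reader)>
		<cfif structKeyExists(meta, "extensions")>
			<cfloop list="#meta.extensions#" index="ext">
				<cfset map[trim(ext)] = reader>
			</cfloop>
		</cfif>
	</cfloop>
	<cfreturn map>
</cffunction>

<cffunction name="getReaderFor" access="public" returnType="any" output="false">
	<cfargument name="file" type="string" required="true">
	<cfargument name="readerMap" type="struct" required="true">
	<!--- struct keys are case-insensitive, so report.DOC still finds the "doc" reader --->
	<cfset var ext = listLast(arguments.file, ".")>

	<cfif structKeyExists(arguments.readerMap, ext)>
		<cfreturn arguments.readerMap[ext]>
	</cfif>
	<cfthrow message="No reader is registered for .#ext# files">
</cffunction>
```

Keeping the map keyed by extension means picking a reader for a file is a single struct lookup, and adding a new file type only requires dropping in a CFC that declares it.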
When you add a requires attribute to your reader CFC, you specify a list of Java classes.
When Seeker runs, it notices these requirements and uses JavaLoader to load them. There is a jars folder that is autoloaded, and if your CFC needs a jar, you are expected to put it in that folder. Since I'm using JavaLoader, all of these jars are plug and play, with no need to restart ColdFusion, and working with the loaded classes is just as simple.
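On the core side, the loading step described above could be sketched as follows. This is a hypothetical illustration, not Seeker's code: it assumes JavaLoader is installed so it can be created as javaloader.JavaLoader, and the function names createLoader and loadRequirements are made up. JavaLoader's init() takes an array of absolute jar paths, and its create() method returns a Java class that you then instantiate by calling init() on it.

```cfml
<!--- Hypothetical sketch: build one JavaLoader from every jar in a folder --->
<cffunction name="createLoader" access="public" returnType="any" output="false">
	<cfargument name="jarDir" type="string" required="true">
	<cfset var jars = "">
	<cfset var paths = arrayNew(1)>

	<cfdirectory action="list" directory="#arguments.jarDir#" filter="*.jar" name="jars">
	<cfloop query="jars">
		<cfset arrayAppend(paths, arguments.jarDir & "/" & jars.name)>
	</cfloop>

	<!--- JavaLoader expects an array of absolute paths to jar files --->
	<cfreturn createObject("component", "javaloader.JavaLoader").init(paths)>
</cffunction>

<!--- Hypothetical sketch: load each class named in a reader's requires attribute --->
<cffunction name="loadRequirements" access="public" returnType="struct" output="false">
	<cfargument name="reader" type="any" required="true">
	<cfargument name="loader" type="any" required="true">
	<cfset var meta = getMetaData(arguments.reader)>
	<cfset var classes = structNew()>
	<cfset var className = "">

	<cfif structKeyExists(meta, "requires")>
		<!--- trim() tolerates spaces after the commas in the class list --->
		<cfloop list="#meta.requires#" index="className">
			<cfset classes[trim(className)] = arguments.loader.create(trim(className))>
		</cfloop>
	</cfif>
	<cfreturn classes>
</cffunction>
```

Because a new JavaLoader is built from whatever jars are in the folder, picking up a new jar only means rebuilding the loader, not restarting the server.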
Inside a reader, you call a method in the inherited CFC that returns the class loaded by JavaLoader and injected by the core Seeker code. I'm not happy with that method's name, but it works.
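As an illustration, a Word reader built this way might look like the sketch below. It is hypothetical: the base CFC name reader, the getJavaClass() method standing in for the inherited method mentioned above, and the extension list are assumptions. The POI part uses HWPF's WordExtractor, which can be constructed from an InputStream and exposes getText(); HWPF ships in POI's scratchpad jar, so that jar would need to be in the jars folder.

```cfml
<!--- Hypothetical word.cfc: "reader" and getJavaClass() are assumed names, not Seeker's real ones --->
<cfcomponent extends="reader" extensions="doc"
	requires="org.apache.poi.hwpf.extractor.WordExtractor">

<cffunction name="read" access="public" returnType="string" output="false">
	<cfargument name="file" type="string" required="true">
	<!--- class loaded by JavaLoader and injected by the core, per the requires attribute --->
	<cfset var extractorClass = getJavaClass("org.apache.poi.hwpf.extractor.WordExtractor")>
	<cfset var fis = createObject("java", "java.io.FileInputStream").init(arguments.file)>
	<cfset var result = "">

	<cftry>
		<!--- WordExtractor(InputStream) parses the .doc; getText() returns its plain text --->
		<cfset result = extractorClass.init(fis).getText()>
		<cfcatch type="any">
			<!--- close the stream before passing the error on --->
			<cfset fis.close()>
			<cfrethrow>
		</cfcatch>
	</cftry>
	<cfset fis.close()>

	<cfreturn result>
</cffunction>

</cfcomponent>
```

The reader never touches JavaLoader directly; it only asks its parent for a class by name, so the jar-loading details stay in one place.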


@Nick - Rough thoughts:
Obviously, if you aren't on a Mac, Verity is built in, so there's no need to 'install' Seeker. I also like Verity's category and suggestions support. You could probably duplicate category support, but suggestions would be more difficult. (As far as I know; I'm still learning Lucene.)
The big plus for Lucene is index size: there are no license limits like Verity's (250k). As I blogged about earlier, I tested with an index of 25 million records.
Does Seeker work with ColdFusion 7?
<a href="seeker/index.cfm" target="content">Seeker</a><br> to it. Is the XML the new approach?
Any chance you could add snippets in the next version (similar to Verity, where it highlights text, etc.)?
It would also be cool if you could link to pages within a framework, although I'm baffled as to how one would accomplish this.
Keep up the good work! Being on a Mac, I'm finding this tool invaluable!
Also, you need to clearly differentiate between snippets and context. A snippet could be from anywhere in the document, but helps identify the document, whereas context shows you the match. So I'm sure you mean context.
I'm wondering if Seeker allows more than one index to be created.
I currently need to index lots of different tables with different columns. Instead of trying to collate them into one index, I thought it might be easier to create more than one index.
Thanks
If I have this...
<cf_indexquery directory="#index_folder#" indexdirectory="#index_folder#"
query="#arguments.index_qry#"
storecolumns="id,title,content,type,link" indexcolumns="id,title,content">
Where would the name of the index file go?
Seeker has been a lifesaver, as I'm on a Mac.
Thanks
It doesn't seem like stemming is working when I use Seeker. Is there something I need to do to get it working?
Thanks
What are you doing with MS Word math equations? Could you insert formulas into a database via a rich text box, or by reading them from .doc files?
Please help me.
regards
farshid
Maybe a stupid question, but how would I search two indexes at the same time, one a file index and the other a DB index? Or can I join the two together?
Thanks
Peter