Some criticisms of Solr in ColdFusion 9
Recently Ryan Stille (one of the new ColdFusion ACPs) posted a comment on my blog entry, Some Basic Solr/Verity Differences. In that comment he pointed out that he was seeing differences in the results returned by Verity and Solr. No big surprise there, but what was surprising was the lack of data returned by Solr. Spurred on by his comment, I did some testing of my own, and I have to say: I'm pretty disappointed. What follows are some findings from testing file based collections in Solr and Verity. I'll point out that all of this has been reported to Adobe, so I'm not just complaining but actively trying to get the problems fixed for ColdFusion 9.X.X (i.e., whatever comes next).
Before going further, please be sure you note the qualification I made above. These issues refer to file based collections of data. In other words, cases where you ask Verity/Solr to index files, like Word Docs, PDFs, and other binary formats. It does not refer to a collection that is built from your database.
For my testing I used Windows XP and a folder of 8 documents. This folder included 1 MP3, 4 PDFs, 2 Word docs, and 1 text file. I indexed the folder into both a Verity collection and a Solr collection using the ColdFusion Administrator. My tests were done using CF Admin Searcher, a ColdFusion Admin extension that lets you perform ad hoc queries against Verity and Solr collections. I basically opened the tool up in two tabs and performed the same search in both to do my comparisons. Here is what I found:
1) The Summary field in results for Solr contained binary "junk". Verity cleaned this up. Example:

That result came from an MP3, which you would expect to be 'dirty', but Verity correctly clears this from the summary. I see these characters in PDF files as well. Word docs seemed fine though.
2) Solr failed to pick up the TITLE value for any binary file. Verity got them all. I also saw this in other metadata fields. For example, Solr also missed the author field.
3) Solr failed to return any context values for results while Verity had no trouble. You should note that you have to perform a file edit to enable context with Solr (details here: http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSe9cbe5cf462523a0-5bf1c839123792503fa-8000.html) but even with this change there was no additional text in my context.
4) While I wouldn't classify this as serious testing in any way, in every test, Verity searches were about twice as fast. Now we are talking about 15 versus 30 ms, which is not something to be concerned about, but Solr was supposed to be quite a bit faster. (To be fair, my test suites are so small as to not really be relevant.)
5) The TYPE value for results is correct for Verity, but comes back as application/octet-stream for Solr. You don't need this column of course, you can sniff the extension, but still...
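As a rough sketch of the extension sniffing mentioned in item 5, something like the following could stand in for the TYPE column when Solr reports application/octet-stream. The function name guessMimeType and the mapping table are my own illustration, not anything built into ColdFusion, and the table only covers the file types from this test.

```cfml
<cfscript>
// Hypothetical helper: guess a MIME type from the extension of a result's KEY
// (the file path, for a file based collection).
// The mapping is illustrative and only covers the file types used in this test.
function guessMimeType(required string filePath) {
    var types = {
        pdf = "application/pdf",
        doc = "application/msword",
        mp3 = "audio/mpeg",
        txt = "text/plain"
    };
    var ext = lcase(listLast(arguments.filePath, "."));
    if (structKeyExists(types, ext)) {
        return types[ext];
    }
    // Unknown or missing extension: fall back to the generic type Solr already reports.
    return "application/octet-stream";
}
</cfscript>
```

Inside a loop over the search results you would call it as guessMimeType(key) instead of reading the TYPE column.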
All in all, this is disappointing. I don't think I can recommend Solr (specifically, Solr as bundled in ColdFusion 9) for production use... at least for a collection that is heavily file based. You can, of course, do post-processing of search results to get metadata. ColdFusion 9 supports getting metadata from Word docs now, but you have to convert them to PDF first. That's something you would definitely want to cache though.
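Here is a minimal sketch of what that post-processing might look like. It assumes "results" is the query returned by cfsearch against a file based collection (so KEY holds the file path), and the variable names are made up for illustration. It only handles PDFs, strips non-printable characters from the summary in a fairly crude way, and does no caching or error handling (a password protected PDF, for example, would need a try/catch).

```cfml
<!--- Sketch only: "results" is assumed to be a cfsearch result query
      from a file based collection. --->
<cfloop query="results">
    <!--- Remove anything that is not a printable character from the summary.
          This is crude and may also strip legitimate non-ASCII text. --->
    <cfset cleanSummary = reReplace(results.summary, "[^[:print:]]", "", "all")>

    <!--- Start with whatever title the search engine returned. --->
    <cfset docTitle = results.title>

    <!--- For PDFs, read the document info directly from the file. --->
    <cfif lcase(listLast(results.key, ".")) eq "pdf">
        <cfpdf action="getInfo" source="#results.key#" name="pdfInfo">
        <cfif structKeyExists(pdfInfo, "title") and len(trim(pdfInfo.title))>
            <cfset docTitle = pdfInfo.title>
        </cfif>
    </cfif>

    <cfoutput>
        <p><b>#htmlEditFormat(docTitle)#</b><br>#htmlEditFormat(cleanSummary)#</p>
    </cfoutput>
</cfloop>
```

Reading every PDF on every search would be slow, which is why caching the extracted metadata matters.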
As I said, both Ryan and I shared our findings with Adobe so I'm sure it will get corrected in the future. People making a decision about search support should consider carefully though. I don't think anyone thinks Verity will last much longer. Solr is definitely the future. But we've got a few bumps in the road to get past first.

I *really* need to clean up my cfsolr library and get that released :(
We've gone the standalone Solr route and couldn't be happier. We use Tika for binary text extraction and don't have the issues described above.
http://cfsolrlib.riaforge.org/
I'd be interested in a followup post.
CF901 blog post: http://www.coldfusionjedi.com/index.cfm/2010/7/13/...
a) Are you running 901+the latest hot fixes?
b) Did you remember to _ask_ for context with the contextPassages attribute?
c) Did you reindex your data?
1. /opt/coldfusion9/lib/updates/hf901-00002.jar
2. <cfsearch collection="#coll#" name="Getresults" startrow="#startRow#" maxrows="#maxRow#" contextpassages="1"
contextbytes="500" contexthighlightbegin="<strong>" contextHighlightEnd="</strong>"
criteria="#searchFor#" suggestions="Always" status="info" language="English" >
3. Not explicitly. Each time the (weekly) scheduled task that builds the collection runs, it purges the collection and then rebuilds it. I assumed that the rebuild indexes everything, so the collection would never need to be reindexed per se.
Our team has been in the process of moving our site search from Verity to Solr. We are running CF9. I am able to build the collections and search them but have run into a problem. Hopefully, someone can shed some light on a solution. We are using cfincludes, which can be either cfm or html files. When the search results are displayed and a user clicks a link that points to one of the included files, none of the CSS is applied, because the CSS normally comes from the file containing the cfinclude. Is there any way to have Solr display the link to the file that contains the cfinclude, with all the CSS displayed properly, as opposed to the included file?
I thought I had read a few months ago (most likely longer) that password-encrypted PDFs were an issue.
Garth
In the meantime I found some information regarding this on http://wiki.apache.org/solr/ExtractingRequestHandler :)
It appears that you cannot with the version (1.3) that comes with CF 9.
I tried to index zip files via the CF Admin interface. This errored out.
My thought was the same as above... maybe I needed to add something to the config file.
I haven't found anything yet on http://wiki.apache.org/solr, but will keep looking.
Thanks