Did you know that cfsearch allows you to search against multiple collections? As long as you aren't searching against categories, you can search against as many collections as you want. You simply add them to the COLLECTION attribute and go for it. However, a reader noticed something odd. His results were sorted by collection, not by score. So all the results for the first collection were returned first, followed by the results for the second collection.
To test this myself I created two collections. The first contained all the HTML files from the ColdFusion docs. I called this collection cfref. I then created another collection of the Word docs from CFWACK. These are my own copies and are just a small part of the book, but I thought it would give me nicely similar content to search against. I then tried the following code:
<cfsearch collection="cfref,cfwack" criteria="cflog" name="results" maxrows=20>
<cfdump var="#results#" show="score,title">
Which gave me...
As you can see, the Score resets back up for the second collection. Not good. So I suggested a simple query of query. That should work, right?
<cfquery name="newresults" dbtype="query">
select title, score
from results
order by score desc
</cfquery>
Nope. Didn't work at all. On a whim I looked at the metadata for the query and saw this:
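If you want to run the same check against your own results, a dump along these lines should list each column with its reported type. This is a minimal sketch that assumes the `results` query from the cfsearch call above.

```cfml
<!--- getMetaData() on a query returns an array of structs, one per column,
      including the column name and the type ColdFusion reports for it --->
<cfdump var="#getMetaData(results)#">
```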
See the issue? The score is being returned as varchar. Heck, all the columns are, even recordssearched and size. That's definitely a bug. (I'll file a report for that in a minute.) Luckily the fix is easy enough - just cast your result.
<cfquery name="newresults" dbtype="query">
select title, cast(score as decimal) as realscore
from results
order by realscore desc
</cfquery>
<cfdump var="#newresults#">
Which gives us...
P.S. So this is interesting. Notice my use of maxrows? What happens if I use maxrows=5 against the data I mentioned above? I get 6 rows. Apparently the max applies per collection. On one hand this is good: you are guaranteed to get the best results back from each collection. On the other hand, if you just want N rows total, you are kind of screwed. Once again, though, a query of query will help.
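Here is a rough sketch of that last idea. It reuses the cast from the earlier query of query and assumes that the maxrows attribute on cfquery is honored for dbtype="query"; the name `topresults` is just one I made up for the example.

```cfml
<!--- Sort the combined results by numeric score across all collections,
      then keep only the top 5 rows overall --->
<cfquery name="topresults" dbtype="query" maxrows="5">
select title, cast(score as decimal) as realscore
from results
order by realscore desc
</cfquery>
<cfdump var="#topresults#">
```

Keep in mind the original cfsearch still needs a maxrows large enough per collection (or none at all) so that the real top 5 are somewhere in `results` to begin with.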
Archived Comments
I reported the maxrows bug a few months ago http://cfbugs.adobe.com/cfb... but I hadn't noticed the score as varchar problem.
I also find the suggestions feature behaves oddly when querying multiple collections. It does seem as if the Adobe engineers didn't think through the <cfsearch> "api" very thoroughly when trying to replicate what was supported with Verity.
Until this is fixed I've tended to consolidate my collections so there's just one, and use filtering where possible. Fortunately Solr's superb performance makes this a lot easier than with Verity, which I could never get to handle large collections reliably (in fact it was rarely reliable full stop).
I tend to agree with Julian, though it's perhaps a little harsh to call out the engineers for not thinking it through thoroughly. cfsearch is great for straightforward scenarios, just perhaps oversimplified for anything more involved.
I recently had the fortune to play with using Solr directly with ColdFusion and having done that, the idea of querying against multiple collections doesn't make much sense, it is only necessary here because of cfsearch's simplicity. Great to see the Jedi offering solid workarounds though ;)
I've started to blog my experience of using Solr directly (it's a real eye opener) here:
http://fusion.dominicwatson...
If nothing else, I'd recommend everyone running through the Solr quick start tutorial here:
http://lucene.apache.org/so...
d
Hi Dominic
I'm not sure it is overly harsh to blame the engineers. The "maxrows" attribute isn't particularly advanced usage, and some simple tests would surely have confirmed whether it was working as per the docs.
But we've all made this kind of omission on tight deadlines, and kudos to the CF team for listening and reacting far more quickly and openly than in the past. They do give the impression of really wanting to fix issues like this, which is great.
Really looking forward to reading your blog, Dominic. I've been quite impressed with Solr and would love to go in deeper as you've obviously started doing.
Actually, if you use the Solr web service you can have multiple collections, maxrows, and sorting without the need to grab the full set of results and do a query of queries after the fact. The URL (assuming the same criteria as in the initial post) would look something like this:
http://localhost:8983/solr/cfref/select/?shards=localhost:8983/solr/cfref,localhost:8983/solr/cfwack&q=cflog&rows=20&fl=*,score
(Add a sort parameter to change the sort -- you can use score or any other field that is indexed and not multiValue.)
I heartily recommend this approach as it allows the developer to change the sort and still grab only a set # of results.
FYI - if you add a sort parameter you have to specify ascending or descending as well (no default). So you might add a sort parameter as follows: &sort=score desc .
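As a rough, untested sketch, calling that URL from ColdFusion might look something like the following. The host, port, collection names, and criteria are the ones from the URL above; the wt=json parameter and the variable names `solrResponse` and `data` are my own additions for illustration, and the sketch assumes both collections are reachable as Solr cores at those paths.

```cfml
<!--- Ask Solr for the top 20 matches across both collections, sorted by score --->
<cfhttp url="http://localhost:8983/solr/cfref/select/" method="get" result="solrResponse">
    <cfhttpparam type="url" name="shards" value="localhost:8983/solr/cfref,localhost:8983/solr/cfwack">
    <cfhttpparam type="url" name="q" value="cflog">
    <cfhttpparam type="url" name="rows" value="20">
    <cfhttpparam type="url" name="fl" value="*,score">
    <!--- The sort direction must be given explicitly --->
    <cfhttpparam type="url" name="sort" value="score desc">
    <!--- Assumption: request JSON rather than the default XML response --->
    <cfhttpparam type="url" name="wt" value="json">
</cfhttp>

<!--- With wt=json the matching documents come back under response.docs --->
<cfset data = deserializeJSON(solrResponse.fileContent)>
<cfdump var="#data.response.docs#">
```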
As just an FYI, this is still a bug in ColdFusion 10.