Simple ColdFusion 9 ORM/Solr Example

08-20-2009 21,738 views ColdFusion 86 Comments

Last night I decided to whip together a simple example of how to add Solr search indexing to an application. Luckily, for the most part, this is the exact same process we've been using for years now with Verity. I know many people avoided Verity due to the document size limits so with that in mind, I thought a simple ColdFusion 9 example would help introduce the feature. To start off with, let me show you a simple application that has no search capability at all. This will be the first draft application that I'll modify to add Solr support.

My application is a Press Release viewer. The public page consists of a list of press releases. You click on a press release to view the details. The admin folder (and for this proof of concept it won't have any security) allows for basic CRUD operations. I won't show most of the code as it's rather boring, but I'll demonstrate my Application.cfc and the model layer. First, the Application.cfc file:

    component {

        this.name = "pressreleases";
        this.ormenabled = true;
        // assumed datasource name; ORM needs a datasource to be set
        this.datasource = "pressreleases";
        this.ormsettings = {
            dialect="MySQL",
            dbcreate="update",
            eventhandling="true"
        };
        this.mappings["/model"] = getDirectoryFromPath(getCurrentTemplatePath()) & "model";

        public boolean function onApplicationStart() {
            application.prService = new model.prService();
            return true;
        }

        public boolean function onRequestStart(string page) {
            // ?init in the URL reloads ORM and restarts the application
            if(structKeyExists(url, "init")) { ormReload(); applicationStop(); location('index.cfm?reloaded=true'); }
            return true;
        }

    }

Nothing too fancy here - I've enabled ORM, allowed for easy restarts, and created a grand total of one CFC in the application scope, the prService. The prService is simply a component to abstract access to my press release model. The press release entity is just:

    component persistent="true" {

        property name="id" generator="native" sqltype="integer" fieldtype="id";
        property name="title" ormtype="string";
        property name="author" ormtype="string";
        property name="body" ormtype="text";
        property name="published" ormtype="date";

    }

And the service provides an abstraction layer to it:

    component {

        public function deletePressRelease(id) {
            entityDelete(getPressRelease(id));
            // flush so the delete is visible to a list call in the same request
            ormFlush();
        }

        public function getPressRelease(id) {
            // an empty id means "new press release"
            if(id == "") return new pressrelease();
            else return entityLoad("pressrelease", id, true);
        }

        public function getPressReleases() {
            return entityLoad("pressrelease");
        }

        public function getReleasedPressReleases() {
            return ormExecuteQuery("from pressrelease where published < ? order by published desc", [now()]);
        }

        public function savePressRelease(id,string title,string author,date published,string body) {
            var pr = getPressRelease(id);
            pr.setTitle(title);
            pr.setAuthor(author);
            pr.setPublished(published);
            pr.setBody(body);
            entitySave(pr);
        }

    }

I assume most of this makes sense. Note that I have both a getPressReleases function as well as a getReleasedPressReleases function. The latter handles the public view and only gets press releases where the published date is in the past. Notice that savePressRelease is kind of nice - it just plain works whether you have a new press release or an existing one. Also make note of delete. In order to handle calling a delete operation followed by a list, I force a flush on the ORM stuff. If I didn't, the deleted item would show in the list during the same request.

You can download all of this code at the bottom, and again, I don't want to waste too much time on basic list/edit forms. What I want to talk about instead is the process of enabling Solr searching support for this application.

When you work with Solr (and Verity as well), you work with an index of your data. This index, much like an index in a book, represents all the data that you want to be searchable. However, and this is the critical point, it is your responsibility to keep the index up to date. That means every time you add, edit, or delete content, you have to update the index. The maintenance aspect then is typically the most complex part of the process. Searching really just comes down to one tag.

I normally create a "Ground Zero" type script that handles creating my collection and index from scratch. (Think of the collection just as the folder or name of the index.) This is useful to run during testing or if you encounter a bug where your index gets out of date. I created the following script for that purpose:

    <cfcollection action="list" name="collections" engine="solr">

    <!--- collection check --->
    <cfif not listFindNoCase(valueList(collections.name), application.collection)>
        <cfoutput>
        <p>
        Need to create collection #application.collection#.
        </p>
        </cfoutput>

        <cfcollection action="create" collection="#application.collection#" engine="solr" path="#application.collection#">
    </cfif>

    <!--- nuke old data --->
    <cfindex collection="#application.collection#" action="purge">

    <!--- get data --->
    <cfset prs = application.prService.getPressReleases()>
    <cfoutput><p>Total of #arraylen(prs)# press releases.</p></cfoutput>

    <!--- convert to a query --->
    <cfset data = entityToQuery(prs)>

    <!--- add to collection --->
    <cfindex collection="#application.collection#" action="update" body="body,title" custom1="author" title="title" key="id" query="data">

I begin by getting a list of collections. The ColdFusion 9 docs say that if you leave the engine attribute off the cfcollection tag it will return everything. I did not see that, so I filed a bug on it. For now, I've just added the engine attribute. This returns a query of collections. If I don't find my collection in there (I created an application variable to store the name) then I create one. In theory, this will only happen one time.

Next I remove all data from the collection with the purge. Again, I'm thinking that this script would be useful both for a first time seeding of the index as well as a 'recovery' type action.

Once we have an empty index, I get all of my press releases and convert it to a query with the entityToQuery function.

Lastly, I simply pass that query to the cfindex tag. Now, here is an important part. When you pass data into the index, you get to decide what gets stored in the body and what, if anything, gets stored in the 4 custom fields. I decided that the body and title made sense for the searchable information. I repeated title again for the title attribute. This will let me get the title in search results. For the custom field I used the author. Again, this was totally up to me and what made sense for my application.
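To confirm what a bulk update actually did, cfindex also accepts a status attribute that returns a structure describing the operation (it comes up again in the comments below). This is a minimal sketch, assuming application.collection and application.prService are set up as in this post; indexStatus is just a name picked for illustration, and dumping the struct avoids guessing at its exact keys:

```cfml
<!--- Same bulk update as in the script above, plus a status struct to inspect --->
<cfset data = entityToQuery(application.prService.getPressReleases())>

<cfindex collection="#application.collection#" action="update"
         body="body,title" custom1="author" title="title" key="id"
         query="data" status="indexStatus">

<!--- indexStatus is a hypothetical variable name; dump it to see what was reported --->
<cfdump var="#indexStatus#" label="cfindex status">
```

Dropping this into the rebuild script is a cheap sanity check that the number of indexed records matches the number of press releases reported earlier in the same script.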

Alright, so at this point we can run the script to create our collection and populate the index. I then switched gears and worked on the front end. I created a new search template to handle that:

    <cfparam name="url.search" default="">
    <cfparam name="form.search" default="#url.search#">
    <cfset form.search = trim(form.search)>

    <form action="search.cfm" method="post">
    <cfoutput><input type="text" name="search" value="#form.search#"> <input type="submit" value="Search"></cfoutput>
    </form>

    <cfif len(form.search)>
        <cfsearch collection="#application.collection#" criteria="#form.search#" name="results" status="r" suggestions="always" contextPassages="2">

        <cfif results.recordCount>

            <cfoutput>
            <p>There were #results.recordCount# result(s).</p>
            <cfloop query="results">
            <p>
            <a href="detail.cfm?id=#key#">#title#</a><br/>
            #context#
            </p>
            </cfloop>
            </cfoutput>

        <cfelse>

            <p>
            Sorry, but there were no results.
            <!--- trim is in relation to bug 79509 --->
            <cfif len(trim(r.suggestedQuery))>
                <cfoutput>Try a search for <a href="search.cfm?search=#urlEncodedFormat(r.suggestedQuery)#">#r.suggestedQuery#</a>.</cfoutput>
            </cfif>
            </p>

        </cfif>

    </cfif>

Going line by line, we begin with some simple parameterizing of a search variable, along with a basic form. If the user actually searched for something, we use cfsearch. As you can see, it works pretty simply. Pass in a criteria and a name for the results and you are done. The status attribute is not necessary but provides some cool functionality I'll describe in a bit.

If we have any results, I simply loop over them like any other query. The context is created by Solr based on your matches. So if you searched for enlightenment (don't we all), then the context will show you where it was found in the data.

The cool part is the else block. Solr (and Verity before it) provided a nice feature for searches called suggestions. Let's say a user wanted to search for Dharma but accidentally entered Dhrma. In some cases, the Solr engine can recognize the typo and will actually return a suggested query: Dharma. Pretty cool, right? Please note that the trim in there is due to another bug I found. In cases where Solr could not find a suggestion, it returned a single space character. I'm sure this will be fixed for the final release. If we do get a suggested query then we simply provide a link to allow the user to try that instead.
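If you want to see everything the status structure carries, not just the suggested query, the simplest approach is to dump it. The following is a rough sketch, assuming the collection from this post already exists and is populated; the misspelled criteria is the "Dhrma" example from above, and the structKeyExists guard is my own defensive addition rather than something the post requires:

```cfml
<!--- Run a deliberately misspelled search and inspect the status struct --->
<cfsearch collection="#application.collection#" criteria="Dhrma"
          name="results" status="r" suggestions="always">

<cfdump var="#r#" label="cfsearch status">

<!--- Guard against a missing key, and trim because of the single-space bug noted above --->
<cfif structKeyExists(r, "suggestedQuery") and len(trim(r.suggestedQuery))>
    <cfoutput><p>Suggested query: #htmlEditFormat(r.suggestedQuery)#</p></cfoutput>
</cfif>
```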

So far so good. Now let's talk about keeping the index up to date. If you remember, I had built a simple service component, prService, to handle all CRUD operations for my data. Because I did that, it was rather simple to handle the changes necessary for my index. First, my Application.cfc onApplicationStart was modified to support passing in the collection name:

    public boolean function onApplicationStart() {
        application.collection = "pressreleases";
        application.prService = new model.prService(application.collection);
        return true;
    }

And then prService was modified to support it. Unfortunately, there are no script based alternatives for Solr/Verity support. To be honest, it would probably be trivial to create such a component. (In case you didn't know, the ColdFusion 9 script based support for mail, and other things, was done this way.) I ended up simply rewriting my component into tags:

    <cfcomponent output="false">

        <cffunction name="init" output="false">
            <cfargument name="collection">
            <cfset variables.collection = arguments.collection>
        </cffunction>

        <cffunction name="deletePressRelease" output="false">
            <cfargument name="id">

            <cfset entityDelete(getPressRelease(id))>
            <cfset ormFlush()>

            <!--- update collection --->
            <cfindex collection="#variables.collection#" action="delete" key="#id#" type="custom">

        </cffunction>

        <cffunction name="getPressRelease" output="false">
            <cfargument name="id">

            <cfif id is "">
                <cfreturn new pressrelease()>
            <cfelse>
                <cfreturn entityLoad("pressrelease", id, true)>
            </cfif>
        </cffunction>

        <cffunction name="getPressReleases" output="false">
            <cfreturn entityLoad("pressrelease")>
        </cffunction>

        <cffunction name="getReleasedPressReleases" output="false">
            <cfreturn ormExecuteQuery("from pressrelease where published < ? order by published desc", [now()])>
        </cffunction>

        <cffunction name="savePressRelease" output="false">
            <cfargument name="id">
            <cfargument name="title">
            <cfargument name="author">
            <cfargument name="published">
            <cfargument name="body">

            <cfset var pr = getPressRelease(id)>
            <cfset pr.setTitle(title)>
            <cfset pr.setAuthor(author)>
            <cfset pr.setPublished(published)>
            <cfset pr.setBody(body)>
            <cfset entitySave(pr)>

            <!--- update collection --->
            <cfindex collection="#variables.collection#" action="update" key="#pr.getId()#" body="#pr.getBody()#,#pr.getTitle()#" title="#pr.getTitle()#" custom1="#pr.getAuthor()#" type="custom">

        </cffunction>

    </cfcomponent>

If we ignore the tags, the only changes are the cfindex tags in deletePressRelease and savePressRelease. In both cases it isn't too difficult. The key attribute refers to the primary key in the index. We used the database ID record so it's what we use when updating/deleting. The update action works for both additions and updates, so that is pretty simple as well.
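As for the remark that a script-friendly wrapper would probably be trivial to write: one way it might look is a small tag-based component that does nothing but wrap cfindex, so the service itself could stay in script. This is only a sketch under my own assumptions. The file name solrIndex.cfc and the method names updateDocument and deleteDocument are hypothetical, and it covers just the two cfindex actions used in this post:

```cfml
<!--- solrIndex.cfc (hypothetical): thin wrapper so script-based code can call cfindex --->
<cfcomponent output="false">

    <cffunction name="init" output="false">
        <cfargument name="collection" type="string" required="true">
        <cfset variables.collection = arguments.collection>
        <cfreturn this>
    </cffunction>

    <!--- action="update" handles both new and existing keys --->
    <cffunction name="updateDocument" output="false" returntype="void">
        <cfargument name="key" required="true">
        <cfargument name="body" type="string" required="true">
        <cfargument name="title" type="string" default="">
        <cfargument name="custom1" type="string" default="">
        <cfindex collection="#variables.collection#" action="update" type="custom"
                 key="#arguments.key#" body="#arguments.body#"
                 title="#arguments.title#" custom1="#arguments.custom1#">
    </cffunction>

    <!--- remove a single document by its key --->
    <cffunction name="deleteDocument" output="false" returntype="void">
        <cfargument name="key" required="true">
        <cfindex collection="#variables.collection#" action="delete" type="custom"
                 key="#arguments.key#">
    </cffunction>

</cfcomponent>
```

If this lived in the model folder, the script version of savePressRelease could create it once with new model.solrIndex(application.collection) and call updateDocument(pr.getId(), pr.getBody() & " " & pr.getTitle(), pr.getTitle(), pr.getAuthor()) right after entitySave(pr), with a matching deleteDocument(id) call in deletePressRelease.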

Unfortunately, I ran into an issue with deletes. Delete operations are 100% broken in the current release of ColdFusion 9, at least on the Mac (and I bet it works ok in Verity). Keep this in mind as you play with the demo code. I've been told this is fixed already.

So what do folks think? Will you use this when you upgrade to ColdFusion 9? Also, have you noticed the slight logic bug with search? I won't say what it is - but I'll tackle it in the next post.

Download attached file

  • Commented on 08-20-2009 at 1:27 PM
    Great example, thanks. How would you modify CF8WACK Vol-2 Pg-463 Listing 39.13 - "Combining Verity Searches with SQL Queries on the Fly", if you still need to simultaneously get additional data out of the model when you run your cfsearch?

    Isn't a "slight logic bug" illogical?
  • Commented on 08-20-2009 at 1:59 PM
    Now show an example of how to index more complex content. Say your pressreleases object did a many-to-many join to a contacts table, and a one-to-one join on author. How can you index that?
  • Commented on 08-20-2009 at 2:05 PM
    @Art: The key result from the search is the same as the PK in the database, so I could use the CF9 entity functions if I needed "more" than I could index. Don't forget we have both 4 custom fields as well as categorization we can use as well (which could be - kinda - two more fields).

    @Shannon: To be honest, that isn't too exciting. I was able to skip quite a bit with one call: entityToQuery, but if I couldn't do that, then I'd simply loop over and make a query by hand. I could do N cfindex calls as well, but that tends to be slow. If folks do feel a more complex example would be warranted, then I can definitely consider it for the next post.

    So does anyone see the security error with the search?
  • Daniel Budde #
    Commented on 08-20-2009 at 2:26 PM
    Personally I started avoiding using Verity because of the resource hog it became in CF7+. I tried separating out the Verity server from the CF server, but I was never able to get it to function well. Do you know if Solr performs any better than Verity? I am currently looking for performance information, but if anyone knows anything, I would be glad to hear it.
  • Commented on 08-20-2009 at 2:32 PM
    I've not done any testing yet, sorry. I can say I did a "large index" test with Seeker, my Lucene wrapper for CF8. It was able to search a multimillion index pretty darn fast (I think it may have been 20 million even - but not sure).
  • Commented on 08-20-2009 at 4:07 PM
    Thanks for this post. I think SOLR and ORM are the two things I'm most looking forward to in CF9. We already have plans to upgrade our servers.
  • david buhler #
    Commented on 08-21-2009 at 10:25 AM
    "Unfortunately, there are no script based alternatives for Solr/Verity support."

    When I saw Ben Forta speak with Adam Lehman at NYCFUG, it was my understanding that every tag but 1 (I forget which one) would be available in CFScript syntax.
  • Commented on 08-21-2009 at 10:29 AM
    Unfortunately this is not true. Some new things are being added - for example, cfdirectory support was fixed post public beta - but as far as I know, it will NOT be 100%. I've asked Adobe to ensure they carefully document what can and cannot be done in CFS.
  • david buhler #
    Commented on 08-21-2009 at 10:45 AM
    Boo. Then again, I applaud their willingness to keep pushing in new features in smaller releases, even if it's not 100%.
  • pat branely #
    Commented on 08-21-2009 at 10:09 PM
    Hi Ray

    how well does this work with verity/k2 ?

    from my experience there is a massive performance hit when calling CFindex using verity and updating only 1 record. since moving from CF6 to cf7/8 we have had to re-structure our apps to CFINDEX via a schedule and pass a query with a large number of records to cfindex. ie. cfindex with 1 record = 10 seconds. cfindex passing 100 records = 10 seconds.

    Does CF9 bring back the vdk style of updating your index on save ? from your example - It looks like it.
  • Commented on 08-22-2009 at 9:07 AM
    Wow, I've never seen Verity that slow. I've certainly seen it go slow if you did a lot of singular updates instead of a large query at once, but for atomic operations, it always went fast for me. To be fair, it has been a while since I used Verity. I did it last for CFCookbook, but I never saw slowness when editing content.

    To your last question, I'm not sure what you mean by 'vdk style of updating' - but - as far as I know, this code should just plain "work" if you switched from Solr to Verity - in fact, I'm willing to bet the delete bug doesn't exist on the Verity side.
  • pat branely #
    Commented on 08-23-2009 at 3:43 AM
    VDK in 6.1 allowed atomic updates - ie save a record update the index with the changes of that record. it might take a little extra time but nothing noticeable on save

    K2 in 7/8 would take seconds to update just 1 record in the index. it was so slow for us we had to defer indexing out of the save operation on records and into a scheduled task.

    anyways im excited to see this solr example in CF9
  • Commented on 10-15-2009 at 2:23 AM
    Hi Ray,

    Have you tried to use custom1 in cfsearch with solr?
    <cfsearch collection="#arguments.collection#" type="simple" criteria="CF_CUSTOM2 <matches> #newResId#" name="qTemp" />
    Throws an error for me.

    CF9 doc say custom1 .. 4 are for verity only
  • Commented on 10-15-2009 at 6:47 AM
    I thought that I had and that it worked. What error do you get?
  • Commented on 10-15-2009 at 7:36 AM
    Looks like the problem is actually a corrupt PDF.

    However, one corrupt PDF threw an error,
    but another one just hangs the system???
  • Commented on 10-15-2009 at 7:42 AM
    So it sounds like you are saying 2 things here.

    1) A bad PDF that gets indexed causes the entire collection to stop working. Right?

    2) It also sounds like another bad PDF caused the server to hang. Right?

    Please confirm as to me it sounds like 2 separate issues.
  • Commented on 10-15-2009 at 7:56 AM
    The process does a cfindex update and then a cfsearch. The collection is not getting corrupted.

    There are (at least) two corrupt PDF files to be added to the collection. One throws an error (which can be handled with try/catch). The other is hanging CF - no error thrown, not even a timeout.

    I took out cfsearch as I thought that was causing the error - but still hangs.

    I added cfpdf getinfo (removing cfindex) and that hangs the system on the same file.

    I will do more testing tomorrow when I am back in the office and let you know what I discover.
  • Commented on 10-15-2009 at 8:49 AM
    Where is the error - is it in the cfindex or the cfsearch? You said you took out the cfsearch but it still hanged - but is that for the "Hang" PDF or the "Error" PDF? (It seems like one pdf causes an error, one a hang.)
  • Commented on 10-15-2009 at 10:56 PM
    added a bug

    but not sure how to supply my test code and files
  • Commented on 11-09-2009 at 7:55 PM
    I'm starting to learn the new syntax.
    So, instead of:

    <cfset Application.prService = CreateObject("component","Model.prService")>
    we now say:
    this.mappings["/model"] = getDirectoryFromPath(getCurrentTemplatePath()) & "model";
    application.prService = new model.prService();
  • Commented on 11-09-2009 at 8:27 PM
    Not exactly. Typically a "service" component isn't an entity. An entity normally represents one row of data, or one "instance", so one person. A service component typically works with data as a whole. So my userService, for example, is my main API to get users. The userService may return user entities.

    Make sense?

    To be clear, CF ORM doesn't really DEMAND you follow any particular type of way of coding. So don't take what I say as the One True Way.
  • Commented on 06-02-2010 at 9:13 PM
    custom1, custom2, custom3, custom4 work with Solr?? The documentation said they're only for Verity <MATCHES> operator. How to use with customX with Solr?

    Thank you!
  • Commented on 06-03-2010 at 6:17 AM
    You can use them to store additional data. This can be used when displaying the results.
  • Commented on 06-03-2010 at 6:24 AM
    You can search against custom fields using a colon operator:

  • Commented on 06-03-2010 at 11:58 AM
    awesome, thanks!
  • Fabio #
    Commented on 08-04-2010 at 4:04 AM
    A little dummy question about coldfusion + solr search engine.
    I am indexing hundreds of .htm docs with a 'question' in the <title> and the 'answer' in the <body>.
    1) Will Solr prioritize the title in my cfsearch? I mean, is the title more important for the engine, isn't it?
    2) And how can I add one or more categories/tags to my doc in the index, e.g. based on the argument? I mean, how can I read a html tag (ex. <h1>) and put its content into an index field? is it possible?
  • Commented on 08-04-2010 at 7:11 AM
    1) As far as I know, Solr does get some context into what it indexes, so it should treat TITLE as more important. To be honest, I don't have proof of this.

    2) You can use categorization when you index data. If you are indexing files, it means you have to switch to a more manual process, but you can do it. The cfindex tag supports the category and categorytree arguments. You also have 4 custom fields.
  • Fabio #
    Commented on 08-04-2010 at 7:39 AM
    Thanks a lot for your ultra-fast answer Raymond.
    I knew that categorization would let me achieve my goal but... no reference on how to do it actually. I mean how can I assign a document to the 'red' category and another to the 'blue' one?
    To be honest, here's what I'm trying to realize:
    my .htm doc contains <title>What color are your eyes?</title><body>They are blue.</body> . I want to search "are your eyes blue?" or "tell me your eyes' color" and get that as a result. Solr doesn't seem to get the more relevant word 'eyes'.. It highlights 'what color' or 'me your' .. wtf?!? Also, I am working in italian language (cfcollection cfindex cfsearch using language='italian').
    Indeed, for categorizing, do I have to put some category tag (e.g. <h1>Blue</h1>) and tell solr CUSTOM1="h1" ?
    Feel free to send me an e-mail if you can help me, please.
    Thanks in advance!
  • Commented on 08-04-2010 at 8:25 AM
    The docs do discuss this:

    However, in terms of file based indexing the category/categoryTree you use is assigned to every thing you index. In order to apply a unique value to each file, you would need to a) decide on your business logic (ie, WHAT cat goes with what file) and b) index one file at a time.
  • Fernando #
    Commented on 08-25-2010 at 12:21 PM
    You mentioned in a previous post that Solr would "supposedly" treat TITLE as more important in an HTML doc collection. What about if I want to give preference to certain DB table fields? For example, I create a query to index with title, description, actors, and themes fields, but I want the title to be the most important (and then description, actors, and themes in that order of importance, if possible). Is this something that Solr does implicitly by me giving the fields the correct "mapping names" that Solr understands so as to be able to give preference? Or is there some manual way to do this via some configuration attributes or some xml file somewhere?

    Thank you.
  • Commented on 08-26-2010 at 6:45 AM
    It looks like you can "boost"
  • Fernando #
    Commented on 08-30-2010 at 11:44 AM
    That did the trick. Thank you! Next time I'll be a little more diligent and try to look for the actual product's docs as opposed to limiting my search to Adobe docs. :D
    By using the "custom" attributes for my DB fields (custom1="title" custom2="description" etc), I was able to "boost" title in the criteria as follows:
    criteria="custom1:#searchStr#^2 custom2:#searchStr#"...

    Thank you once again!
  • Commented on 08-30-2010 at 5:30 PM
    No problem. Solr is something I really need to spend more time with. It is incredibly powerful and I'm overjoyed (can I say that?) with Adobe adding it to ColdFusion.
  • geekatwork #
    Commented on 09-01-2010 at 8:18 PM
    Has any one had any luck in converting <CONTAINS> code from Verity to Solr. Whenever I search for a substring in a string it returns all matching parts of the substring. e.g. Looking for "hello world" in "always use hello world the first time" returns matches for hello and for world.
    I can get it to work by using +custom1:(+hello +world) but that seems wrong, I would have thought +custom1:"hello world" would work.
  • geekatwork #
    Commented on 09-01-2010 at 9:02 PM
    Re: converting <CONTAINS> code from Verity to Solr
    Use custom1:"hello world"~1000000 , I nearly had it but I'd left the 1000000 (slop?) off of it when testing.
  • Fernando #
    Commented on 09-15-2010 at 11:41 AM
    Has anyone had any issues when modifying schema.xml (in either the collection folder or the actual solr template folder) and then restarting the solr service? When I do it, none of the collections show up and I get an error when trying to add new ones (The logs haven't proved to be very useful so far...). I then have to manually remove the custom collections folders and their respective entries in solr.xml, and replace the schema.xml with the original. Once I do this and I restart the solr service, I can then add new collections again.
    This obviously will not work for a production environment since anytime the server restarts, searching will not work in any applications that have implemented it.
  • Fernando #
    Commented on 09-27-2010 at 4:49 PM
    Anybody have a similar issue?
  • Fernando #
    Commented on 09-28-2010 at 11:24 AM
    It seems one cannot have multiple tokenizers per fieldType in the schema.xml
  • sean hogge #
    Commented on 10-14-2010 at 4:33 PM
    I have a couple of strange things going on with Solr that I can't find a solution for anywhere. I'm convinced it's because I'm missing something obvious, so hopefully it's a simple matter to point out where I'm wrong.

    ColdFusion 9.0.1 Standard on Linux - web root is /www, but web sites are hosted in /www/

    When I index /www/, and search the index, I get hits from /www/CFIDE. I have no symlink, only the mapping in the CF administrator.

    My "common sense" tells me Solr shouldn't index anything not physically present in the recursed directories. Is there some known setting that I need to flip to prevent this behavior, or am I completely misunderstanding the issue here? I swear I'm not a complete idiot. Mostly.
  • sean hogge #
    Commented on 10-14-2010 at 4:43 PM
    Well, it looks like purging and re-indexing seems to have remedied that issue. And it appears to be non-reproducible.

    But since I've gone and revived this old thread, I'd love to hear any recommendations for a good solution to search CFML content. Is there a site spider plugin that might integrate with Solr somehow?
  • Mike #
    Commented on 10-15-2010 at 2:10 PM
    Can you please show me the whole statement for the search?
    I need to boost scoring for title and by using this snippet:
    criteria="custom1:#searchStr#^2 custom2:#searchStr#".
    (from above) the search doesn't return anything.

    This is my statement:
    <cfindex collection="myCol" action="update" body="title,description" custom1="description" title="title" key="ID" query="myQ">
    <cfsearch collection="myCol" criteria="title:#searchStr#^10 custom1:#searchStr#^5" name="searchResult" >

  • Commented on 10-18-2010 at 6:19 AM
    If you only boost title, does it work?
  • Mike #
    Commented on 10-18-2010 at 8:12 AM
    No it doesn't. Not sure what's going on but whenever I'm adding title: to the search criteria it seems that it's using the whole thing for the search and as a result it doesn't find anything.
    I searched all over the place and can't find a reason for it.

    Very strange.
    Can somebody test and see if they can use title: in the search criteria to boost the score?

  • Commented on 10-19-2010 at 8:49 AM
    Raymond Speaks!
    Have you tested with the Solr search index with more than 10,000 accessions of contents and 5000 hits a day in it?
  • Commented on 10-19-2010 at 8:58 AM
    Nope, I do not have anything in production yet with it. I'm considering perhaps using it at CFLib as a good test.

    I'm presenting on SOLR at RIAUnleashed, and will have a small sample app then. I can try running JMeter against the site and see how it holds up. But to be honest, 10K hits in a day isn't a whole heck of a lot.
  • Commented on 10-19-2010 at 1:42 PM
    I have a CF9 site that I am trying to use SOLR on to index PDF files. I am able to do cfm and txt files, but not PDF. Any reason as to why. I also built a Verity collection, which does find the PDFs.

  • Commented on 10-20-2010 at 10:57 AM
    David, there were bugs in PDF indexing fixed in 901/CHF. So ensure you have BOTH 901 installed AND the cumulative hot fix.
  • Commented on 10-20-2010 at 11:26 AM
    I applied the HF and reindexed. Still no PDFs.
    Destroyed the collection and rebuilt. Reindexed. Still no PDFs.

    Thanks for the help.
  • Commented on 10-21-2010 at 7:19 AM
    Can you say how you did the index? Was it via the cf admin or was it via cfindex.
  • Commented on 10-21-2010 at 9:39 AM

    The collection was created from the website via the CF ADMIN pages.

    The index was created from a webpage using CFINDEX. I will post the code below, just in case...

    <CFSET IndexCollection = "psolr">
    <CFSET IndexDirectory = "d:\prelude-printed\docs">
    <CFSET IndexRecurse = "YES">
    <CFSET IndexExtensions = ".*">
    <CFSET IndexLanguage = "english">


    I have contacted Adobe support in regards to this. Waiting to hear from them. The server is a new install of W2K3, CF9 with the update and CHF applied. No other applications are on the server, other than IIS.
  • Commented on 10-21-2010 at 9:42 AM
    Ok, so to be clear, you run this and the PDFs are not added. Add a status attribute to your cfindex tag. This returns a structure that tells you how much stuff was added, updated, and deleted. I'd also check your extensions value. Maybe specifically use *.pdf just to see if it makes a difference.
  • Commented on 10-21-2010 at 11:07 AM

    Thanks to your suggestions, I thought it was very weird that it was not indexing the PDFs that Verity has previously indexed (different collection). I threw in the status attribute as you suggested. I was able to see the txt and cfm files that I had also added to that directory to be indexed. Those all indexed just fine, but none of the PDFs in same directory.

    I then decided to put in some other documents (XLS, BMP, DOC, etc...) into the directory to see if the indexing was going to work. It did!

    I then found some "other" PDFs, put them in the directory, and they indexed!

    Background: The PDFs are created by a system called DCS, which we use for invoice printing for our ERP system. It is odd to me that Verity can index these PDFs while Solr cannot. I took a look at the PDF, it is compatible with 5.x and greater. There must be something with these PDFs that Solr does not like.
  • Commented on 10-21-2010 at 11:11 AM
    Odd. Well, at least you got part of the way. :) I'd file a bug w/ Adobe on your DCS PDFs.

    Also - you may want to try using CFPDF to read the text from them. If you can, you can index them manually.
  • Commented on 10-21-2010 at 11:51 AM
    Been contacted by Adobe in regards to this and providing examples to them for further review. I will let you know how it progresses.
  • Commented on 10-28-2010 at 1:10 PM
    I have a bug logged with Adobe.

    It seems that a PDF is not a PDF. Adobe called the PDF corrupt, even though it can be indexed by Verity, opened/modified by Acrobat, and modified by CFPDF.

    We are going to create a PDF from a PDF (using the CFPDF tag) and then let Solr index that PDF. We tested the process and it works!
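
    The comment does not say which CFPDF action was used. One possible way to re-save a PDF is action="write", sketched below. The file names and the "clean" output folder are assumptions for illustration only.

    ```cfml
    <!--- Hypothetical paths: re-save a DCS-generated PDF as a new file --->
    <cfset sourcePdf = "d:\prelude-printed\docs\invoice.pdf">
    <cfset cleanPdf  = "d:\prelude-printed\clean\invoice.pdf">

    <!--- Write a fresh copy of the document with CFPDF --->
    <cfpdf action="write"
           source="#sourcePdf#"
           destination="#cleanPdf#"
           overwrite="yes">

    <!--- Index the folder holding the re-saved copies --->
    <cfindex collection="psolr"
             action="update"
             type="path"
             key="d:\prelude-printed\clean"
             extensions=".pdf"
             status="indexStatus">
    ```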

    Thanks to all that helped in this.
  • Jerry #
    Commented on 04-13-2011 at 10:18 AM
    Could you address how to set up file collections when the target files reside on a server other than my webserver? I have installed SOLR service from the CF9 disk on the target FileServer and can see it running there but I cannot get files added to a SOLR collection in the Administrator on the webserver. I think I'm missing something really obvious here. I've looked for this answer everywhere, including your collaborative book(s) on CF9. Your thoughts would be most appreciated. CF9, W2003, IIS
  • Commented on 04-13-2011 at 10:27 AM
    Jerry -

    I'm sure the problem is you have it locked down so solr is only accessible to localhost. You need to open it up to allow access from your web server. You might need to do this both at the solr level, and at the machine or network firewall level, depending on your setup.

    For solr, you basically have to do the opposite of this tech article, either allowing from any IP, or just from your CF server's IP:
  • Jerry #
    Commented on 04-13-2011 at 5:19 PM
    Shannon, thanks for the idea. I determined that those files (on both servers) are in the default configuration, which is not locked down. I tried mixing it up as best I could, as well. I think my issue is much more fundamental. I've got to start over from the beginning.
    Thanks again
  • conor #
    Commented on 05-23-2011 at 5:51 AM

    I am new to ColdFusion and I am reading up on how to use Solr. I am reading through your Solr example and I have downloaded the zip. However, when I launched it locally, I got the error: 'Datasource pressreleases could not be found.' Can you tell me what I must do about this?

    Thanks for your help!
  • Commented on 05-23-2011 at 6:52 AM
    Create a datasource called pressreleases. Point it to a MySQL db (an empty one). Any dbtype should work actually.
  • Mike #
    Commented on 08-30-2011 at 10:28 AM
    Can somebody show me how to search in the custom fields?
    <cfsearch collection="mycollection" criteria="#searchString# custom2:#searchString#" name="q" status="r" suggestions="always" contextPassages="0">

    I get a big fat error:

    There was a problem while attempting to perform a search.
    Error executing query : orgapachelucenequeryParserParseException_Cannot_parse_custom_Encountered_EOF_at_line_1_column_7__Was_expecting_one_of____________________QUOTED_______TERM_______PREFIXTERM_______WILDTERM_____________________NUMBER_______

  • Commented on 08-30-2011 at 3:25 PM
    Do you get it for all values of searchString?
  • Mike #
    Commented on 08-30-2011 at 6:01 PM
    Yes, and the value I'm searching for exists in the query.
    I thought I had the syntax wrong, but I didn't find a different way of doing it anywhere.

    BTW I'm on CF 9.0.1.

    Thanks Ray for your quick response
  • Commented on 08-31-2011 at 3:49 PM
    That's the right syntax for searching against a custom field so I'm not sure what to suggest. If you want to ping me off the blog I can maybe dig a bit deeper.
  • Mike #
    Commented on 09-01-2011 at 8:45 AM
    I kind of fixed it by adding the custom field to the body.

    Not the ideal solution but it's working. I will try on a different server and let you know.

    Eagerly waiting for CF10. I hope Solr will finally have all the features enabled and working.

    Thanks again
  • Commented on 09-01-2011 at 9:30 AM
    Solr does have all the features. CFSEARCH does not. :) Remember you can hit Solr directly via HTTP and get the results back. CFSEARCH as a wrapper can't cover every feature.

    I can say that in my CF/Solr preso next week I'm revealing one of the new things in CF10. You should attend. :)
  • Mike #
    Commented on 09-01-2011 at 12:51 PM
    You are right, I meant cfsearch, not Solr.

    Is it on the cfmeetup or something else?
    Please provide URL if it's available.

  • Commented on 09-01-2011 at 1:14 PM
    Here are the details.
  • Mike #
    Commented on 09-01-2011 at 1:32 PM
    Ray, will there be a recording of this event? Because it's midday, I don't think I can attend, but I would very much like to see it.

    Also, I really hope that some advanced features are shown including the ability to access solr directly as you mention in the previous message.

  • Commented on 09-01-2011 at 1:39 PM
    There will be a recording, but I'm not going that deep. I just mention it's possible. Remember there is a web based interface to Solr. By default you can hit it here - http://localhost:8983/solr/

    Basically you just need to make HTTP requests with the right url parameters. Check the Solr docs for information on that.
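
    A minimal sketch of such a request from ColdFusion follows. It assumes each collection is exposed under its own name below the base URL given above and that the standard Solr /select handler is available. The collection name and query are placeholders.

    ```cfml
    <!--- Hypothetical collection name; adjust host and port to your install --->
    <cfset collectionName = "mycollection">

    <!--- Query Solr directly over HTTP, bypassing cfsearch --->
    <cfhttp url="http://localhost:8983/solr/#collectionName#/select"
            method="get"
            result="solrResponse">
        <cfhttpparam type="url" name="q" value="title:cfthread">
        <cfhttpparam type="url" name="rows" value="10">
    </cfhttp>

    <!--- Solr responds with XML by default --->
    <cfset solrXml = xmlParse(solrResponse.fileContent)>
    <cfdump var="#solrXml#">
    ```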
  • sdtacoma #
    Commented on 09-01-2011 at 5:14 PM
    Is there a way to cfdump or otherwise view the contents of a collection? I am trying to update a collection and want to see if my update is in the collection or not. Searching the collection isn't returning any hits.
  • Commented on 09-01-2011 at 8:11 PM
    If you search for nothing, it will return everything.

    If you want to search for a specific item, remember you can search against the key field. I'm not having luck now using a file based key, but I'm pretty sure it does work.
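
    As a small sketch of the "search for nothing" approach, assuming a collection named mycollection (a placeholder) and the empty-criteria behavior described above:

    ```cfml
    <!--- Empty criteria: per the comment above, this should return every record --->
    <cfsearch collection="mycollection"
              criteria=""
              name="everything"
              maxrows="100">

    <!--- Inspect the key, title, and summary of whatever is in the collection --->
    <cfdump var="#everything#" label="collection contents">
    ```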
  • Mike #
    Commented on 09-08-2011 at 1:08 PM

    Did you have a chance to see if ranking works in the CF implementation?
    See my previous post (41).

  • Commented on 09-08-2011 at 2:35 PM
    It seems to work for me. I tried this search against a collection created on cfdocs:

    cffeed title:cfthread

    Which means: cffeed in the body or cfthread in the title.

    I then did

    cffeed title:cfthread^100

    And the result with cfthread in the title popped to the top rank wise.
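
    Wrapped in a cfsearch tag, that test might look like the sketch below. The collection name cfdocs is an assumption, since the actual name of the index is not given.

    ```cfml
    <!--- "cfdocs" is an assumed name for the collection built from the CF docs --->
    <cfsearch collection="cfdocs"
              criteria="cffeed title:cfthread^100"
              name="results"
              status="info">

    <!--- List results in ranked order with their scores --->
    <cfoutput query="results">
        #results.rank#. #results.title# (score: #results.score#)<br>
    </cfoutput>
    ```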
  • Mike #
    Commented on 09-08-2011 at 2:50 PM
    Can you please show the whole index and search statement?
    Maybe I'm doing something wrong, although if I had the syntax wrong I would expect an error. Instead I get no results.
    If I do a normal search without ranking, I get results.

    Thanks for the quick reply.
  • Commented on 09-08-2011 at 2:58 PM
    I did post the entire search statement above. The index is an index of the CFDocs that ship with ColdFusion. I created that in the admin.
  • Mike #
    Commented on 11-26-2011 at 10:08 AM
    Hi Ray,

    Unfortunately it's me again. For the life of me I can't figure out how to boost certain fields so they show at the top.

    CF version: 9,0,1,274733
    Update Level    chf9010002.jar

    I'm using the following code:
    <cfquery name="getBooks">
       select bookID, title, url, datePublished, author, description
       from books
    </cfquery>

    <cfindex collection="books" action="update" body="description"
        custom1="datePublished" custom2="url"
        custom3="author" title="title"
        type="custom" key="bookID" query="getBooks">

    <cfsearch collection="books" criteria="title:#searchCriteria#^10" name="results" status="r" suggestions="always"
              ContextBytes="1000" ContextPassages="4">

    When I run this I get 0 results.

    If I removed the title from the criteria:
    <cfsearch collection="books" criteria="#searchCriteria#" name="results" status="r" suggestions="always"
              ContextBytes="1000" ContextPassages="4">

    I get all the hits, but entries without the search term in the title rank higher than the ones with it.

    Is this the correct syntax? Do I have to make changes to the solrconfig.xml file?

    Based on all I have read it should work, but it doesn't (for me).

    Please help.

  • Commented on 11-26-2011 at 1:23 PM
    Odd. I tested it on an index of cfdocs, and it seems to work ok.
  • Mike #
    Commented on 11-26-2011 at 6:23 PM
    And that's what's killing me. It should work but it doesn't. I have tried on 3 different computers with a standard CF9 installation.

    Is your installation standard?
    Do I have to update the installation of Solr?
    Can you please send me your Solr config XML file?

    I have read that I can play with that.

    Thanks Ray for the quick response on a Saturday.

  • Commented on 11-26-2011 at 6:27 PM
    9.0.1 + latest CHF.
    My solr config wasn't modified. I just made an index of the cfdocs that ship with CF. If you make one it should be the same.
  • Mike #
    Commented on 11-26-2011 at 9:22 PM
    No luck, I have tried indexing documents instead of a query and same result.

    If it's not too much to ask, can you please (when you have some time) post the exact statements to create the collection, index it, and search with a boost?

    Definitely I am missing something.

    Thanks again
  • Commented on 11-26-2011 at 10:33 PM
    Given a file index of cfdocs, this is what I saw:

    I searched for rss and number one was: Adobe ColdFusion * cffeed

    I then did

    rss OR title:RSS

    and then an item with RSS in the title went to the top. When I boosted the score by 10, it stayed at number one, but oddly the score went down by 10. So... um... not sure.
  • Mike #
    Commented on 11-27-2011 at 9:17 AM
    Don't know what to say. It just doesn't work. By adding criteria="#searchCriteria# or title:#searchCriteria#" the order actually changes, but not for the better. The first 3 results have no mention of the searchCriteria in the title, while the last item does. I was hoping that by looking at your exact code I could see something that I'm missing. Maybe the boost has to be done when indexing the collection.

    Have a great weekend.
  • Commented on 11-27-2011 at 1:42 PM
    Well, it makes sense that the first 3 may not have it in the title. You said, "X or title:X". Did you try "X OR title:X^10"?
  • Mike #
    Commented on 11-27-2011 at 3:08 PM
    Yes, I have tried lower and upper case, with and without boosting the title, and out of desperation even changed the order.

    And while the order changed there were more entries at the top without the search word in the title.

  • Commented on 11-27-2011 at 3:14 PM
    Any chance you can share a zip of your data?