After my presentation last week I had a few ColdFusion/Solr questions to follow up on. Here are two of them.
1) Can you use Solr with content indexed on Amazon S3?
Yes and no. The main answer is no. The code below is what I used to test:
<cfset s3dir = "s3://myaccess:email@example.com">
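<!--- List the S3 location through ColdFusion's own S3 support --->
<cfdirectory action="list" directory="#s3dir#" name="files">
<!--- Ask Solr to index that same S3 location as a path --->
<cfindex action="update" collection="indextest1" type="path" key="#s3dir#"
recurse="true" status="result" extensions=".txt,.pdf">
<cfdump var="#result#" label="Result of update operation">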
When run, you get: The key specified is not a directory: s3://myaccess:firstname.lastname@example.org. The path in the key attribute must be a directory when type="path".
Obviously "myaccess" and "mysecret" were real values, but nonetheless, this isn't supported. I'm not terribly surprised by this. ColdFusion speaks to Solr and asks it to index a folder, but in this case the folder is only 'reachable' via ColdFusion, not by Solr itself. However, you can still combine S3 and Solr indexing. Whenever you move a file to S3, simply run the index operation first: let Solr index the file while it is still on the local file system, and then push it off to S3.
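Here is a minimal sketch of that "index first, then push" order. The local path, the bucket name ("mybucket"), and the choice to stash the S3 location in custom1 are all my assumptions for illustration, not something from the test above. I'm also assuming cffile can copy from the local file system to an s3:// destination in your version of ColdFusion.

```cfml
<!--- Hypothetical values: adjust the path, credentials, and bucket for your setup --->
<cfset localFile = expandPath("./uploads/report.pdf")>
<cfset s3Object  = "mybucket/report.pdf">
<cfset s3Target  = "s3://myaccess:mysecret@#s3Object#">

<!--- 1. Index the file while Solr can still reach it on the local file system.
      custom1 keeps the S3 location (without credentials) so search results
      can point at the file's final home. --->
<cfindex action="update" collection="indextest1" type="file"
    key="#localFile#" custom1="#s3Object#" status="result">
<cfdump var="#result#" label="Result of update operation">

<!--- 2. Only after indexing, push the file to S3 and remove the local copy --->
<cffile action="copy" source="#localFile#" destination="#s3Target#">
<cffile action="delete" file="#localFile#">
```

Keep in mind the key stored in the index is still the local path, so anything that displays search results should use the value in custom1 (or whatever field you pick) rather than the key.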
2) Can you index a file and a db record together in the same search "row"? I know Solr can handle it if you roll the code manually, but can this be done with the CF tags?
Again - yes and no. The tag that indexes file-based data and query-based data (cfindex) can only do one type at a time, so with just one tag call you couldn't do this. However, if you read and parse the file yourself (for example, using cfpdf to read in the text of a PDF), you can then merge that text with any other database data when you add it to the index. I'm not sure how useful this would be. I could see merging file data with database information being stored in the custom fields though.
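A rough sketch of that merge is below. The datasource ("mydsn"), the table and column names, and the file path are all hypothetical. I'm assuming ColdFusion 9's cfpdf action="extracttext" here. Depending on your version and the type attribute, the extracted result may come back as XML rather than plain text, in which case you would need to parse it before indexing.

```cfml
<!--- Hypothetical names: datasource "mydsn", table "documents", and the file path --->
<cfset pdfPath = expandPath("./docs/manual.pdf")>

<!--- Read the PDF text ourselves instead of letting cfindex do it --->
<cfpdf action="extracttext" source="#pdfPath#" type="string" name="pdfText">

<!--- Fetch the database record that belongs with this file --->
<cfquery name="doc" datasource="mydsn">
    SELECT id, title, summary, author
    FROM documents
    WHERE id = <cfqueryparam value="1" cfsqltype="cf_sql_integer">
</cfquery>

<cfif doc.recordCount>
    <!--- Merge both sources into a one-row query that cfindex can consume --->
    <cfset merged = queryNew("id,title,body,author", "integer,varchar,varchar,varchar")>
    <cfset queryAddRow(merged)>
    <cfset querySetCell(merged, "id", doc.id)>
    <cfset querySetCell(merged, "title", doc.title)>
    <!--- The searchable body is the database text plus the PDF text --->
    <cfset querySetCell(merged, "body", doc.summary & " " & pdfText)>
    <cfset querySetCell(merged, "author", doc.author)>

    <!--- One index entry ("row") that carries both file and database data;
          the author column goes into a custom field --->
    <cfindex action="update" collection="indextest1" type="custom"
        query="merged" key="id" title="title" body="body" custom1="author">
</cfif>
```

The point of building the one-row query is that cfindex still only sees a single type (custom), but the row it indexes already contains both the file text and the database fields.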