Here is a simple question. Given the "typical" search scenario (user searches for X, your SQL searches for "like %x%"), how do you broaden the user's input when multiple words are used? Let me begin with an example of what I mean and then I'll demonstrate a few solutions. Also, note that in this blog entry I'm focusing only on SQL searching, not free text searching ala Lucene (or full text searching like you get in some databases).

I began with an incredibly simple search form tied to the cfartgallery sample database.

<cfparam name="form.search" default="">

<form method="post"> <cfoutput> <input type="text" name="search" value="#form.search#"> <input type="submit" value="Search"> </cfoutput> </form>

<cfif len(trim(form.search))>

<cfset term = "%" & trim(form.search) & "%"> <cfset term = ucase(term)>

<cfquery name="results" datasource="cfartgallery"> select artname, description from art where ucase(artname) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#term#"> or ucase(description) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#term#"> </cfquery>

<cfdump var="#results#">

</cfif>

I assume nothing here is too crazy or unique. I basically take the form input, trim it, and wrap it with % characters. (Note the use of ucase here as the Derby databases were case sensitive.) I then check this input against the artname and description columns. While this works, it fails in a few cases. A good example of this involves two pieces of art: "Man in Jeans" and "Man on Stool." If I knew a piece of art existed that was named "man something something jeans", a search for "man jeans" would fail to work. My search code here is treating the input as a phrase, so it must match the literal 'man jeans' set of characters anywhere in the two columns.

One possible way to fix this would be to treat the input like a list of options and allow for a match against any of them. Let's look at that modification first.

<cfparam name="form.search" default="">

<form method="post"> <cfoutput> <input type="text" name="search" value="#form.search#"> <input type="submit" value="Search"> </cfoutput> </form>

<cfif len(trim(form.search))>

<cfquery name="results" datasource="cfartgallery"> select artname, description from art where 1=0 <cfloop index="word" list="#trim(form.search)#" delimiters=" "> <cfset word = "%" & ucase(word) & "%"> or ucase(artname) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#word#"> or ucase(description) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#word#"> </cfloop> </cfquery>

<cfdump var="#results#">

</cfif>

Notice I've modified the where clause. I begin with a 1=0 to act as a simple placeholder that will match nothing. Then for each "word" in the input I output OR clauses for each thing. Given the input "man jeans", the SQL you will end up with is:

where 1=1
or ucase(artname) like '%MAN%'
or ucase(description) like '%MAN%'
or ucase(artname) like '%JEANS%'
or ucase(description) like '%JEANS%'

This works well, but could a bit too loose. My search for "man jeans" ends up matching both "Man in Jeans" and "Man on Stool." That isn't horrible - and is better than matching nothing. But if we wanted to be a bit more strict, we can use an AND search. In this example, we will require all the words to exist, but still allow them to match in any column.

<cfparam name="form.search" default="">

<form method="post"> <cfoutput> <input type="text" name="search" value="#form.search#"> <input type="submit" value="Search"> </cfoutput> </form>

<cfif len(trim(form.search))>

<cfquery name="results" datasource="cfartgallery"> select artname, description from art where <cfloop index="x" from="1" to="#listLen(form.search, " ")#"> <cfset word = listGetAt(form.search, x, " ")> <cfset word = "%" & ucase(word) & "%"> <cfif x neq 1> and </cfif> ( ucase(artname) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#word#"> or ucase(description) like <cfqueryparam cfsqltype="cf_sql_varchar" value="#word#"> ) </cfloop> </cfquery>

<cfdump var="#results#">

</cfif>

The where clause here is different. This time I loop over each word but include the AND prefix for each set of searches. The AND prefix is used only after the first word. So given the "man jeans" example we used earlier, we end up with this clause:

where
(ucase(artname) like '%MEN%'
or ucase(description) like '%MEN%')
and (ucase(artname) like '%JEANS%'
or ucase(description) like '%JEANS%')

This provides a tighter result set than the previous example. So which is best? It depends. :) What I recommend is - log your searches and spend some time trying them yourselves. See how successful they are. Take that data and then decide what kind of modifications you should make to your search form.