UDF to crop and highlight a block of text


Here is a little UDF I worked on this morning. I've had code like this in BlogCFC for a while, but I needed it as a UDF for my Picard project, so I just whipped one up. The basic idea is:

You have a block of arbitrary text.
You searched for something, and that something is probably in the text. (I say probably because the search may have matched on another part of the content in question, like the title.)
You want to highlight the match in the content.
You also want to crop the content to X characters and, if a match was found, center those X characters around the first match.

Make sense? So given a block of text, like the lyrics to Lady Gaga's "Poker Face" (don't ask), I can find/highlight the word poker like so:


In this call, text is a variable containing the lyrics, poker is the word to highlight, 250 is the size of the result (which is a bit fuzzy; I'll explain why in a bit), and the final argument is the "wrap" to place around each match. Here is what the UDF will return:

... oh, oh, ohhhh, oh-oh-e-oh-oh-oh,
I'll get him hot, show him what I've got

Can't read my,
Can't read my
No he can't read my poker face
(she's got me like nobody)
Can't read my
Can't read my
No he can't read my poker face
(she's got me like nobody)


So you get the basic idea. Here is the UDF (as it stands now, but there are parts of it I'd like to improve):

<cffunction name="highlightAndCrop" access="public" output="false" hint="Given an arbitrary string and a search term, find it, and return a 'cropped' set of text around the match.">
	<cfargument name="string" type="string" required="true" hint="Main blob of text">
	<cfargument name="term" type="string" required="true" hint="Keyword to look for.">
	<cfargument name="size" type="numeric" required="false" hint="Size of result string. Defaults to total size of string. Note this is a bit fuzzy - we split it in two and return that amount before and after the match. The size of term and wrap will therefore impact total string length.">
	<cfargument name="wrap" type="string" required="false" default="<b></b>" hint="HTML to wrap the match. MUST be one pair of HTML tags.">
	<cfset var excerpt = "">

	<!--- clean the string: strip HTML tags and surrounding whitespace --->
	<cfset arguments.string = trim(rereplace(arguments.string, "<.*?>", "", "all"))>

	<!--- pad is half our total --->
	<cfif not structKeyExists(arguments, "size")>
		<cfset arguments.size = len(arguments.string)>
	</cfif>
	<cfset var pad = ceiling(arguments.size/2)>

	<!--- no match (0) or a match close to the start: crop from the beginning --->
	<cfset var match = findNoCase(arguments.term, arguments.string)>
	<cfif match lte pad>
		<cfset match = 1>
	</cfif>
	<cfset var end = match + len(arguments.term) + arguments.size>

	<!--- now create the main string around the match --->
	<cfif len(arguments.string) gt arguments.size>
		<cfif match gt 1>
			<!--- start pad characters before the match --->
			<cfset excerpt = "..." & mid(arguments.string, match-pad, end-match)>
			<!--- mid() stopped at character end-pad-1, so track that as the real end --->
			<cfset end = end - pad - 1>
		<cfelse>
			<cfset excerpt = left(arguments.string, end)>
		</cfif>
		<!--- only add a trailing ellipsis if text was actually cut off --->
		<cfif len(arguments.string) gt end>
			<cfset excerpt = excerpt & "...">
		</cfif>
	<cfelse>
		<cfset excerpt = arguments.string>
	</cfif>

	<!--- escape regex metacharacters so a term like "c++" or "(b)" is matched literally --->
	<cfset var safeTerm = "">
	<cfset var i = 0>
	<cfloop from="1" to="#len(arguments.term)#" index="i">
		<cfif find(mid(arguments.term, i, 1), "\^$.|?*+()[]{}")>
			<cfset safeTerm = safeTerm & "\">
		</cfif>
		<cfset safeTerm = safeTerm & mid(arguments.term, i, 1)>
	</cfloop>

	<!--- split up my wrap - I bet this can be done better... --->
	<cfset var endInitialTag = find(">", arguments.wrap)>
	<cfset var beginTag = left(arguments.wrap, endInitialTag)>
	<cfset var endTag = mid(arguments.wrap, endInitialTag+1, len(arguments.wrap))>

	<!--- wrap every match; \1 keeps the matched text in its original case --->
	<cfset excerpt = reReplaceNoCase(excerpt, "(#safeTerm#)", "#beginTag#\1#endTag#", "all")>

	<cfreturn excerpt>
</cffunction>


For the most part this should make sense. I strip any HTML from the string, find the term within it, and use that position as the base for the excerpt. If the match isn't found, or it falls within the first half of the crop, the excerpt simply starts at the beginning of the text; if the whole string is no longer than the crop size, it is returned uncropped. Note that the wrap HTML you include adds to the length of the returned string, but that shouldn't matter since the tags don't show up as visible text.
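A worked example may make the "fuzzy" size easier to picture. The numbers below are hypothetical (a size of 250, as in the lyrics example, and "poker" first found at character 400 of a longer string). The snippet does not call the UDF; it only repeats the pad, start and length calculations the UDF hands to mid() when the match is past the first half of the crop.

```cfml
<!--- Hypothetical inputs: size 250, term "poker" first found at position 400 --->
<cfset size = 250>
<cfset term = "poker">
<cfset match = 400>

<!--- pad is half the requested size, rounded up --->
<cfset pad = ceiling(size / 2)>

<!--- same start and length the UDF passes to mid() --->
<cfset start = match - pad>
<cfset excerptLen = len(term) + size>

<cfoutput>
	pad: #pad#<br>
	characters #start# to #start + excerptLen - 1#
	(#match - start# before the match, #start + excerptLen - match - len(term)# after it)
</cfoutput>
```

With these numbers the excerpt runs from character 275 to 529: 125 characters before the match, the 5-character term, and 125 after it. The returned string then grows a little more from the "..." markers and the wrap tags.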

The main part I don't like is the wrap portion. It only supports one pair of HTML tags, since the wrap is split at the first >. I may split this into two arguments, a beginWrap and endWrap. For now, though, it suits my purposes.
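One possible shape for that two-argument version, limited to the highlight step, is sketched below. The function name highlightTerm is hypothetical and not part of the UDF above; beginWrap and endWrap follow the argument names mentioned in the previous paragraph. Because nothing is parsed out of the wrap anymore, any markup works, not just an HTML tag pair.

```cfml
<!--- Sketch only: highlightTerm is a hypothetical helper that does the highlighting, not the cropping --->
<cffunction name="highlightTerm" access="public" returntype="string" output="false">
	<cfargument name="string" type="string" required="true" hint="Text to highlight">
	<cfargument name="term" type="string" required="true" hint="Keyword to look for">
	<cfargument name="beginWrap" type="string" required="false" default="<b>" hint="Markup placed before each match">
	<cfargument name="endWrap" type="string" required="false" default="</b>" hint="Markup placed after each match">
	<cfset var safeTerm = "">
	<cfset var i = 0>

	<!--- escape regex metacharacters so the term is matched literally --->
	<cfloop from="1" to="#len(arguments.term)#" index="i">
		<cfif find(mid(arguments.term, i, 1), "\^$.|?*+()[]{}")>
			<cfset safeTerm = safeTerm & "\">
		</cfif>
		<cfset safeTerm = safeTerm & mid(arguments.term, i, 1)>
	</cfloop>

	<!--- \1 puts the matched text back exactly as it appeared, so its casing is kept --->
	<cfreturn reReplaceNoCase(arguments.string, "(#safeTerm#)", "#arguments.beginWrap#\1#arguments.endWrap#", "all")>
</cffunction>

<!--- Usage: wiki-style bold instead of an HTML tag pair --->
<cfoutput>#highlightTerm("No he can't read my poker face", "poker", "'''", "'''")#</cfoutput>
```

The usage line outputs No he can't read my '''poker''' face. Keeping the wrap as two plain strings trades the convenience of a single argument for not having to guess where one tag ends and the other begins.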

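p.s. This UDF requires ColdFusion 9 because its var statements are intermingled with the rest of the code. To use it in earlier versions, declare each of those variables with var at the beginning of the UDF, right after the cfargument tags, and leave the assignments where they are, since several of them depend on earlier lines (pad, for example, needs the size default to be set first).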


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by shag posted on 12/24/2009 at 3:16 AM

Just curious, why wouldn't you use one of the many jQuery plugins out there to do this?

Comment 2 by Chuck Savage posted on 12/24/2009 at 3:19 AM

I'd say you should go with two parts for the beginning and end markup, for a couple of reasons. You have hard-coded the > to find the end of the first tag, so you couldn't use this in, say, your wiki, where you use '''bold statement''' for bold. Also, with two arguments you could add colors or any other markup you wanted around the text, make it blink, whatever.

Otherwise cool idea.

P.S. I think your wiki bold is three ', or it may be two; one is italics and the other is bold... I forget.

Comment 3 by Raymond Camden posted on 12/24/2009 at 3:22 AM

@shag: On a page with a lot of text, I'd rather do the cutting server side. As it stands, the initial string could be pretty large; multiply that by 10 results per page, and it's a lot of stuff to go back and forth. :) That being said, I _did_ think about also writing a jQuery version just to see how difficult it would be.

@chuck: Yeah, good point there.