Ask a Jedi: Best way to trim text

This post is more than 2 years old.

Sal asks:

just curious what's the best way (or how you handle) to truncate a paragraph to only show say perhaps 500 chars.? I have a newsletter that I'm emailing out, and I only wanna show 500 chars. of each article in the email.

Ah, I love it when folks ask me the "best" way to do things since no matter what I say, I'm not wrong (grin). Seriously though - here are multiple ways to trim text.

Let's first start off with a block of text that we will use for our tests:


<cfsavecontent variable="quote">
The Constitution is not an instrument for the government to restrain the people, it is an instrument for the people to restrain the government -- lest it come to dominate our lives and interests. Patrick Henry.
</cfsavecontent>

So the quickest way to trim text is with left:


<cfoutput>#left(quote,100)#</cfoutput>

However if you use this on the text, you get:

The Constitution is not an instrument for the government to restrain the people, it is an instrumen

As you can see, the last word in the trimmed text, instrument, was cut off before the final t. This isn't a horrible thing of course, but it could be done better. ColdFusion does ship with a wrap function, but it won't crop the text; it simply breaks the text into lines of a certain length. It does break the text nicely though, so why not use list functions to grab just the first line?


<cfoutput>#listFirst(wrap(quote,100),chr(10))#</cfoutput>

This returns a nicer trim:

The Constitution is not an instrument for the government to restrain the people, it is an

This works nicely, but I kinda feel 'dirty' doing it like this, so why not see if a UDF exists for this? Turns out one does: FullLeft, available on CFLib. This UDF lets me do this instead:


<cfoutput>#fullleft(quote,100)#</cfoutput>

In theory it's doing a lot less work than wrap, which has to break up the entire string, so it should be quicker.
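To make the idea of a "whole word" left concrete, here is a minimal sketch of how such a function could work. It is not the FullLeft code from CFLib; the name wordLeft and its details are my own illustration, and it assumes count is a positive integer.

```cfml
<cffunction name="wordLeft" returntype="string" output="false">
	<cfargument name="str" type="string" required="true">
	<cfargument name="count" type="numeric" required="true">
	<cfset var cut = "">
	<cfset var p = 0>

	<!--- Short enough already: nothing to trim. --->
	<cfif len(arguments.str) lte arguments.count>
		<cfreturn arguments.str>
	</cfif>

	<cfset cut = left(arguments.str, arguments.count)>

	<!--- If the next character is whitespace, the cut already ends on a whole word. --->
	<cfif reFind("[[:space:]]", mid(arguments.str, arguments.count + 1, 1))>
		<cfreturn cut>
	</cfif>

	<!--- Otherwise look for the last whitespace inside the cut by searching it in reverse. --->
	<cfset p = reFind("[[:space:]]", reverse(cut))>

	<!--- No usable whitespace: fall back to the hard cut. --->
	<cfif p eq 0 or p gte arguments.count>
		<cfreturn cut>
	</cfif>

	<!--- Keep everything before that whitespace character. --->
	<cfreturn left(cut, arguments.count - p)>
</cffunction>

<!--- Hypothetical sample: a hard cut at 15 would give "it is an instru"; this gives "it is an". --->
<cfset sample = "it is an instrument for the people">
<cfoutput>#wordLeft(sample, 15)#</cfoutput>
```

Unlike the wrap approach, this only ever inspects the first count characters (plus one), which is the reason a dedicated function should have less work to do.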

Ok, so we're done, right? Well, what if we modify the quote a bit:


<cfsavecontent variable="quote">
The <a href="http://www.raymondcamden.com">Constitution</a> is <b>not</b> an instrument for the government to restrain the people, it is an instrument for the people to restrain the government -- lest it come to dominate our lives and interests. Patrick Henry.
</cfsavecontent>

As you can see I've added some HTML to the text. This HTML messes up my count. If I wanted to show 100 characters, I don't think I'd want HTML to count at all. In fact, I probably don't want to show HTML at all. I can fix that easily enough:


<cfset quote = rereplace(quote, "<.*?>", "", "all")>

Another issue is whitespace. Now this is a contrived example, but it could happen in a live system:


<cfsavecontent variable="quote">
The <a href="http://www.coldfusionjedi.com">Constitution</a> is <b>not</b> 










an 
instrument for the government to restrain the people, it is an instrument for 
the people to restrain the government -- lest it come to dominate our lives and interests. 

Patrick Henry.
</cfsavecontent>

You can use another regex to handle this:


<cfset quote = rereplace(quote, "[[:space:]]+", " ", "all")>

Or alternatively, if you use the wrap() function, it takes a 3rd argument to strip out existing line breaks and carriage returns.
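As a rough sketch of the wrap() variant, the call below passes true as the third (strip) argument. The sample text is made up for the example. I split on both chr(13) and chr(10) because I would not assume which line break wrap() inserts on a given platform.

```cfml
<cfsavecontent variable="sample">
The Constitution is not
an instrument for the government
to restrain the people, it is an instrument for the people to restrain the government.
</cfsavecontent>

<!--- strip=true: existing line breaks are replaced with spaces before the text is wrapped. --->
<cfset firstLine = listFirst(wrap(sample, 100, true), chr(13) & chr(10))>

<!--- trim() drops the space left behind by the leading line break. --->
<cfoutput>#trim(firstLine)#</cfoutput>
```

As far as I know, the strip argument only swaps line breaks for spaces and does not collapse long runs of blanks, so the regex above may still be worth running first on messy input.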

Lastly - it sometimes helps to visually flag text that has been trimmed. Normally this is done with a "...". You can mimic this effect like so:


<cfif len(quote) gt 100>
	<cfset trimmedQuote = fullLeft(quote, 100)>
	<cfset trimmedQuote &= "...">
<cfelse>
	<cfset trimmedQuote = quote>
</cfif>
<cfoutput>#trimmedQuote#</cfoutput>

I just check the length of the original quote and conditionally perform a trim and add the "...".
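Putting the steps from this post together, a newsletter teaser helper might look something like the sketch below. The function name teaser and the default of 500 (taken from Sal's question) are assumptions for illustration, and it uses the wrap()/listFirst() trick so it does not depend on any UDF.

```cfml
<cffunction name="teaser" returntype="string" output="false">
	<cfargument name="html" type="string" required="true">
	<cfargument name="maxLen" type="numeric" required="false" default="500">
	<cfset var text = "">

	<!--- Drop anything that looks like a tag, then collapse runs of whitespace. --->
	<cfset text = reReplace(arguments.html, "<[^>]*>", "", "all")>
	<cfset text = trim(reReplace(text, "[[:space:]]+", " ", "all"))>

	<!--- Short enough already: return it untouched, with no "...". --->
	<cfif len(text) lte arguments.maxLen>
		<cfreturn text>
	</cfif>

	<!--- wrap() breaks on word boundaries; keep only the first line and flag the trim. --->
	<cfreturn trim(listFirst(wrap(text, arguments.maxLen), chr(13) & chr(10))) & "...">
</cffunction>

<!--- Hypothetical usage with a made-up article body. --->
<cfset article = 'The <b>Constitution</b> is   not an instrument for the government to restrain the people.'>
<cfoutput>#teaser(article, 40)#</cfoutput>
```

Stripping the HTML and whitespace before measuring means the limit applies to what the reader actually sees, not to the markup.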

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Chris H posted on 5/28/2008 at 6:11 PM

nice, detailed post, might come in handy soon. thanks ray!

Comment 2 by David S posted on 5/28/2008 at 6:59 PM

Wow. I've had to do this many times but have never put that much thought into it. Thanks for the great solution.

Comment 3 by Luke posted on 5/28/2008 at 7:01 PM

could be better off trimming it at source to avoid unnecessary db traffic.

in mssql select the column with something like:

substring(yourTextyCol,1,100)

then stick your "..." after it

Comment 4 by Raymond Camden posted on 5/28/2008 at 7:06 PM

@Luke - Well this suffers the same problem as Left() does. However, you do have a point - it may make sense to do the 'nice left' once and store the result.

Comment 5 by sal posted on 5/28/2008 at 7:17 PM

thanks yo!

;-)

Comment 6 by John Whish posted on 5/28/2008 at 8:43 PM

I never thought of using the Wrap function to do this - that's neat. My custom function uses Find and Left:

plaintext = ReReplaceNoCase(htmltext, "<[^>]+>", " ", "all");
Return Left(plaintext , Find(" ", plaintext, 100)) & "&hellip;";

Comment 7 by Doug posted on 5/28/2008 at 10:55 PM

Another thing to keep in mind is HTML entities. You may wish to convert the entities to ASCII text to get a better character or word count. So if your user entered something like this:

Using characters like “é”, “ü”, “etc”. is ok.

...would be converted to this...

Using characters like “é”, “ü”, “etc”. is ok.

I ran into this problem awhile back and created a nice little JavaScript function to do this, but it could easily be done in ColdFusion as well.

Comment 8 by Doug posted on 5/28/2008 at 10:56 PM

oops, that first line got converted. It should have been:

Using characters like &ldquo;&eacute;&rdquo;, &ldquo;&uuml;&rdquo;, &ldquo;etc&rdquo;. is ok.

Comment 9 by Raymond Camden posted on 5/28/2008 at 11:01 PM

Excellent point Doug. It may even be worthwhile to just delete them. Now that may result in some odd misspellings - but it may be the simplest solution.
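Both ideas from this exchange, converting entities or simply deleting them, could be sketched roughly as follows. The entity list here is just the handful from Doug's example, not a complete table, and the variable names are made up.

```cfml
<cfset text = "Using characters like &ldquo;&eacute;&rdquo;, &ldquo;&uuml;&rdquo;, &ldquo;etc&rdquo;. is ok.">

<!--- Option 1: convert a few known entities to real characters via chr(). --->
<cfset decoded = replaceList(text,
	"&ldquo;,&rdquo;,&eacute;,&uuml;",
	chr(8220) & "," & chr(8221) & "," & chr(233) & "," & chr(252))>

<!--- Option 2: delete anything shaped like an entity. The doubled pound sign is CFML's escape for a single one. --->
<cfset stripped = reReplace(text, "&##?[a-zA-Z0-9]+;", "", "all")>

<!--- Compare the lengths that a character count would see. --->
<cfoutput>#len(text)# / #len(decoded)# / #len(stripped)#</cfoutput>
```

Either way, do this before measuring the text, so that an entity counts as one character (or none) instead of six or seven.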

Comment 10 by Chris H posted on 5/28/2008 at 11:29 PM

yeah, if you don't have to worry about HTML entities, special characters etc., you could do this in MySQL via
SELECT CONCAT( LEFT( TextToSelect, 500 ), '...' ) FROM Blah

Comment 11 by Joshua Curtiss posted on 5/28/2008 at 11:42 PM

Awesome post, with consideration of the HTML. Nice.

Comment 12 by Mikkel Johansen posted on 5/29/2008 at 9:21 AM

What if the text contains HTML-tags like <b>, <i>, <a> etc.

I have had trouble wrapping text containing these kind of tags. The problem is when it cuts the text between a start tag and an end tag.

Comment 13 by Raymond Camden posted on 5/29/2008 at 2:41 PM

@Mikkel: Um.... you did read the blog entry, right? I cover HTML.

Comment 14 by Mikkel Johansen posted on 5/29/2008 at 3:08 PM

@Ray: I did read the part where you replace any tag with "blank".

My "question" should have been: What if I want to keep the html-tags without breaking the start/end-tag when wrapping the text.

Comment 15 by Raymond Camden posted on 5/29/2008 at 3:15 PM

Ah - that gets significantly more complex perhaps. You could do this:

1) Remove html
2) Find FullLeft(N)
3) If fullLeft(n) ends at "the", go back to original content (with html), find "the", and end there.

That would let you keep the html and wrap at text not including html, but the N value would be <N as you didn't count the html. Another issue is that it wouldn't stop you from ending with <b>the and having an unmatched tag.

You could write code to determine if your fullleft(n) result is inside HTML. This is done by looking for <X> </X> around your result. If you find it, you either move to the end of </x> or go to before <x>.
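One way to sketch the "count the text but keep the HTML" idea is to walk the string and only count characters that fall outside tags. This is an illustration under simple assumptions (well-formed tags, no "<" inside attribute values), the name leftVisible is made up, and it does not solve the unmatched tag problem described above.

```cfml
<cffunction name="leftVisible" returntype="string" output="false">
	<cfargument name="html" type="string" required="true">
	<cfargument name="count" type="numeric" required="true">
	<cfset var i = 0>
	<cfset var visible = 0>
	<cfset var inTag = false>
	<cfset var c = "">
	<!--- chr(60) is the opening angle bracket, chr(62) the closing one. --->
	<cfset var tagOpen = chr(60)>
	<cfset var tagClose = chr(62)>

	<cfloop from="1" to="#len(arguments.html)#" index="i">
		<cfset c = mid(arguments.html, i, 1)>
		<cfif c eq tagOpen>
			<cfset inTag = true>
		<cfelseif c eq tagClose and inTag>
			<cfset inTag = false>
		<cfelseif not inTag>
			<!--- Only characters outside tags count toward the limit. --->
			<cfset visible = visible + 1>
			<cfif visible gte arguments.count>
				<cfreturn left(arguments.html, i)>
			</cfif>
		</cfif>
	</cfloop>

	<!--- Fewer visible characters than requested: return everything. --->
	<cfreturn arguments.html>
</cffunction>

<!--- Seven visible characters of this sample gives: The <b>not (note the unclosed b tag). --->
<cfset sample = "The <b>not</b> so short text">
<cfoutput>#htmlEditFormat(leftVisible(sample, 7))#</cfoutput>
```

The sample output shows exactly the caveat from the comment: the cut lands inside the bold tag, so a real version would still need to close or skip open tags.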

Comment 16 by Doug posted on 5/29/2008 at 11:22 PM

@Mikkel: My "question" should have been: What if I want to keep the html-tags without breaking the start/end-tag when wrapping the text.

You would almost need to create some sort of HTML parser for that. Have you ever looked at the HTML source for a ColdFusion error message? If you notice, it adds a bunch of close tags (</b></p></td></tr></table>...) before it adds the Error message source. It's not calculating those tags, it's just adding a bunch of them to be safe and they don't always work.

Most likely you could create a regular expression to find all the <BLOCK> tags, and if any of them were still open, you could add their closing tags to the end. I think that would be crazy complicated, and I would have to ask if it's worth it.

Comment 17 by anthony posted on 6/3/2008 at 1:23 AM

I used this solution that Ben Nadel came up with to close truncated html. It does a pretty good job.

http://www.bennadel.com/blo...

Comment 18 by Duane Hardy posted on 6/11/2008 at 5:49 PM

Is it possible to trim all text around a tag. For instance trim all the text in your example before or after "<a href="http://www.coldfusionjedi.com">Constitution</a>" ?

Comment 19 by Raymond Camden posted on 6/11/2008 at 5:50 PM

In theory. You would write a regex to match

(1 or more spaces)(link including closing a tag)(1 or more spaces)

and replace with

(link)

Let me give it a try.

Comment 20 by Raymond Camden posted on 6/11/2008 at 5:53 PM

Not heavily tested, but this seems to work. I assumed you meant replace two or more with one:

<cfset text = rereplacenocase(text, "[[:space:]]+(<a.*?>.*?</a>)[[:space:]]+", " \1 ", "all")>

If you really want NO space, period, just change the 3rd arg to be just \1, not (space)\1(space).

Comment 21 by Duane Hardy posted on 6/11/2008 at 6:01 PM

What I am ultimately trying to do is add 'target="_blank"' to an a tag with an external href. I was looking at trying to trim a string provided by a webservice down to just the <a> tag and using javascript for all external links. Possibly I could add 'target="_blank"' with ColdFusion? Do you know any methods?

Comment 22 by Raymond Camden posted on 6/11/2008 at 6:09 PM

Oh that's simpler. You can't do it (afaik) in one line, but just get all the links (use reMatch in CF8) and then replace any non-local link with the modified version.
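A rough sketch of that approach, assuming CF8 or later. Here "non-local" is simplified to "the href starts with http:// or https://", which is my assumption; a real version would probably compare against your own host name. The sample markup and variable names are made up.

```cfml
<cfset html = 'See <a href="http://www.example.com/">this</a> and <a href="/local.cfm">that</a>.'>

<!--- Grab every opening a tag. --->
<cfset links = reMatchNoCase("<a[[:space:]][^>]*>", html)>

<cfloop array="#links#" index="link">
	<!--- Treat absolute links as external, and skip any that already carry a target. --->
	<cfif reFindNoCase('href="https?://', link) and not findNoCase("target=", link)>
		<!--- Add the attribute just before the closing bracket, then swap it into the document. --->
		<cfset newLink = replace(link, ">", ' target="_blank">')>
		<cfset html = replace(html, link, newLink, "all")>
	</cfif>
</cfloop>

<cfoutput>#htmlEditFormat(html)#</cfoutput>
```

In this sample only the first link should pick up the attribute, since the second one is a relative URL.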

Comment 23 by John Whish posted on 6/12/2008 at 12:21 AM

I know it's not ColdFusion, but you could use jQuery to do this for you quite easily (assuming all external links start with http):

$('a[href^="http://"]').attr("target", "_blank");

Or if you want to get fancy:

$('a[href^="http://"]').attr({target: "_blank", title: "Opens in a new window"});

Hope that's of interest.

Comment 24 by Raymond Camden posted on 6/12/2008 at 12:38 AM

@John: Very much of interest. To quote the great Paris: "That's hot."

Comment 25 by John Whish posted on 6/12/2008 at 12:46 AM

It's nice to teach you something Raymond after all I've learnt from you :)

Comment 26 by John Whish posted on 6/12/2008 at 12:54 AM

Just noticed my comment didn't come out right. There shouldn't be a semi-colon after the http. I'll try posting again in case it was my typo!

$('a[href^="http://"]').attr({target: "_blank", title: "Opens in a new window"});

Comment 27 by John Whish posted on 6/12/2008 at 1:16 AM

I've posted the code here, (with a bonus feature!) if anyone's interested
http://www.aliaspooryorik.c...

:)

Comment 28 by Duane Hardy posted on 6/12/2008 at 5:50 PM

My front end is a flex application. I thought if I did the modification on the backend before the links got called that it would save time and coding on the front end.

I assume I would have to have an ExternalInterface in the flex actionscript to communicate with the jQuery code? It would be great if it would automatically detect and append the code.

I do have to append a user code to the end of each external link, so I am interested in how jQuery works. I haven't started this part of the project yet, where is the best source for this resource?

Thanks for your help.

Comment 29 by Nick W posted on 8/25/2014 at 8:42 PM

Nice one Raymond. I often find myself on your website having googled 'coldfusion #whatever-i'm-stuck-on-at-the-time#' and generally i've got a head full of code so once ive got the solution i need i'm back to sublime text to implement it and crack on (without commenting) - but i just wanted to say 'thank you' as your posts have been really helpful over the years on quite a few occasions - thank you.

Comment 30 by Raymond Camden posted on 8/25/2014 at 8:43 PM

You are most welcome!

Comment 31 by dilbert posted on 3/7/2016 at 11:21 PM

I know this one is "ancient", but I just stumbled upon it and found it to fit exactly what I needed. Thank you

Comment 32 (In reply to #31) by Raymond Camden posted on 3/8/2016 at 11:24 AM

You are welcome. I cleaned up the code samples.

Comment 33 (In reply to #32) by dilbert posted on 3/8/2016 at 2:25 PM

Thanks again, I just wanted to let you know that this "old" information is still helping people.

The link for the UDF above is broken, I found it at http://cflib.org/udf/FullLeft.