CF101: Splitting a string into parts using ColdFusion


Steve Wonderpets asked:

I've seen many ways to try and do this, so I'm coming to you: what is the fastest-executing way to find the "<!-- pagebreak -->" and put each paragraph into a struct or array?

I have to be honest. When I first responded to Steve, I completely spaced on the fact that he had a few alternatives already, and he was simply looking for the "best" way. I provided a few solutions myself and I thought I'd share them with my readers.

I began by taking his source data and simply creating a variable out of it.

<cfsavecontent variable="text">
Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.
<!-- pagebreak -->
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
<!-- pagebreak -->
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
</cfsavecontent>

For the first method I suggested, I simply made use of the fact that the ColdFusion variable created above can be used like a Java String object. This means we have access to methods like split. So I first shared this:

<cfset p = text.split("<!-- pagebreak -->")>
<cfdump var="#p#">
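The CFML call above ends up in Java's String.split. For readers who want to see that behavior outside ColdFusion, here is a minimal standalone Java sketch of the same split. The class and method names are hypothetical. The trim() step is my own addition and not part of the snippet above: each page otherwise keeps the whitespace that surrounded the marker.

```java
import java.util.Arrays;

// Sketch: splitting on the "<!-- pagebreak -->" marker with String.split.
// PageBreakSplit and splitPages are hypothetical names for illustration.
public class PageBreakSplit {

    // split() treats this as a regex; it contains no regex metacharacters,
    // so here it happens to match literally.
    static final String DELIMITER = "<!-- pagebreak -->";

    // Splits the text into pages and trims whitespace left around each marker.
    static String[] splitPages(String text) {
        String[] parts = text.split(DELIMITER);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "First page. <!-- pagebreak --> Second page. <!-- pagebreak --> Third page.";
        System.out.println(Arrays.toString(splitPages(text)));
        // prints [First page., Second page., Third page.]
    }
}
```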

It doesn't get much easier than that. It is important to remember that the value passed to split should be a regex. If you wanted to match on "." for some reason, you would (I assume; I didn't test it!) need to escape it, since "." in a regex matches any character.
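The following Java sketch illustrates the escaping issue. The class and method names are hypothetical. An unescaped "." treats every character as a delimiter, so only empty strings remain, and split() then drops the trailing empty strings. Escaping the dot or wrapping the delimiter in Java's Pattern.quote gives the expected result. In CFML source you would write the pattern as "\." with a single backslash, because CFML string literals don't treat the backslash as an escape character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: why "." has to be escaped when passed to String.split.
// RegexDelimiter and splitLiteral are hypothetical names.
public class RegexDelimiter {

    // Quotes the delimiter so any regex metacharacters in it are matched literally.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        // "." matches any character: every piece is empty, and trailing
        // empty strings are removed, leaving an empty array.
        System.out.println(s.split(".").length);                      // prints 0
        // Escaped dot (the backslash is doubled only because this is Java source).
        System.out.println(Arrays.toString(s.split("\\.")));          // prints [a, b, c]
        // Pattern.quote escapes any literal delimiter for you.
        System.out.println(Arrays.toString(splitLiteral(s, ".")));    // prints [a, b, c]
    }
}
```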

If using Java worries you for some reason, you can also make use of a CFLib UDF, Split. Despite being many more lines of CFML, in Steve's testing it ran just as fast. (Although I'd assume the Java method would be faster on larger strings.)
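The CFLib UDF's code is not shown here. As a rough illustration of what a non-regex split involves, here is a hypothetical Java helper that cuts the string at each literal occurrence of the delimiter using indexOf, so characters like "." or "|" need no escaping. This is a sketch of the general technique, not the CFLib implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a literal (non-regex) split using indexOf.
// LiteralSplit is a hypothetical name; this is not the CFLib Split UDF.
public class LiteralSplit {

    static List<String> split(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        // Each match ends the current part; searching resumes after the delimiter.
        while ((idx = text.indexOf(delimiter, start)) != -1) {
            parts.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        // Whatever follows the last delimiter (possibly empty) is the final part.
        parts.add(text.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("one<!-- pagebreak -->two<!-- pagebreak -->three", "<!-- pagebreak -->"));
        // prints [one, two, three]
        System.out.println(split("a.b.c", "."));
        // prints [a, b, c]
    }
}
```

Unlike String.split, this version keeps trailing empty parts. For example, "a|" yields ["a", ""].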


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Shaun McCran posted on 6/29/2010 at 3:43 PM

I prefer the split() method. I used to loop over the variable, using ColdFusion's list functions to match a substring, but once I found out about split(), it's much easier and quicker.

Comment 2 by Joel Cox posted on 6/29/2010 at 5:18 PM

ListToArray will do this also:

<cfset p = ListToArray(text, " ") />

Comment 3 by Raymond Camden posted on 6/29/2010 at 5:21 PM

Not exactly. If you have a single char delimiter, sure. But if your delimiter is multichar, then no. The func will treat each char as a unique delimiter.

Comment 4 by Raymond Camden posted on 6/29/2010 at 5:23 PM

I should add though - listToArray DOES support multichar delims. It's just not the default. Good catch Joel!

Comment 5 by Raymond Camden posted on 6/29/2010 at 5:24 PM

Just to be anal and complete - that was modded in CF9.
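Ray's point can be illustrated with a Java sketch (not ColdFusion) that contrasts the two interpretations. The class and method names are hypothetical. java.util.StringTokenizer treats every character of the delimiter string as a separate delimiter and skips empty tokens, roughly like the default list-function behavior described above. Quoting the delimiter for String.split matches it only as a whole sequence. On the ColdFusion side, I believe the CF9 signature is ListToArray(list, delimiters, includeEmptyFields, multiCharacterDelimiter), with the fourth argument enabling the whole-sequence behavior. Check the docs for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Sketch contrasting "each character is a delimiter" with "the whole string is the delimiter".
// DelimiterModes and its methods are hypothetical names.
public class DelimiterModes {

    // Every character in delims is its own delimiter; empty tokens are skipped.
    static List<String> splitOnAnyChar(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // The delimiter must match as one complete sequence.
    static List<String> splitOnSequence(String text, String delim) {
        return Arrays.asList(text.split(Pattern.quote(delim)));
    }

    public static void main(String[] args) {
        String text = "intro<!-- pagebreak -->page two";
        // 'r' in "intro" is one of the delimiter characters, so "intro" is broken up too.
        System.out.println(splitOnAnyChar(text, "<!-- pagebreak -->"));  // prints [int, o, two]
        System.out.println(splitOnSequence(text, "<!-- pagebreak -->")); // prints [intro, page two]
    }
}
```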

Comment 6 by Steve posted on 6/29/2010 at 6:01 PM

This example using ListToArray consistently shaves 4-5 ms off the split/UDF methods on my localhost with debugging turned on. Not the best way to do a performance test, but it might be the fastest. A better test would be to use much more text.

As Ray mentioned, on huge amounts of text where the delimiter is present many times, the Java split might be the winner.

Comment 7 by Wookie posted on 6/29/2010 at 6:03 PM

Is it Split or Spit? :)

It is important to remember that the value pased to Spit should be a regex.

Comment 8 by Raymond Camden posted on 6/29/2010 at 9:20 PM

Spit! ;) On a serious note - I did mention the regex-ness of it. Something to keep in mind.

Comment 9 by Phil Munro posted on 5/12/2011 at 2:37 PM

I ran a few tests to compare "split()" with "listToArray()" for splitting a large string into parts, as it's something that's going to be used again and again in my application.

The source string was 11,000 lines of HTML, and I timed how long it took to run the same split 150,000 times in a loop.

Both took exactly the same amount of time - 49 seconds.

Comment 10 by Don Vawter posted on 2/27/2012 at 6:32 AM

On st="ABC", st.split("") should give you [A,B,C], but in fact it gives you [chr(0),A,B,C].
In JS you get [A,B,C] for st.split("") and ["","","",""] for st.split(/./), so be careful if you're trying to split a string into characters, because the first element will be null.
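The result Don describes depends on the JVM. Before Java 8, String.split("") returned a leading empty string. Java 8 changed split so that a zero-width match at the start of the string no longer produces one. A version-independent alternative is sketched below in Java, using a hypothetical helper that builds one-character strings directly. Note that it works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two pieces.

```java
import java.util.Arrays;

// Sketch: splitting a string into one-character strings without split("").
// StringToChars and toCharStrings are hypothetical names.
public class StringToChars {

    static String[] toCharStrings(String s) {
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = String.valueOf(s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toCharStrings("ABC"))); // prints [A, B, C]
    }
}
```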

Comment 11 by Sandeep posted on 6/24/2013 at 9:17 AM

Hi, when I try to put a string delimited by "|" into an array, it puts a single character into each array element instead of the content delimited by "|".
Any help?

Comment 12 by Raymond Camden posted on 6/24/2013 at 2:09 PM

Show us the code you used please - use Pastebin or a Gist.
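If the code in question used split(), the symptom matches the regex issue from the post, though this is an assumption since the code was never posted. In a regex, a bare "|" is an alternation between two empty patterns. It matches the empty string at every position, so the string is cut between every character. The Java sketch below, with hypothetical names, shows the escaped form. In CFML source the pattern would be written "\|". Alternatively, ListToArray(text, "|") avoids the problem, because list delimiters are not regexes.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: splitting on a literal pipe. PipeSplit and splitOnPipe are hypothetical names.
public class PipeSplit {

    // Pattern.quote("|") makes the pipe a literal delimiter.
    static String[] splitOnPipe(String s) {
        return s.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String s = "red|green|blue";
        // Unescaped: splits between every character (exact output varies by Java version).
        System.out.println(Arrays.toString(s.split("|")));
        // Escaped by hand (doubled backslash because this is Java source).
        System.out.println(Arrays.toString(s.split("\\|")));     // prints [red, green, blue]
        System.out.println(Arrays.toString(splitOnPipe(s)));     // prints [red, green, blue]
    }
}
```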

Comment 13 by victor Diaz posted on 11/3/2013 at 12:30 AM

This may seem off topic, but I will bring it together at the end:
I am developing a lottery website in ColdFusion that outputs combinations of 5 numbers for a 5/39 lottery game that add up to a given sum.
Ex. Select sum: 97 with 3 odd and 2 even numbers:
Output:
2 15 20 25 35
7 10 19 29 32
5 9 20 24 39
6 8 15 33 35
5 8 21 27 36
1 12 22 25 37
2 3 21 35 36
3 12 24 25 33
15 16 19 21 26
5 16 20 27 29
9 15 18 27 28
2 12 13 33 37
6 9 18 31 33
3 18 21 26 29
10 15 20 23 29
4 17 18 27 31
7 16 22 23 29
3 16 21 27 30
6 15 21 22 33
12 13 17 26 29

A database with tables organized by sum supplies the results, and I use modulo to filter the odd and even results. Ex. num MOD 2 = 0 is EVEN, num MOD 2 = 1 is ODD.

My problem is that I now want a filter that displays only the combinations whose digits include [0,1,2,3].
Choosing from the sample list above of numbers that equal the sum of 97, this occurs in these cases:

7 10 19 29 32
10 15 20 23 29
3 16 21 27 30

Each string of 5 space-delimited numbers contains digits from this list [0,1,2,3].

(7 is not in the list, but the overall string contains all of the requisite digits from the list 0,1,2,3, so the string still qualifies.)
Ex.
10 contains a 0 and a 1
19 contains a 1
29 contains a 2
32 contains a 3 and a 2
etc...

So, stating the problem:
Capture the query results in a structure that will allow me to check each string of numbers for a minimum set of characters, and return those delimited strings that contain every digit from my list [0,1,2,3] in their make-up.

I'm not sure what the best way to do this is.
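One possible approach, sketched in Java rather than CFML with hypothetical names, is to treat each combination as a string and keep it only if every required digit occurs somewhere in it. Applied to the full sample above, this rule also keeps 2 15 20 25 35, which contains a 0, 1, 2 and 3 but is not in the list of cases. In CFML, the same check could presumably be written as a loop over the required digits using Find().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: keep only combinations whose digits include every required digit.
// DigitFilter, containsAllDigits and filter are hypothetical names.
public class DigitFilter {

    // True when each required digit appears somewhere in the combination string.
    static boolean containsAllDigits(String combo, String requiredDigits) {
        for (char d : requiredDigits.toCharArray()) {
            if (combo.indexOf(d) == -1) {
                return false;
            }
        }
        return true;
    }

    static List<String> filter(List<String> combos, String requiredDigits) {
        List<String> kept = new ArrayList<>();
        for (String combo : combos) {
            if (containsAllDigits(combo, requiredDigits)) {
                kept.add(combo);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> combos = Arrays.asList(
                "2 15 20 25 35", "7 10 19 29 32", "5 9 20 24 39",
                "10 15 20 23 29", "3 16 21 27 30", "1 12 22 25 37");
        System.out.println(filter(combos, "0123"));
        // prints [2 15 20 25 35, 7 10 19 29 32, 10 15 20 23 29, 3 16 21 27 30]
    }
}
```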