Steve Wonderpets asked:
I've seen many ways to try to do this, so I'm coming to you to ask: what is the fastest-executing way to find the "<!-- pagebreak -->" and put each paragraph into a struct or array?
I have to be honest. When I first responded to Steve, I completely spaced on the fact that he had a few alternatives already, and he was simply looking for the "best" way. I provided a few solutions myself and I thought I'd share them with my readers.
I began by taking his source data and simply creating a variable out of it.
<cfsavecontent variable="text">
Contrary to popular belief, Lorem Ipsum is not simply random text. It
has roots in a piece of classical Latin literature from 45 BC, making
it over 2000 years old. Richard McClintock, a Latin professor at
Hampden-Sydney College in Virginia, looked up one of the more obscure
Latin words, consectetur, from a Lorem Ipsum passage, and going
through the cites of the word in classical literature, discovered the
undoubtable source. Lorem Ipsum comes from sections 1.10.32 and
1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and
Evil) by Cicero, written in 45 BC. This book is a treatise on the
theory of ethics, very popular during the Renaissance. The first line
of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in
section 1.10.32.
<!-- pagebreak -->
Lorem Ipsum is simply dummy text of the printing and typesetting
industry. Lorem Ipsum has been the industry's standard dummy text ever
since the 1500s, when an unknown printer took a galley of type and
scrambled it to make a type specimen book. It has survived not only
five centuries, but also the leap into electronic typesetting,
remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and
more recently with desktop publishing software like Aldus PageMaker
including versions of Lorem Ipsum.
<!-- pagebreak -->
Lorem Ipsum is simply dummy text of the printing and typesetting
industry. Lorem Ipsum has been the industry's standard dummy text ever
since the 1500s, when an unknown printer took a galley of type and
scrambled it to make a type specimen book. It has survived not only
five centuries, but also the leap into electronic typesetting,
remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and
more recently with desktop publishing software like Aldus PageMaker
including versions of Lorem Ipsum.
</cfsavecontent>
For the first method I suggested, I simply made use of the fact that the ColdFusion variable created above can be used like a Java String object. This means we have access to methods like split. So I first shared this:
<cfset p = text.split("<!-- pagebreak -->")>
<cfdump var="#p#">
It doesn't get much easier than that. It is important to remember that the value passed to split is treated as a regex. If you wanted to split on "." for some reason, you would (I assume - didn't test it!) need to escape it.
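To make the regex caveat concrete, here is a minimal sketch in plain Java (not CFML) of the String.split behavior that the ColdFusion call relies on. The class name RegexSplitDemo and the helper splitLiteral are made up for illustration. Pattern.quote is a standard JDK method that makes a delimiter match as literal text.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; only String.split and Pattern.quote come from the JDK.
final class RegexSplitDemo {
    // Splits on the delimiter as literal text by quoting it for the regex engine.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String dotted = "a.b.c";
        // An unescaped "." matches every character, so only empty pieces are left,
        // and split() discards trailing empty strings: the array has length 0.
        System.out.println(dotted.split(".").length);
        // Escaped, the dot is treated as a literal character.
        System.out.println(Arrays.toString(dotted.split("\\.")));
        // "|" is also a regex metacharacter; quoting makes it a plain delimiter.
        System.out.println(Arrays.toString(splitLiteral("x|y|z", "|")));
    }
}
```

The delimiter used in this post, "<!-- pagebreak -->", contains no regex metacharacters, which is why it works without any escaping.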
If using Java worries you for some reason, you can also make use of a CFLib UDF, Split. Despite being many more lines of CFML, in Steve's testing it ran just as fast. (Although I'd probably assume the Java method would be faster on larger strings.)
Archived Comments
I prefer the split() method. I used to loop over the variable, using ColdFusion's list functions to match a substring, but once I found out about split(), it's much easier and quicker.
ListToArray will do this also:
<cfset p = ListToArray(text, " ") />
Not exactly. If you have a single char delimiter, sure. But if your delimiter is multichar, then no. The func will treat each char as a unique delimiter.
I should add though - listToArray DOES support multichar delims. It's just not the default. Good catch Joel!
Just to be anal and complete - that was modded in CF9.
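The difference between "each char is a delimiter" and "the whole string is the delimiter" can be sketched in plain Java. This is only an analogy: it uses the JDK's StringTokenizer, which also treats every character of its delimiter string as a separate delimiter, and it makes no claim about how ColdFusion implements listToArray. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting the two delimiter interpretations.
final class DelimiterModes {
    // Every character in `delims` acts as its own delimiter; empty tokens are skipped.
    static List<String> splitOnAnyChar(String input, String delims) {
        List<String> parts = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delims);
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken());
        }
        return parts;
    }

    // The delimiter only matches as one complete, literal string.
    static List<String> splitOnWholeString(String input, String delimiter) {
        return Arrays.asList(input.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        String input = "one-two<!-- pagebreak -->three";
        String delimiter = "<!-- pagebreak -->";
        // The letters e and r and the hyphen all occur in the delimiter,
        // so the words themselves get chopped up.
        System.out.println(splitOnAnyChar(input, delimiter));     // [on, two, th]
        System.out.println(splitOnWholeString(input, delimiter)); // [one-two, three]
    }
}
```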
This example using ListToArray consistently shaves 4-5 ms off the split/udf methods on my localhost with debugging turned on. Not the best way to do a performance test, but it might be the fastest. A better test would be to use much more text.
As Ray mentioned, on huge amounts of text where the delimiter is present many times the Java split might be the winner.
Is it Split or Spit? :)
It is important to remember that the value pased to Spit should be a regex.
Spit! ;) On a serious note - I did mention the regex-ness of it. Something to keep in mind.
I ran a few tests to compare using "split()" with "listToArray()" when splitting a large string into parts, as it's something that's going to be used again and again in my application.
The source string was 11000 lines of html, and I timed how long it took to do 150000 loops over the same split.
Both took exactly the same amount of time - 49 seconds.
On st = "ABC", st.split("") should give you [A,B,C], but in fact it gives you [chr(0),A,B,C].
In JS you get [A,B,C] for st.split("") and ["","","",""] for st.split(/./), so be careful when trying to split a string into characters, because the first element will be null.
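For context on the Java side: the extra first element depends on the JVM version. Before Java 8, "ABC".split("") returned a leading empty string. From Java 8 on, a zero-width match at the start no longer produces one. The sketch below (plain Java, with hypothetical class and method names) shows two ways to get one element per character regardless of which JVM ColdFusion is running on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of ColdFusion or CFLib.
final class CharSplit {
    // Builds one-character strings directly, without going through a regex.
    // Note: this works per UTF-16 code unit, so surrogate pairs are split in two.
    static List<String> toCharacters(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(String.valueOf(s.charAt(i)));
        }
        return out;
    }

    // Uses split("") but removes a leading empty element if the JVM produced one.
    static List<String> splitDroppingLeadingEmpty(String s) {
        List<String> out = new ArrayList<>(Arrays.asList(s.split("")));
        if (!out.isEmpty() && out.get(0).isEmpty()) {
            out.remove(0);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toCharacters("ABC"));
        System.out.println(splitDroppingLeadingEmpty("ABC"));
    }
}
```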
Hi, I am trying to put a string delimited by "|" into an array. It is putting a single character into each array element, not the content delimited by "|".
Any help?
Show us the code you used please - use Pastebin or a Gist.
This may seem off topic but I will bring it together at the end:
I am developing a lottery website in ColdFusion that outputs combinations of 5 numbers for a 5/39 lottery game that add up to a given sum.
Ex. Select sum: 97 with 3 odd and 2 even numbers:
Output:
2 15 20 25 35
7 10 19 29 32
5 9 20 24 39
6 8 15 33 35
5 8 21 27 36
1 12 22 25 37
2 3 21 35 36
3 12 24 25 33
15 16 19 21 26
5 16 20 27 29
9 15 18 27 28
2 12 13 33 37
6 9 18 31 33
3 18 21 26 29
10 15 20 23 29
4 17 18 27 31
7 16 22 23 29
3 16 21 27 30
6 15 21 22 33
12 13 17 26 29
A database with tables organized by sum supplies the results and I use modulo to filter the odd and even results. Ex. num1 + num2 + ... = 0, EVEN | num1 + num2 ... = 1 ODD
My problem is that - now I want a filter that displays only numbers that contain - [0,1,2,3 ] in the result.
Choosing from the above sample list of numbers that equal the sum of 97 - this occurs in these cases:
7 10 19 29 32
10 15 20 23 29
3 16 21 27 30
Each string of 5 space-delimited numbers contains a number from this list [0,1,2,3].
(7 is not in the list but the overall string contains all of the requisite numbers from the list of 0,1,2,3 so the string result still qualifies.)
Ex.
10 - contains a 0 and a 1
19 contains a 1
29 contains a 2
32 contains a 3 and a 2
etc...
So stating the problem:
Capture the query results in a structure that will allow me to check each string of numbers
for a minimum set of characters in the string and return those delimited string results that contain a number from my list in their make-up [0,1,2,3]
Not sure what is the best way to go to do this?
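One possible reading of the filter described above, sketched in plain Java rather than CFML: keep a row only if every required digit (0, 1, 2 and 3) appears somewhere in its text. That reading is an assumption drawn from the "contains all of the requisite numbers" remark, and the class and method names are hypothetical. Under this reading, some rows not called out in the comment also pass, for example "2 15 20 25 35".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter; assumes each row is a space-delimited string of numbers.
final class DigitFilter {
    // True when every character of `requiredDigits` occurs somewhere in `row`.
    static boolean containsAllDigits(String row, String requiredDigits) {
        for (char digit : requiredDigits.toCharArray()) {
            if (row.indexOf(digit) < 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the rows that pass containsAllDigits, in their original order.
    static List<String> filterRows(List<String> rows, String requiredDigits) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (containsAllDigits(row, requiredDigits)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "7 10 19 29 32",   // has 0, 1, 2 and 3
            "5 9 20 24 39",    // no 1
            "3 12 24 25 33");  // no 0
        System.out.println(filterRows(rows, "0123"));
    }
}
```

Because the check works on the row as a plain string, the query results would not need to be split into individual numbers first.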