Counting Word Instances in a String

Yesterday in the IRC channel someone asked if there was a way to count the number of times each unique word appears in a string. While it was obvious that this could be done manually (see below), no one knew of a more elegant solution. Can anyone think of one? Here is the solution I used and it definitely falls into the “manual” (and probably slow) category.

First I made my string:

<cfsavecontent variable="string">
This is a paragraph with some text in it. Certain words will be repeated, and other words will not be repeated. The question is though, how much can I write before I begin to sound like a complete and utter idiot. Let's call that the "Paris Point". At the Paris Point, any further words sound like gibberish and are completely worthless.
</cfsavecontent>

I then used some regex to get an array of words:

<cfset words = reMatch("[[:word:]]+", string)>

Next I created a structure:

<cfset wordCount = structNew()>

And then looped over the array and inserted the words into the structure:

<cfloop index="word" array="#words#">
    <cfif structKeyExists(wordCount, word)>
        <cfset wordCount[word]++>
    <cfelse>
        <cfset wordCount[word] = 1>
    </cfif>
</cfloop>

Note that this will be inherently case-insensitive, since ColdFusion struct keys ignore case, which I think is a good thing. At this point we are done, but I added some display code as well:

<cfset sorted = structSort(wordCount, "numeric", "desc")>

<table border="1" width="400">
    <tr>
        <th width="50%">Word</th>
        <th>Count</th>
    </tr>

    <cfloop index="word" array="#sorted#">
        <cfoutput>
        <tr>
            <td>#word#</td>
            <td>#wordCount[word]#</td>
        </tr>
        </cfoutput>
    </cfloop>
</table>
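
As a quick illustration of the case-insensitivity mentioned earlier, here is a minimal sketch. The variable name demo and the sample keys are made up for this example and are not part of the code above. It assumes a struct created with a plain structNew(), which is case-insensitive by default.

```cfml
<!--- Keys that differ only in case refer to the same struct entry. --->
<cfset demo = structNew()>
<cfset demo["Paris"] = 1>

<!--- "paris" resolves to the existing "Paris" key, so this increments it. --->
<cfset demo["paris"]++>

<!--- One key is stored, and it can be read back with any casing. --->
<cfoutput>#structCount(demo)# key, count #demo["PARIS"]#</cfoutput>
```

This should print "1 key, count 2", which is why "The" and "the" end up in the same bucket in the word count.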

About Raymond Camden

Raymond is a developer advocate. He focuses on JavaScript, serverless and enterprise cat demos.

Lafayette, LA