Generate a tag cloud from an RSS feed with ColdFusion

Earlier today Mike Henke asked if there was a way to generate a tag cloud from an RSS feed. While he was able to find a solution quickly enough (Wordle), I thought it would be kind of fun to try this myself. I knew that Pete Freitag had already blogged on tag clouds and ColdFusion, so all I had to do was generate my word data and pass it to his code. Here’s what I came up with.

I began with a simple call to my RSS URL to generate a query of data. For my testing, this was the only thing I cached. Obviously all of my "crunching" could have been cached.

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
    <cfset feedUrl = "http://feedproxy.google.com/RaymondCamdensColdfusionBlog">
    <cffeed source="#feedUrl#" query="rss">
    <cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

Now for the fun part. In order to use Pete's code, I need to know each word and the number of times it appears. I began with an empty struct:

<cfset allwords = {}>

Next, I created a list of "stop" words, words I'd always ignore. (Note, this list was kind of arbitrary. Also note I added some spaces in the blog entry just so it would wrap better.)

<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you, you're,you've,with,why,which,when,were,we've,we're, then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done, off,their,isn't,yes,what's,them,they,'',be,being,all, only,does,here,an,by,would,like,at,do,want,or,could, out,our,while,what,had,each,into,where,That's,will,else, let's,about,got,using,before,over,actually,going,some,well">

I then split by word boundary and added them to the struct. Note that this word boundary also includes ' so I can match "don't". This is not perfect, but good enough.

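<cfloop query="rss">
    <!--- bigstring is not defined anywhere in this post; assuming here that it is the entry title plus its content --->
    <cfset bigstring = rss.title & " " & rss.content>
    <cfset words = reMatch("[\w']+", bigstring)>
    <cfloop index="w" array="#words#">
        <!--- skip one-character matches and stop words --->
        <cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
            <cfif not structKeyExists(allwords, w)>
                <cfset allwords[w] = 0>
            </cfif>
            <cfset allwords[w]++>
        </cfif>
    </cfloop>
</cfloop>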
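
To see the counting step on its own, here is a minimal sketch that runs the same regex and struct logic against a hard-coded string. The sample and stop values are made up for illustration and are not from the feed.

```cfml
<!--- Sketch only: sample and stop are made-up values for illustration --->
<cfset sample = "I don't think the cloud is the Cloud you want">
<cfset stop = "i,don't,the,is,you,want">
<cfset counts = {}>

<!--- same pattern as above: runs of word characters and apostrophes --->
<cfset words = reMatch("[\w']+", sample)>
<cfloop index="w" array="#words#">
    <cfif len(w) gt 1 and not listFindNoCase(stop, w)>
        <cfif not structKeyExists(counts, w)>
            <cfset counts[w] = 0>
        </cfif>
        <cfset counts[w]++>
    </cfif>
</cfloop>

<!--- expect two entries: think = 1 and cloud = 2 --->
<cfdump var="#counts#">
```

Struct keys in ColdFusion are case-insensitive, so "cloud" and "Cloud" land in the same bucket without any extra lowercasing.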

I had quite a few words, so I decided to remove all words with fewer than 5 instances.

<!--- loop over a snapshot of the keys so deleting inside the loop is safe --->
<cfloop index="k" array="#structKeyArray(allwords)#">
    <cfif allwords[k] lt 5>
        <cfset structDelete(allwords, k)>
    </cfif>
</cfloop>

Now comes Pete's code to generate high/low values.

<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
    <!--- two separate checks: with cfelseif, a value that lowers minval is never compared to maxval --->
    <cfif allwords[k] lt minval>
        <cfset minval = allwords[k]>
    </cfif>
    <cfif allwords[k] gt maxval>
        <cfset maxval = allwords[k]>
    </cfif>
</cfloop>
<cfset diff = maxval - minval>
<cfset distribution = diff / 3>

And finally, the output:

<h2>Word Cloud</h2>
<p>
<cfloop item="w" collection="#allWords#">
    <cfif allWords[w] EQ minval>
        <cfset class="smallestTag">
    <cfelseif allWords[w] EQ maxval>
        <cfset class="largestTag">
    <cfelseif allWords[w] GT (minval + (distribution*2))>
        <cfset class="largeTag">
    <cfelseif allWords[w] GT (minval + distribution)>
        <cfset class="mediumTag">
    <cfelse>
        <cfset class="smallTag">
    </cfif>
    <cfoutput><span class="#class#">#w#</span></cfoutput>
</cfloop>
</p>
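
To make the bucket thresholds concrete, here is a small sketch that wraps the same comparisons in a function and runs it with sample numbers. The tagClass helper and the counts passed to it are hypothetical and not part of the original script.

```cfml
<!--- Sketch only: tagClass is a hypothetical helper, not part of the original script --->
<cffunction name="tagClass" returntype="string" output="false">
    <cfargument name="count" type="numeric" required="true">
    <cfargument name="minval" type="numeric" required="true">
    <cfargument name="maxval" type="numeric" required="true">
    <!--- one third of the range between the smallest and largest count --->
    <cfset var distribution = (arguments.maxval - arguments.minval) / 3>
    <cfif arguments.count EQ arguments.minval>
        <cfreturn "smallestTag">
    <cfelseif arguments.count EQ arguments.maxval>
        <cfreturn "largestTag">
    <cfelseif arguments.count GT (arguments.minval + (distribution * 2))>
        <cfreturn "largeTag">
    <cfelseif arguments.count GT (arguments.minval + distribution)>
        <cfreturn "mediumTag">
    </cfif>
    <cfreturn "smallTag">
</cffunction>

<!--- With minval = 5 and maxval = 35 the distribution is 10,
      so the cut-offs sit at 15 (medium) and 25 (large) --->
<cfoutput>
5: #tagClass(5, 5, 35)#<br>
12: #tagClass(12, 5, 35)#<br>
16: #tagClass(16, 5, 35)#<br>
26: #tagClass(26, 5, 35)#<br>
35: #tagClass(35, 5, 35)#<br>
</cfoutput>
```

With these numbers the five calls return smallestTag, smallTag, mediumTag, largeTag, and largestTag in that order. If every word has the same count, minval equals maxval and everything ends up as smallestTag, because that check comes first.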

Sexy, eh? Here is the output from my blog:

I then pointed it at the RSS feed from ColdFusionBloggers:

I probably could have shortened that a lot more with my minimum filter. Anyway, I then did one more tweak. Instead of counting words, I simply took the category list:

<cfloop query="rss">
    <cfset words = listToArray(categorylabel)>
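
One thing to watch with the category variant: listToArray does not trim its items, so a label list with a space after each comma would produce keys with a leading space. Whether a given feed does that is an assumption to check against the real data. The sketch below uses a made-up labels string and trims each item before counting.

```cfml
<!--- Sketch only: labels is a made-up category list, not real feed data --->
<cfset labels = "ColdFusion, jQuery,ColdFusion, jQuery">
<cfset counts = {}>

<cfloop index="raw" array="#listToArray(labels)#">
    <!--- trim so " jQuery" and "jQuery" count as the same category --->
    <cfset label = trim(raw)>
    <cfif len(label)>
        <cfif not structKeyExists(counts, label)>
            <cfset counts[label] = 0>
        </cfif>
        <cfset counts[label]++>
    </cfif>
</cfloop>

<!--- expect ColdFusion = 2 and jQuery = 2 --->
<cfdump var="#counts#">
```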

This tag cloud then represents categories from the RSS feed:

And that's it. Totally and completely stupid, but fun. Here's the current script, although it's a bit messy. As I said, normally you would want to cache all of the crunching.

p.s. Words a bit hard to read in the pictures? Right click and open in new tab. Sorry about that!

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
    <cfset feedUrl = "http://www.coldfusionbloggers.org/rss.cfm">
    <cffeed source="#feedUrl#" query="rss">
    <cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

<!--- create a count of words --->
<cfset allwords = {}>
<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you,you're,you've,with,why,which,when,were,we've,we're,then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done,off,their,isn't,yes,what's,them,they,'',be,being,all,only,does,here,an,by,would,like,at,do,want,or,could,out,our,while,what,had,each,into,where,That's,will,else,let's,about,got,using,before,over,actually,going,some,well">

<cfloop query="rss">
    <!--- <cfset words = reMatch("[\w']+",bigstring)> --->
    <cfset words = listToArray(categorylabel)>
    <cfloop index="w" array="#words#">
        <cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
            <cfif not structKeyExists(allwords, w)>
                <cfset allwords[w] = 0>
            </cfif>
            <cfset allwords[w]++>
        </cfif>
    </cfloop>
</cfloop>

<!--- remove where val < 5, 5 being a bit arbitrary --->
<!---
<cfloop item="k" collection="#allwords#">
    <cfif allwords[k] lte 0>
        <cfset structDelete(allwords,k)>
    </cfif>
</cfloop>
--->

<!--- get min, max --->
<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
    <!--- two separate checks: with cfelseif, a value that lowers minval is never compared to maxval --->
    <cfif allwords[k] lt minval>
        <cfset minval = allwords[k]>
    </cfif>
    <cfif allwords[k] gt maxval>
        <cfset maxval = allwords[k]>
    </cfif>
</cfloop>
<cfset diff = maxval - minval>
<cfset distribution = diff / 3>

<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="description" content="" />
<meta name="keywords" content="" />
<link rel="stylesheet" href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.min.css">
<!--[if lt IE 9]><script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script><![endif]-->
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
<script type="text/javascript">
$(function() {
});
</script>
<style>
.smallestTag { font-size: xx-small; }
.smallTag { font-size: small; }
.mediumTag { font-size: medium; }
.largeTag { font-size: large; }
.largestTag { font-size: xx-large; }
</style>
</head>
<body>
<div class="container">
    <h2>Word Cloud</h2>
    <p>
    <cfloop item="w" collection="#allWords#">
        <cfif allWords[w] EQ minval>
            <cfset class="smallestTag">
        <cfelseif allWords[w] EQ maxval>
            <cfset class="largestTag">
        <cfelseif allWords[w] GT (minval + (distribution*2))>
            <cfset class="largeTag">
        <cfelseif allWords[w] GT (minval + distribution)>
            <cfset class="mediumTag">
        <cfelse>
            <cfset class="smallTag">
        </cfif>
        <cfoutput><span class="#class#">#w#</span></cfoutput>
    </cfloop>
    </p>
</div>
</body>
</html>
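
As mentioned earlier, the crunching itself could be cached the same way as the feed. Here is a rough sketch of that idea. The "wordcounts" cache key is made up for this example, and the one-hour timespan just mirrors the feed cache.

```cfml
<!--- Sketch only: "wordcounts" is a hypothetical cache key --->
<cfset allwords = cacheGet("wordcounts")>
<cfif isNull(allwords)>
    <cfset allwords = {}>
    <!--- run the word counting and filtering here to fill allwords --->
    <cfset cachePut("wordcounts", allwords, createTimespan(0,1,0,0))>
</cfif>
```

Caching the finished struct means a cache hit skips both the feed parsing and the counting loops.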

About Raymond Camden

Raymond is a developer advocate. He focuses on JavaScript, serverless and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com