Earlier today Mike Henke asked if there was a way to generate a tag cloud from an RSS feed. While he was able to find a solution quickly enough (Wordle), I thought it would be fun to try this myself. I knew that Pete Freitag had already blogged on tag clouds and ColdFusion, so all I had to do was generate my word data and pass it to his code. Here's what I came up with.

I began with a simple call to my RSS URL to generate a query of data. For my testing, this feed query was the only thing I cached (for one hour). Obviously the results of all of my "crunching" could have been cached as well.

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
    <cfset feedUrl = "http://feedproxy.google.com/RaymondCamdensColdfusionBlog">
    <cffeed source="#feedUrl#" query="rss">
    <cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

Now for the fun part. In order to use Pete's code, I need to know each word and the number of times it appears. I began with an empty struct:

<cfset allwords = {}>

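Next, I created a list of "stop" words, words I'd always ignore. (Note, this list was kind of arbitrary. Also note I added some spaces in the blog entry just so it would wrap better. Remove them before running the code: ColdFusion list functions treat a leading space as part of the item, so " then" would not match "then".)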

<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you, you're,you've,with,why,which,when,were,we've,we're, then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done, off,their,isn't,yes,what's,them,they,'',be,being,all, only,does,here,an,by,would,like,at,do,want,or,could, out,our,while,what,had,each,into,where,That's,will,else, let's,about,got,using,before,over,actually,going,some,well">

I then split each entry into words and counted them in the struct. Note that the regex matches runs of word characters plus ' so that it can pick up "don't" as a single word. This is not perfect, but good enough.

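<cfloop query="rss">
    <!--- assumption: bigstring is the entry text, taken here from cffeed's content column --->
    <cfset bigstring = rss.content>
    <cfset words = reMatch("[\w']+", bigstring)>
    <cfloop index="w" array="#words#">
        <!--- skip single characters and stop words --->
        <cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
            <cfif not structKeyExists(allwords, w)>
                <cfset allwords[w] = 0>
            </cfif>
            <cfset allwords[w]++>
        </cfif>
    </cfloop>
</cfloop>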
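If you want to try the counting logic without a feed, here is a minimal sketch that wraps the same steps in a function. The name countWords, its arguments, and the sample sentence are all made up for illustration; they are not part of the original script.

```cfml
<!--- hypothetical helper: counts words in a string, skipping stop words --->
<cffunction name="countWords" returntype="struct" output="false">
    <cfargument name="text" type="string" required="true">
    <cfargument name="stopwords" type="string" required="true">
    <cfargument name="counts" type="struct" required="false" default="#structNew()#">

    <cfset var w = "">
    <!--- runs of word characters and apostrophes, same pattern as above --->
    <cfset var words = reMatch("[\w']+", arguments.text)>

    <cfloop index="w" array="#words#">
        <cfif len(w) gt 1 and not listFindNoCase(arguments.stopwords, w)>
            <cfif not structKeyExists(arguments.counts, w)>
                <cfset arguments.counts[w] = 0>
            </cfif>
            <cfset arguments.counts[w]++>
        </cfif>
    </cfloop>

    <cfreturn arguments.counts>
</cffunction>

<!--- sample input, for illustration only --->
<cfset demo = countWords("The cache and the feed: don't cache the Feed twice", "and,the,don't")>
<cfdump var="#demo#">
```

Because struct keys are case-insensitive, "feed" and "Feed" share one counter, so the dump should show cache and feed at 2 and twice at 1. Passing an existing struct as the third argument lets you accumulate counts across several strings, which is what the query loop does with allwords.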

I had quite a few words, so I decided to remove all words with 5 or fewer instances.

<cfloop item="k" collection="#allwords#">
    <cfif allwords[k] lte 5>
        <cfset structDelete(allwords, k)>
    </cfif>
</cfloop>

Now comes Pete's code to generate high/low values.

<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
    <!--- check both bounds independently: one count can be both the new min and the new max --->
    <cfif allwords[k] lt minval>
        <cfset minval = allwords[k]>
    </cfif>
    <cfif allwords[k] gt maxval>
        <cfset maxval = allwords[k]>
    </cfif>
</cfloop>

<cfset diff = maxval - minval>
<cfset distribution = diff / 3>
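To make the arithmetic concrete: distribution splits the range between minval and maxval into three equal bands. Here is a tiny sketch with made-up numbers (the demo* variables and the counts 6 and 30 are purely hypothetical):

```cfml
<!--- hypothetical counts, for illustration only --->
<cfset demoMin = 6>
<cfset demoMax = 30>
<cfset demoDiff = demoMax - demoMin>
<cfset demoDistribution = demoDiff / 3>

<cfoutput>
band width: #demoDistribution#<br>
medium band starts above #demoMin + demoDistribution#<br>
large band starts above #demoMin + (demoDistribution * 2)#
</cfoutput>
```

With those numbers the band width is 8, so a count above 22 lands in the large band, a count above 14 in the medium band, and everything else in the small band. The exact minimum and maximum get their own classes in the output step.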

And finally, the output:

<h2>Word Cloud</h2>
<p>
<cfloop item="w" collection="#allWords#">
    <cfif allWords[w] EQ minval>
        <cfset class="smallestTag">
    <cfelseif allWords[w] EQ maxval>
        <cfset class="largestTag">
    <cfelseif allWords[w] GT (minval + (distribution*2))>
        <cfset class="largeTag">
    <cfelseif allWords[w] GT (minval + distribution)>
        <cfset class="mediumTag">
    <cfelse>
        <cfset class="smallTag">
    </cfif>
    <cfoutput><span class="#class#">#w#</span></cfoutput>
</cfloop>
</p>

Sexy, eh? Here is the output from my blog:

I then pointed it at the RSS feed from ColdFusionBloggers:

I probably could have trimmed that cloud a lot more by raising my minimum filter. Anyway, I then did one more tweak. Instead of counting words, I simply took the category list:

<cfloop query="rss">
    <cfset words = listToArray(categorylabel)>

This tag cloud then represents categories from the RSS feed:

And that's it. Totally and completely stupid, but fun. Here's the current script, although it's a bit messy. As I said, normally you would want to cache all of the crunching.

p.s. Words a bit hard to read in the pictures? Right click and open in new tab. Sorry about that!

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
    <cfset feedUrl = "http://www.coldfusionbloggers.org/rss.cfm">
    <cffeed source="#feedUrl#" query="rss">
    <cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

<!--- create a count of words --->
<cfset allwords = {}>
<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you,you're,you've,with,why,which,when,were,we've,we're,then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done,off,their,isn't,yes,what's,them,they,'',be,being,all,only,does,here,an,by,would,like,at,do,want,or,could,out,our,while,what,had,each,into,where,That's,will,else,let's,about,got,using,before,over,actually,going,some,well">

<cfloop query="rss">
    <!--- <cfset words = reMatch("[\w']+",bigstring)> --->
    <cfset words = listToArray(categorylabel)>
    <cfloop index="w" array="#words#">
        <cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
            <cfif not structKeyExists(allwords, w)>
                <cfset allwords[w] = 0>
            </cfif>
            <cfset allwords[w]++>
        </cfif>
    </cfloop>
</cfloop>

<!--- remove where val < 5, 5 being a bit arbitrary --->
<!---
<cfloop item="k" collection="#allwords#">
    <cfif allwords[k] lte 0>
        <cfset structDelete(allwords,k)>
    </cfif>
</cfloop>
--->

<!--- get min, max --->
<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
    <!--- check both bounds independently: one count can be both the new min and the new max --->
    <cfif allwords[k] lt minval>
        <cfset minval = allwords[k]>
    </cfif>
    <cfif allwords[k] gt maxval>
        <cfset maxval = allwords[k]>
    </cfif>
</cfloop>

<cfset diff = maxval - minval>
<cfset distribution = diff / 3>

<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="description" content="" />
<meta name="keywords" content="" />

<link rel="stylesheet" href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.min.css">
<!--[if lt IE 9]><script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script><![endif]-->
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
<script type="text/javascript">
$(function() {

});
</script>
<style>
.smallestTag { font-size: xx-small; }
.smallTag { font-size: small; }
.mediumTag { font-size: medium; }
.largeTag { font-size: large; }
.largestTag { font-size: xx-large; }
</style>
</head>
<body>

<div class="container">
<h2>Word Cloud</h2>
<p>
<cfloop item="w" collection="#allWords#">
    <cfif allWords[w] EQ minval>
        <cfset class="smallestTag">
    <cfelseif allWords[w] EQ maxval>
        <cfset class="largestTag">
    <cfelseif allWords[w] GT (minval + (distribution*2))>
        <cfset class="largeTag">
    <cfelseif allWords[w] GT (minval + distribution)>
        <cfset class="mediumTag">
    <cfelse>
        <cfset class="smallTag">
    </cfif>
    <cfoutput><span class="#class#">#w#</span></cfoutput>
</cfloop>
</p>

</div>

</body> </html>
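As for caching the crunching: one way to do it would be to wrap the counting in the same cacheGet/cachePut pattern used for the feed. This is only a sketch; the cache key "rssWordCounts" and the one-hour timeout are my assumptions, and the counting loop is left as a placeholder comment.

```cfml
<!--- hypothetical cache key; any unique name would do --->
<cfset allwords = cacheGet("rssWordCounts")>
<cfif isNull(allwords)>
    <cfset allwords = {}>
    <!--- the stop-word filtering and counting loop from the script would run here --->
    <cfset cachePut("rssWordCounts", allwords, createTimespan(0,1,0,0))>
</cfif>
```

The min/max pass is cheap next to fetching and parsing the feed, so caching just the counts is probably enough. If you change the stop word list, remember that the old counts stay cached until the timeout expires.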