Earlier today Mike Henke asked if there was a way to generate a tag cloud from an RSS feed. While he was able to find a solution quickly enough (Wordle), I thought it would be kind of fun to try this myself. I knew that Pete Freitag had already blogged on tag clouds and ColdFusion, so all I had to do was generate my word data and pass it to his code. Here's what I came up with.

I began with a simple cffeed call against my RSS URL to generate a query of data. For my testing, this was the only thing I cached. Obviously all of my "crunching" could have been cached as well.

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
	<cfset feedUrl = "http://feedproxy.google.com/RaymondCamdensColdfusionBlog">
	<cffeed source="#feedUrl#" query="rss">
	<cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

Now for the fun part. In order to use Pete's code, I need to know each word and the number of times it appears. I began with an empty struct:

<cfset allwords = {}>

Next, I created a list of "stop" words, words I'd always ignore. (Note, this list was kind of arbitrary. Also note I added some spaces in the blog entry just so it would wrap better.)

<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you, you're,you've,with,why,which,when,were,we've,we're, then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done, off,their,isn't,yes,what's,them,they,'',be,being,all, only,does,here,an,by,would,like,at,do,want,or,could, out,our,while,what,had,each,into,where,That's,will,else, let's,about,got,using,before,over,actually,going,some,well">

I then split each entry into words and tallied them in the struct. Note that the pattern also includes ' so that it can match contractions like "don't". This is not perfect, but good enough.

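<cfloop query="rss">
	<!--- bigstring was not defined in the original snippet; building it from the feed's title and content columns is an assumption --->
	<cfset bigstring = rss.title & " " & rss.content>
	<cfset words = reMatch("[\w']+", bigstring)>
	<cfloop index="w" array="#words#">
		<!--- skip single characters and stop words --->
		<cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
			<cfif not structKeyExists(allwords, w)>
				<cfset allwords[w] = 0>
			</cfif>
			<cfset allwords[w]++>
		</cfif>
	</cfloop>
</cfloop>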
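
To see what that pattern does with contractions, here is a quick sketch run against a made-up sentence. The sample string is just an illustration, not data from the feed.

```cfml
<!--- sample is a made-up sentence, not data from the feed --->
<cfset sample = "I don't think it's broken, but we'll see.">
<cfset words = reMatch("[\w']+", sample)>
<!--- prints: I|don't|think|it's|broken|but|we'll|see --->
<cfoutput>#arrayToList(words, "|")#</cfoutput>
```

The flip side is that a word wrapped in single quotes keeps them, so 'fun' and fun would end up as two separate keys. That is one of the ways this is "not perfect."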

I had quite a few words, so I decided to remove all words with fewer than 5 instances.

<cfloop item="k" collection="#allwords#">
	<!--- drop anything that appears fewer than 5 times --->
	<cfif allwords[k] lt 5>
		<cfset structDelete(allwords, k)>
	</cfif>
</cfloop>

Now comes Pete's code to generate high/low values.

<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
	<cfif allwords[k] lt minval>
		<cfset minval = allwords[k]>
	</cfif>
	<!--- checked separately so a single count can update both min and max --->
	<cfif allwords[k] gt maxval>
		<cfset maxval = allwords[k]>
	</cfif>
</cfloop>

<cfset diff = maxval - minval> <cfset distribution = diff / 3>

And finally, the output:

<h2>Word Cloud</h2>
<p>
<cfloop item="w" collection="#allWords#">
	<cfif allWords[w] EQ minval>
		<cfset class="smallestTag">
	<cfelseif allWords[w] EQ maxval>
		<cfset class="largestTag">
	<cfelseif allWords[w] GT (minval + (distribution*2))>
		<cfset class="largeTag">
	<cfelseif allWords[w] GT (minval + distribution)>
		<cfset class="mediumTag">
	<cfelse>
		<cfset class="smallTag">
	</cfif>
	<cfoutput><span class="#class#">#w#</span></cfoutput>
</cfloop>
</p>
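
To make the bucketing concrete: with a minimum count of 5 and a maximum of 20, the difference is 15 and the distribution is 5, so anything above 15 is large, anything above 10 is medium, and the rest is small, with the exact minimum and maximum getting their own classes. Here is a rough sketch of the same logic wrapped in a function. Note that tagClass is a hypothetical helper I made up for this example, and the counts are invented.

```cfml
<cfscript>
// tagClass is a hypothetical helper, not part of Pete's code
function tagClass(count, minval, maxval) {
	var distribution = (maxval - minval) / 3;
	if (count eq minval) return "smallestTag";
	if (count eq maxval) return "largestTag";
	if (count gt minval + distribution * 2) return "largeTag";
	if (count gt minval + distribution) return "mediumTag";
	return "smallTag";
}
</cfscript>

<!--- made-up counts, with min 5 and max 20 --->
<!--- 5: smallestTag, 8: smallTag, 12: mediumTag, 17: largeTag, 20: largestTag --->
<cfoutput>
<cfloop list="5,8,12,17,20" index="n">
	#n#: #tagClass(n, 5, 20)#<br>
</cfloop>
</cfoutput>
```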

Sexy, eh? Here is the output from my blog:

I then pointed it at the RSS feed from ColdFusionBloggers:

I probably could have shortened that a lot more with my minimum filter. Anyway, I then did one more tweak. Instead of counting words, I simply took the category list:

<cfloop query="rss">
	<cfset words = listToArray(categorylabel)>
	<!--- the rest of the loop body is unchanged --->
</cfloop>

This tag cloud then represents categories from the RSS feed:

And that's it. Totally and completely stupid, but fun. Here's the current script, although it's a bit messy. As I said, normally you would want to cache all of the crunching.

p.s. Words a bit hard to read in the pictures? Right click and open in new tab. Sorry about that!

<cfset rss = cacheGet("rss")>
<cfif isNull(rss)>
	<cfset feedUrl = "http://www.coldfusionbloggers.org/rss.cfm">
	<cffeed source="#feedUrl#" query="rss">
	<cfset cachePut("rss", rss, createTimespan(0,1,0,0))>
</cfif>

<!--- create a count of words --->
<cfset allwords = {}>
<cfset stopwords = "and,this,the,a,it,as,was,to,don't,has,you,you're,you've,with,why,which,when,were,we've,we're,then,than,i,i'll,i'm,i've,i'd,it's,for,of,is,if,in,that,but,my,not,can,are,',done,off,their,isn't,yes,what's,them,they,'',be,being,all,only,does,here,an,by,would,like,at,do,want,or,could,out,our,while,what,had,each,into,where,That's,will,else,let's,about,got,using,before,over,actually,going,some,well">

<cfloop query="rss">
	<!--- <cfset words = reMatch("[\w']+",bigstring)> --->
	<cfset words = listToArray(categorylabel)>
	<cfloop index="w" array="#words#">
		<cfif len(w) gt 1 and not listFindNoCase(stopwords, w)>
			<cfif not structKeyExists(allwords, w)>
				<cfset allwords[w] = 0>
			</cfif>
			<cfset allwords[w]++>
		</cfif>
	</cfloop>
</cfloop>

<!--- remove where val < 5, 5 being a bit arbitrary --->
<!---
<cfloop item="k" collection="#allwords#">
	<cfif allwords[k] lte 0>
		<cfset structDelete(allwords,k)>
	</cfif>
</cfloop>
--->

<!--- get min, max --->
<cfset minval = 999999>
<cfset maxval = 0>
<cfloop item="k" collection="#allwords#">
	<cfif allwords[k] lt minval>
		<cfset minval = allwords[k]>
	</cfif>
	<!--- checked separately so a single count can update both min and max --->
	<cfif allwords[k] gt maxval>
		<cfset maxval = allwords[k]>
	</cfif>
</cfloop>

<cfset diff = maxval - minval> <cfset distribution = diff / 3>

<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="description" content="" />
<meta name="keywords" content="" />

<link rel="stylesheet" href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.min.css">
<!--[if lt IE 9]><script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script><![endif]-->
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
<script type="text/javascript">
	$(function() {
		
	});	
</script>
<style>
.smallestTag { font-size: xx-small; }
.smallTag { font-size: small; }
.mediumTag { font-size: medium; }
.largeTag { font-size: large; }
.largestTag { font-size: xx-large; }
</style>

</head>
<body>

<div class="container">
	<h2>Word Cloud</h2>
	<p>
	<cfloop item="w" collection="#allWords#">
		<cfif allWords[w] EQ minval>
			<cfset class="smallestTag">
		<cfelseif allWords[w] EQ maxval>
			<cfset class="largestTag">
		<cfelseif allWords[w] GT (minval + (distribution*2))>
			<cfset class="largeTag">
		<cfelseif allWords[w] GT (minval + distribution)>
			<cfset class="mediumTag">
		<cfelse>
			<cfset class="smallTag">
		</cfif>
		<cfoutput><span class="#class#">#w#</span></cfoutput>
	</cfloop>
	</p>

</div>

</body>
</html>
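
As I said, the crunching itself is worth caching too. Here is a minimal sketch of one way that might look, using the same cacheGet/cachePut pattern as the feed. The "wordcounts" cache key is a name I made up, the rss query is a tiny stand-in built by hand so the example runs on its own, and I left out the stop word check to keep it short.

```cfml
<!--- stand-in for the cached feed query; the category values are made up --->
<cfset rss = queryNew("categorylabel")>
<cfloop list="ColdFusion|ColdFusion,jQuery|jQuery,Mobile" index="cats" delimiters="|">
	<cfset queryAddRow(rss)>
	<cfset querySetCell(rss, "categorylabel", cats)>
</cfloop>

<!--- "wordcounts" is a hypothetical cache key --->
<cfset allwords = cacheGet("wordcounts")>
<cfif isNull(allwords)>
	<!--- cache miss: do the counting once, then store the finished struct for an hour --->
	<cfset allwords = {}>
	<cfloop query="rss">
		<cfloop index="w" array="#listToArray(categorylabel)#">
			<cfif not structKeyExists(allwords, w)>
				<cfset allwords[w] = 0>
			</cfif>
			<cfset allwords[w]++>
		</cfloop>
	</cfloop>
	<cfset cachePut("wordcounts", allwords, createTimespan(0,1,0,0))>
</cfif>

<cfdump var="#allwords#">
```

On the first request the struct is built and stored. Until the hour is up, later requests skip the loops entirely and go straight to the output.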