Yet one more YQL demo - Term extraction from CFBloggers

I continue to be amazed at just how kick ass YQL (Yahoo Query Language) is for building mashups. I think it is one of the most innovative things I’ve seen on the Net in years. Earlier this week I was directed to the Rock Your Data YQL demos. Each shows a pretty darn cool example of what can be done with the technology.

In my mind, the best example was the term extraction example. At some point YQL added support for retrieving important terms from filtered data. Consider the author's example:

select * from search.termextract where context in (select content from html where url="" and xpath="//a") | unique(field="Result")

That's about as simple as you can make it. I decided to quickly test this against CFBloggers. I modified the URL to point directly to the content and to pass along a request setting to get 100 items, and I narrowed the XPath to links whose href begins with click.cfm:

select * from search.termextract where context in (select content from html where url="" and xpath="//a[starts-with(@href,'click.cfm')]") | unique(field="Result")
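To see what that XPath filter keeps, here is a small local sketch. It uses ColdFusion's own xmlSearch() as a stand-in for the filtering that YQL's html table does on its side. The markup is made up for illustration and is not the real CFBloggers page.

```cfml
<!--- Hypothetical sample markup; the real page has many more links. --->
<cfxml variable="page">
<div>
    <a href="index.cfm">Home</a>
    <a href="click.cfm?entry=1">First entry title</a>
    <a href="click.cfm?entry=2">Second entry title</a>
    <a href="about.cfm">About</a>
</div>
</cfxml>

<!--- Same XPath as in the YQL statement: anchors whose href starts with click.cfm. --->
<cfset links = xmlSearch(page, "//a[starts-with(@href,'click.cfm')]")>

<cfoutput>
<cfloop index="link" array="#links#">
    #link.xmlText#<br>
</cfloop>
</cfoutput>
```

Only the two click.cfm anchors match, so the navigation links never reach the term extractor.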

Once I had that, I wrote a quick parser. I decided to go full bore and turn the results into a tag cloud. (This tweet was my inspiration.) Pete Freitag has a good blog post on this: How to make a tag cloud. The full code is below.

Quick Note - I added some spaces to the URL in the first line so that it would wrap better on my blog. Remove before using.

<cfset yql = "*%20from%20search.termextract%20 where%20context%20in%20(select%20content%20from%20html%20 (%40href%2C'click.cfm')%5D%22)%20%20%7C%20unique(field%3D%22Result%22)& diagnostics=true&">
<cfhttp url="#yql#" result="yqlhttp">

<cfset data = xmlParse(yqlhttp.filecontent)>
<cfset termData = queryNew("word,count")>

<cfloop index="result" array="#data.query.results.result#">
    <cfset word = result.xmltext>
    <cfset count = result['yahoo:repeatcount'].xmlText>
    <cfset queryAddRow(termData)>
    <cfset querySetCell(termData, "word", word)>
    <cfset querySetCell(termData, "count", count)>
</cfloop>

<cfset tagValueArray = ListToArray(ValueList(termData.count))>
<cfset max = ArrayMax(tagValueArray)>
<cfset min = ArrayMin(tagValueArray)>
<cfset diff = max - min>
<cfset distribution = diff / 3>

<style>
.smallestTag { font-size: xx-small; }
.smallTag { font-size: small; }
.mediumTag { font-size: medium; }
.largeTag { font-size: large; }
.largestTag { font-size: xx-large; }
</style>

<cfoutput query="termData">
    <cfif count EQ min>
        <cfset class="smallestTag">
    <cfelseif count EQ max>
        <cfset class="largestTag">
    <cfelseif count GT (min + (distribution*2))>
        <cfset class="largeTag">
    <cfelseif count GT (min + distribution)>
        <cfset class="mediumTag">
    <cfelse>
        <cfset class="smallTag">
    </cfif>
    <b class="#class#">#word#</b>
</cfoutput>
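If the sizing rules are hard to follow inline, here is the same bucketing pulled out into a function and run against a few made-up counts. The function name tagClassFor, its argument names, and the sample numbers are all mine, for illustration only. Treat it as a sketch of the logic above, not a replacement for it.

```cfml
<!--- Hypothetical helper: maps a term's repeat count to one of the five CSS classes. --->
<cffunction name="tagClassFor" returntype="string" output="false">
    <cfargument name="tagCount" type="numeric" required="true">
    <cfargument name="minCount" type="numeric" required="true">
    <cfargument name="maxCount" type="numeric" required="true">
    <!--- Split the range between the extremes into three equal bands. --->
    <cfset var distribution = (arguments.maxCount - arguments.minCount) / 3>

    <cfif arguments.tagCount EQ arguments.minCount>
        <cfreturn "smallestTag">
    <cfelseif arguments.tagCount EQ arguments.maxCount>
        <cfreturn "largestTag">
    <cfelseif arguments.tagCount GT arguments.minCount + (distribution * 2)>
        <cfreturn "largeTag">
    <cfelseif arguments.tagCount GT arguments.minCount + distribution>
        <cfreturn "mediumTag">
    </cfif>
    <cfreturn "smallTag">
</cffunction>

<!--- Made-up counts: min is 1, max is 10, so each band is 3 wide. --->
<cfset sampleCounts = [1, 2, 5, 8, 10]>

<cfoutput>
<cfloop index="c" array="#sampleCounts#">
    #c# = #tagClassFor(c, arrayMin(sampleCounts), arrayMax(sampleCounts))#<br>
</cfloop>
</cfoutput>
```

With these numbers the thresholds sit at 4 and 7, so 1 maps to smallestTag, 2 to smallTag, 5 to mediumTag, 8 to largeTag and 10 to largestTag. One quirk to be aware of: if every term has the same count, min equals max and everything ends up as smallestTag.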


About Raymond Camden

Raymond is a developer advocate. He focuses on JavaScript, serverless and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support.

Lafayette, LA