Yet one more YQL demo - Term extraction from CFBloggers


I continue to be amazed at just how kick ass YQL (Yahoo Query Language) is for building mashups. I think it is one of the most innovative things I've seen on the Net in years. Earlier this week I was directed to the Rock Your Data YQL demos. Each shows a pretty darn cool example of what can be done with the technology.

In my mind, the best example was the term extraction example. At some point YQL added support for retrieving important terms from filtered data. Consider the author's example:

select * from search.termextract where context in (select content from html where url="" and xpath="//a") | unique(field="Result")

That's about as simple as you can make it. I decided to quickly test this against CFBloggers. I modified the URL to point directly to the content and to pass along a request setting to get 100 items:

select * from search.termextract where context in (select content from html where url="" and xpath="//a[starts-with(@href,'click.cfm')]") | unique(field="Result")
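If you would rather not percent-encode the statement by hand, ColdFusion can do it for you. The following is only a sketch under stated assumptions: `yqlEndpoint` and `targetUrl` are hypothetical placeholders (neither value comes from this post), and the `q` parameter name is an assumption about how the YQL REST endpoint accepted a statement.

```cfml
<!--- Both values below are hypothetical placeholders, not taken from the post. --->
<cfset yqlEndpoint = "https://yql.example.invalid/yql">
<cfset targetUrl = "https://blogs.example.invalid/">

<!--- The same statement as above, with the placeholder page URL spliced in.
      Single quotes inside a single-quoted CFML string are escaped by doubling. --->
<cfset statement = 'select * from search.termextract where context in '
	& '(select content from html where url="#targetUrl#" '
	& 'and xpath="//a[starts-with(@href,''click.cfm'')]") '
	& '| unique(field="Result")'>

<!--- urlEncodedFormat() percent-encodes the statement for use in a query string.
      The "q" parameter name is an assumption; diagnostics=true is from the post. --->
<cfset yql = yqlEndpoint & "?q=" & urlEncodedFormat(statement) & "&diagnostics=true">

<!--- Print the finished URL so it can be checked in a browser first. --->
<cfoutput>#htmlEditFormat(yql)#</cfoutput>
```

Building the URL this way keeps the readable statement in the source and avoids the stray-space problem that comes with pasting a long pre-encoded string.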

Once I had that, I wrote a quick parser, then decided to go full bore and turn the results into a tag cloud. (This tweet was my inspiration.) Pete Freitag has a good blog post on the technique: How to make a tag cloud. The full code follows.

Quick Note - I added some spaces to the URL in the first line so that it would wrap better on my blog. Remove before using.

<cfset yql = "*%20from%20search.termextract%20 where%20context%20in%20(select%20content%20from%20html%20 (%40href%2C'click.cfm')%5D%22)%20%20%7C%20unique(field%3D%22Result%22)& diagnostics=true&">
<cfhttp url="#yql#" result="yqlhttp">
<cfset data = xmlParse(yqlhttp.filecontent)>

<cfset termData = queryNew("word,count")>
<cfloop index="result" array="#data.query.results.result#">
	<cfset word = result.xmltext>
	<cfset count = result['yahoo:repeatcount'].xmlText>
	<cfset queryAddRow(termData)>
	<cfset querySetCell(termData, "word", word)>
	<cfset querySetCell(termData, "count", count)>
</cfloop>

<cfset tagValueArray = ListToArray(ValueList(termData.count))>
<cfset max = ArrayMax(tagValueArray)>
<cfset min = ArrayMin(tagValueArray)>
<cfset diff = max - min>
<cfset distribution = diff / 3>

<style>
.smallestTag { font-size: xx-small; }
.smallTag { font-size: small; }
.mediumTag { font-size: medium; }
.largeTag { font-size: large; }
.largestTag { font-size: xx-large; }
</style>

<cfoutput query="termData">
	<cfif count EQ min>
		<cfset class="smallestTag">
	<cfelseif count EQ max>
		<cfset class="largestTag">
	<cfelseif count GT (min + (distribution*2))>
		<cfset class="largeTag">
	<cfelseif count GT (min + distribution)>
		<cfset class="mediumTag">
	<cfelse>
		<cfset class="smallTag">
	</cfif>
	<b class="#class#">#word#</b>
</cfoutput>
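The bucketing rules in the output loop can also be pulled into a small function, which makes them easier to try against known numbers. This is a sketch only: `tagClass` and its argument names are made up for illustration and are not part of the code above.

```cfml
<!--- Hypothetical helper: maps a term count to one of the five CSS classes,
      using the same thresholds as the cfif chain in the post. --->
<cffunction name="tagClass" returntype="string" output="false">
	<cfargument name="termCount" type="numeric" required="true">
	<cfargument name="minCount" type="numeric" required="true">
	<cfargument name="maxCount" type="numeric" required="true">
	<!--- Split the range between the smallest and largest counts into thirds. --->
	<cfset var distribution = (arguments.maxCount - arguments.minCount) / 3>
	<cfif arguments.termCount EQ arguments.minCount>
		<cfreturn "smallestTag">
	<cfelseif arguments.termCount EQ arguments.maxCount>
		<cfreturn "largestTag">
	<cfelseif arguments.termCount GT arguments.minCount + (distribution * 2)>
		<cfreturn "largeTag">
	<cfelseif arguments.termCount GT arguments.minCount + distribution>
		<cfreturn "mediumTag">
	</cfif>
	<cfreturn "smallTag">
</cffunction>

<!--- With counts running from 1 to 10, distribution is 3,
      so the cut-offs sit at 4 and 7. --->
<cfoutput>
<cfloop list="1,3,5,7,8,10" index="n">
	#n#: #tagClass(n, 1, 10)#<br>
</cfloop>
</cfoutput>
```

For that sample range, 1 maps to smallestTag, 3 to smallTag, 5 and 7 to mediumTag, 8 to largeTag, and 10 to largestTag. One edge case worth knowing: if every term has the same count, the minimum and maximum are equal and every term falls through the first branch as smallestTag.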


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA

Archived Comments

Comment 1 by Gadi posted on 8/12/2010 at 11:19 PM

I have found YQL to be fantastic. However, I am trying to run some queries that require the user's authentication. I am using oauth, and am having trouble with the 3-legged auth. Do you know of any resources that may help with this?


Comment 2 by Raymond Camden posted on 8/13/2010 at 1:35 AM

Sorry, I've done _zip_ with oauth.

Comment 3 by markandey singh posted on 8/21/2010 at 3:46 PM

There is also a simple way to do YQL scraping; see the Chrome plugin.

Comment 4 by Ralph Everest posted on 11/11/2010 at 9:33 PM

Ray, do you know if querying YQL with CF has changed? I ask because your example no longer seems to work.

Comment 5 by Raymond Camden posted on 11/12/2010 at 1:15 AM

I just pasted the YQL URL into my browser and it worked. So how is it failing for you? Try adding a few CFDUMPs to debug.

Comment 6 by Ralph Everest posted on 11/16/2010 at 11:39 PM

Apparently, there are "regional" issues.

Comment 7 by Raymond Camden posted on 11/16/2010 at 11:48 PM

Ugh - well thanks for the info.

Comment 8 by Gary H-S posted on 12/29/2010 at 8:06 PM

Good stuff, Ray - thank you thank you.