Ok, so I’m a bit tiffed at Yahoo right now (I’ll put my rant at the end) but I can’t not be excited about this. Luke S. emailed me a week or two ago and suggested I take a look at this blog entry: Tutorial: Scraping and turning a web site into a widget with YQL. The entry discusses how you can use YQL (Yahoo Query Language) to parse arbitrary HTML pages. The author shows how he runs XPath against the HTML to pull out the data and reuse it. I thought this was cool - but then I began looking at the YQL docs more closely and found that this was just the tip of the iceberg. Yahoo has built what amounts to a SQL interface to both the web and its own data services. So along with doing XPath parses on HTML pages, you can also search against Flickr:
SELECT * FROM flickr.photos.search WHERE text="cat"
And their search engine as well:
select title,abstract,url from search.web(0,10) where query='coldfusion'
This is - to be sure - some slick stuff! Yahoo always does a really good job with their APIs, so I encourage you to check out their docs for more detailed information. They support JSON, XML, and JSONP for responses. At lunch I looked into how difficult it would be to parse their results into native ColdFusion data. I came up with the YQL custom tag. Here are a few examples:
<cf_yql name="results">
select * from html(0,10) where url='http://www.dcs.gla.ac.uk/~joy/fun/jokes/TV.html' and xpath='//ul/li/p'
</cf_yql>
<cfdump var="#results#">

<cf_yql name="results2">
SELECT * FROM flickr.photos.search WHERE text='coldfusion'
</cf_yql>
<cfdump var="#results2#">

<cf_yql name="results3">
select title,abstract,url from search.web(0,10) where query='coldfusion'
</cf_yql>
<cfdump var="#results3#">
The first query simply mimics the results of the blog entry. The other two perform searches. All three, though, return query objects just like the built-in cfquery tag. Here is a quick screen shot from the first query:
The actual custom tag isn’t too complex. Remember, ColdFusion makes it simple to get the content between a tag’s opening and closing tags:
<cfif thisTag.executionMode is "end">
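<!--- get the yql --->
<cfset yql = thisTag.generatedContent>
<!--- remove it --->
<cfset thisTag.generatedContent = "">
<!--- trim first, then URL-encode; encoding first would turn the whitespace into escapes that trim() cannot remove --->
<cfset yql = urlEncodedFormat(trim(yql))>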
This grabs the content, trims it, and runs it through urlEncodedFormat() so I can pass it in a URL.
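If you want to see what that step produces without writing a custom tag, here is a minimal sketch. The variable names (yql, endpoint, requestUrl) are just ones I picked for illustration, and the statement is hard-coded instead of coming from thisTag.generatedContent. The endpoint is the same one used in the cfhttp call.

```cfml
<cfscript>
// Sample statement padded with the kind of whitespace a tag body usually carries.
yql = chr(10) & "   select title,abstract,url from search.web(0,10) where query='coldfusion'   " & chr(10);

// Public YQL endpoint, as used in the cfhttp call in this post.
endpoint = "http://query.yahooapis.com/v1/public/yql";

// Trim first, then encode, so the padding is dropped instead of being escaped.
requestUrl = endpoint & "?q=" & urlEncodedFormat(trim(yql)) & "&format=json";

// Print the finished URL so you can inspect it or paste it into a browser.
writeOutput(requestUrl);
</cfscript>
```

The order matters: once the string is encoded, leading and trailing whitespace is no longer whitespace, so a later trim() has nothing to strip.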
<cfhttp url="http://query.yahooapis.com/v1/public/yql?q=#yql#&format=json" result="result" getasbinary="no">
For some reason, the fileContent result wasn’t plain text so I had to run toString() on it:
<!--- convert bin data to string --->
<cfset data = result.fileContent.toString()>
<!--- convert json to CF --->
<cfset data = deserializeJSON(data)>
After that it was just a matter of parsing the results into a query. You can download the attached code if you want to see all the details. This hasn’t been heavily tested, but it’s certainly fun. I’ll look into adding it to the CFYahoo package if folks actually think this is useful.
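To give a rough idea of what that parsing step can look like, here is a simplified sketch. It is not the code from the attached tag. The sample JSON is made up, and the path to the rows (data.query.results.result) is an assumption that will vary by YQL table, so check the real response before relying on it.

```cfml
<cfscript>
// Hypothetical sample shaped like a YQL JSON response; real responses may differ.
json = '{"query":{"count":2,"results":{"result":[{"title":"First","url":"http://example.com/1"},{"title":"Second","url":"http://example.com/2"}]}}}';
data = deserializeJSON(json);

// Assumed location of the rows; the last key depends on the table being queried.
rows = data.query.results.result;

// Assumption: a single row may come back as a struct instead of an array.
if (isStruct(rows)) {
    rows = [rows];
}

if (arrayLen(rows)) {
    // Use the keys of the first row as the column names.
    columns = structKeyList(rows[1]);
    results = queryNew(columns);

    for (row in rows) {
        queryAddRow(results);
        for (col in listToArray(columns)) {
            // Copy simple values only; nested structs and arrays are skipped in this sketch.
            if (structKeyExists(row, col) && isSimpleValue(row[col])) {
                querySetCell(results, col, row[col]);
            }
        }
    }

    writeDump(results);
}
</cfscript>
```

Taking the column list from the first row keeps the sketch short, but it means any key that only shows up in later rows is dropped.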
p.s. Rant time. Some time ago I worked with the Yahoo Developer Network to get ColdFusion included in their Search API kit and to write up a ColdFusion Developer Center. Unfortunately, Yahoo has decided to drop the ColdFusion Developer Center. This is what I got on Twitter:
ydn @cfjedimaster we didn't get a renewed interest in #coldfusion from our community and we've been trying to streamline our site/offerings
I protested this (well, on Twitter) but have yet to get a response. I also posted my query to their forums. I’d really appreciate it if folks would ping them - either via Twitter or their forums - and let them know that you want to see ColdFusion represented. Crap - the content was written. They didn’t have to pay anything for it (and I’d gladly keep it updated) so this should be a no-brainer for them. Shoot - they have a Silverlight developer center. You can’t tell me there are more SL devs than ColdFusion devs.