Ok, so I'm a bit miffed at Yahoo right now (I'll put my rant at the end) but I can't not be excited about this. Luke S. emailed me a week or two ago and suggested I take a look at this blog entry: Tutorial: Scraping and turning a web site into a widget with YQL. The entry shows how you can use YQL (Yahoo Query Language) to parse arbitrary HTML pages: the author runs XPath against the HTML to pull out the data and reuse it. I thought this was cool - but then I began looking at the YQL docs more closely and found that this was just the tip of the iceberg.
Yahoo has built what amounts to a SQL interface to both the web and its own data services. So along with running XPath against HTML pages, you can also search against Flickr:
SELECT * FROM flickr.photos.search WHERE text="cat"
And their search engine as well:
select title,abstract,url from search.web(0,10) where query='coldfusion'
This is - to be sure - some slick stuff! Yahoo always does a really good job with their APIs, so I encourage you to check out their docs for more detailed information. They support JSON, XML, and JSONP for responses. At lunch I looked into how difficult it would be to parse their results into native ColdFusion data. I came up with the YQL custom tag. Here are a few examples:
<cf_yql name="results">
select * from html(0,10) where url='http://www.dcs.gla.ac.uk/~joy/fun/jokes/TV.html' and xpath='//ul/li/p'
</cf_yql>
<cfdump var="#results#">
<cf_yql name="results2">
SELECT * FROM flickr.photos.search WHERE text='coldfusion'
</cf_yql>
<cfdump var="#results2#">
<cf_yql name="results3">
select title,abstract,url from search.web(0,10) where query='coldfusion'
</cf_yql>
<cfdump var="#results3#">
The first query simply mimics the results of the blog entry. The other two perform searches. All three, though, return query objects just like the built-in cfquery tag. Here is a quick screen shot from the first query:

The actual custom tag isn't too complex. Remember, ColdFusion makes it simple to get the content between a tag's opening and closing pair:
<cfif thisTag.executionMode is "end">
<!--- get the yql --->
<cfset yql = thisTag.generatedContent>
<!--- remove it --->
<cfset thisTag.generatedContent = "">
<!--- trim first, then encode, so surrounding whitespace isn't encoded into the URL --->
<cfset yql = urlEncodedFormat(trim(yql))>
This grabs the content, trims it, and urlEncodedFormats the string so I can pass it in a url.
<cfhttp url="http://query.yahooapis.com/v1/public/yql?q=#yql#&format=json" result="result" getasbinary="no">
For some reason, the fileContent result wasn't plain text so I had to run toString() on it:
<!--- convert bin data to string --->
<cfset data = result.fileContent.toString()>
<!--- convert json to CF --->
<cfset data = deserializeJSON(data)>
After that it was just a matter of parsing the results into a query. You can download the attached code if you want to see all the details. This hasn't been heavily tested, but it's certainly fun. I'll look into adding it to the CFYahoo package if folks actually think this is useful.
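If you don't want to grab the zip, here is a rough sketch of what that parsing step could look like. To be clear, this is not the attached code: the function name yqlResultsToQuery is made up for illustration, and it assumes the deserialized JSON has at least one result, that data.query.results holds a single key, and that the column names come from the first row. A single match can come back as a struct (or a simple value) rather than an array, so the sketch wraps it in an array first.

```cfml
<!--- Hypothetical helper, NOT the code from the attached zip. --->
<cffunction name="yqlResultsToQuery" returntype="query" output="false">
    <cfargument name="data" type="struct" required="true">
    <cfset var results = arguments.data.query.results>
    <!--- assumption: results contains exactly one key (e.g. "p", "photo", "result") --->
    <cfset var key = listFirst(structKeyList(results))>
    <cfset var rows = results[key]>
    <cfset var q = "">
    <cfset var row = "">
    <cfset var col = "">

    <!--- a single match is not wrapped in an array, so wrap it ourselves --->
    <cfif not isArray(rows)>
        <cfset rows = arrayNew(1)>
        <cfset arrayAppend(rows, results[key])>
    </cfif>

    <cfif isSimpleValue(rows[1])>
        <!--- plain strings: one column named after the key --->
        <cfset q = queryNew(key)>
        <cfloop array="#rows#" index="row">
            <cfset queryAddRow(q)>
            <cfset querySetCell(q, key, row)>
        </cfloop>
    <cfelse>
        <!--- structs: columns taken from the first row (an assumption) --->
        <cfset q = queryNew(structKeyList(rows[1]))>
        <cfloop array="#rows#" index="row">
            <cfset queryAddRow(q)>
            <cfloop list="#q.columnList#" index="col">
                <!--- later rows may be missing a key; leave those cells empty --->
                <cfif structKeyExists(row, col)>
                    <cfset querySetCell(q, col, row[col])>
                </cfif>
            </cfloop>
        </cfloop>
    </cfif>

    <cfreturn q>
</cffunction>
```

With that in place, something like `<cfset q = yqlResultsToQuery(data)>` followed by a cfdump should give you a query to play with. Results that mix strings and structs, or that use keys which aren't valid column names, would need more care than this sketch takes.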
p.s. Rant time. Some time ago I worked with the Yahoo Developer Network to get ColdFusion included in their Search API kit and to write up a ColdFusion Developer Center. Unfortunately, Yahoo has decided to drop the ColdFusion Developer Center. This is what I got on Twitter:
ydn @cfjedimaster we didn't get a renewed interest in #coldfusion from our community and we've been trying to streamline our site/offerings
I protested this (well, on Twitter) but have yet to get a response. I also posted my query to their forums. I'd really appreciate it if folks would ping them - either via Twitter or their forums - and let them know that you want to see ColdFusion represented. Crap - the content was already written. They didn't have to pay anything for it (and I'd gladly keep it updated) so this should be a no-brainer for them. Shoot - they have a Silverlight developer center. You can't tell me there are more SL devs than ColdFusion devs.
Archived Comments
Very neat stuff. I'll look into this with more detail soon.
Silverlight doesn't surprise me now especially with their Microsoft relationship.
That's slick. I had no idea that YQL offered that type of functionality. I'll have to take a closer look into that.
Nice find, Ray!
The problem is, if they haven't been getting the traffic, then I doubt any amount of protesting is going to matter.
Ray, I tried this on Railo - kept getting a 400 error as a response from Yahoo. Blair McKenzie on the Railo list suggested I change htmlEditFormat() to urlencodedformat(). It then started working.
I know! I've been looking all over for the CF section over at YDN. Bastards! It was probably one of the terms of the deal with Bing. [sigh]
It looks like Pipes may go away too. They sure sound like they're looking for stuff to get rid of that would give MS competition.
@Todd: Wow, the htmlEditFormat was a total brain fart. It should definitely be urlEncodedFormat. Will fix in 2 minutes and replace the zip.
@Thinman: If worst comes to worst, I'll post my copy of the files.
Zip fixed. Thanks again Todd.
The YQL and Pipes products are developed by the same team and are not related to the search teams/products. We're very close to a big update to Pipes which builds upon the new YQL technology - I'm glad you're enjoying both!
I really can't believe this YQL hasn't taken the world by storm :)
It's really awesome!
I played around w/ it a few months ago:
http://tech.tjandrawibawa.o...
hi Ray, fantastic - thanks very much for taking the time to do a YQL post :-) it's definitely the kinda thing that will come in useful for folks at some point.
as for YDN - *grrrrr*, shall ping.
Ray,
Good Stuff. I've been using the pipes feed from Yahoo for a while and have only scratched the surface of what is possible. I would be very interested in the cf_yahoo package.
-Rudi
Per Ray's request to drop a link here. I followed up Ray's post with how to create a Custom Tag CFC (only available in Railo) and how you can actually make this a built-in-tag in Railo. By built in, I mean: <cfyql>
http://www.railo.ch/blog/in...
@Rudi: Not sure what you mean by your last statement. There _is_ a CFYAhoo package at RIAForge. It covers many of Yahoo's APIs. Are you saying it makes sense to put this _in_ there?
YQL looks very interesting . . .
Shame they removed the CF Developer content. I've left a comment on their forum.
@ray
Yes, that's what I meant. That's what I get for posting before the caffeine kicks in.
Ray, really awesome post. That's way cool that you can use YQL to parse web pages. Awesomeness.
Just an FYI. Spoke to YDN, and the ColdFusion Dev Center WILL be returning!!
@Ray,
Are you so badass that you actually got them to change their mind :D
Heh, no, I think it was all the people who posted/complained on the forums.
The ColdFusion Army is strong!
Sure sure... you can deflect the greatness all you want, but I think we know what's going on here :)
How did Ray get Yahoo to continue the CF section? ColdFusionJedi mind trick!! (someone had to say it!)
Awesome post MrCamden!
I whined. ;) Nah, not really. I hooked up with the person in charge, and with Adam Lehman, and we discussed the center and what we could do to ensure it stays up to date. (That was Yahoo's primary concern.) Now that it is back, Adam and I need to come up with a plan to help update it.
Thanks Ray, this is just excellent (both YQL and cf_yql), just what I needed for a project I'm playing with. I do notice it seems to have some problems with a little more complex data, like the result of select * from flickr.photos.exif where photo_id='3889005905' which returns:
Element 1 is undefined in a CFML structure referenced as part of an expression.
47 : <cfif isSimpleValue(data.query.results[key][1])>
Trying to figure it out, but I'll probably never get there, I'm a sysadmin and shouldn't be doing this in the first place ;)
I'm very unsure about this fix - it works, but will probably need to be redone.
Grab the bits here: http://www.coldfusionjedi.c...
Scratch that. It breaks other Flickr searches.
This one should be better:
http://www.coldfusionjedi.c...
Hi Ray! Adding it to the Yahoo package would be nice, so we can play around with this tag properly. Have you looked at Yahoo BOSS? It is quite cool too.
I take it this is dependent on anticipating well formed xml at the end of the url?
Another alternative is to provide an XQuery parser, e.g.
<cf_XQuery name="results">
for $i in document("http://www.dcs.gla.ac.uk/~j...")
return $i//ul/li/p
</CF_XQuery>
This provides more expressiveness than SQL and xpath combined.
I have developed a custom tag to provide this functionality at:
http://www.cfxquery.co.uk/C...
Many Thanks,
Will
No - the YQL isn't XML at all. Where did you get that?
Found a bug: a query that grabs only one record returns the results as a struct instead of an array.
Query: SELECT * FROM upcoming.events WHERE event_id="8452132"
Fix (added this to the cfif block where the output query is created):
<code>
<cfelseif totalResults eq 1 and isStruct(data.query.results[key])>
<cfset query = queryNew(structKeyList(data.query.results[key]))>
<cfset queryAddRow(query)>
<cfloop index="col" list="#query.columnList#">
<cfset querySetCell(query, col, data.query.results[key][col])>
</cfloop>
</code>
Thanks Joey.
I realize this thread is 3 years old, but is there still a place I can get the version 3 yql3.zip?
I think I posted it up on RIAForge.