Yahoo Query Language

Ok, so I'm a bit miffed at Yahoo right now (I'll put my rant at the end), but I can't not be excited about this. Luke S. emailed me a week or two ago and suggested I take a look at this blog entry: Tutorial: Scraping and turning a web site into a widget with YQL. The entry discusses how you can use YQL (Yahoo Query Language) to parse arbitrary HTML pages. He shows how he runs XPath against the HTML to pull out the data and reuse it. I thought this was cool - but then I began looking at the YQL docs more closely and found that this was just the tip of the iceberg.

Yahoo has built what amounts to a SQL interface to both the web and its own data services. So along with running XPath queries against HTML pages, you can also search Flickr:

SELECT * FROM flickr.photos.search WHERE text="cat"

And their search engine as well:

select title,abstract,url from search.web(0,10) where query='coldfusion'

This is - to be sure - some slick stuff! Yahoo always does a really good job with their APIs, so I encourage you to check out their docs for more detailed information. They support JSON, XML, and JSONP for responses. At lunch I looked into how difficult it would be to parse their results into native ColdFusion data. I came up with the YQL custom tag. Here are a few examples:

<cf_yql name="results"> select * from html(0,10) where url='http://www.dcs.gla.ac.uk/~joy/fun/jokes/TV.html' and xpath='//ul/li/p' </cf_yql>

<cfdump var="#results#">

<cf_yql name="results2"> SELECT * FROM flickr.photos.search WHERE text='coldfusion' </cf_yql>

<cfdump var="#results2#">

<cf_yql name="results3"> select title,abstract,url from search.web(0,10) where query='coldfusion' </cf_yql>

<cfdump var="#results3#">

The first query simply mimics the results of the blog entry. The other two perform searches. All three, though, return query objects just like the built-in cfquery tag.

[Screenshot of the first query's results]

The actual custom tag isn't too complex. Remember ColdFusion makes it simple to get content between tags:

<cfif thisTag.executionMode is "end">
<!--- get the yql --->
<cfset yql = thisTag.generatedContent>
<!--- remove it --->
<cfset thisTag.generatedContent = "">

<!--- trim first, then encode, so stray whitespace is not encoded into the url --->
<cfset yql = urlEncodedFormat(trim(yql))>

This grabs the content, trims it, and urlEncodedFormats the string so I can pass it in a url.

<cfhttp url="http://query.yahooapis.com/v1/public/yql?q=#yql#&format=json" result="result" getasbinary="no">

For some reason, the fileContent result wasn't plain text so I had to run toString() on it:

<!--- convert bin data to string --->
<cfset data = result.fileContent.toString()>
<!--- convert json to CF --->
<cfset data = deserializeJSON(data)>
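A bad request comes back from Yahoo as an HTTP error (a 400, for example), so it may be worth guarding the deserialization step. What follows is only a minimal sketch, assuming the `result` variable from the cfhttp call above. The error messages and the choice to throw are illustrative assumptions, not necessarily what the attached tag does.

```cfm
<!--- sketch only: "result" is the struct populated by the cfhttp call above --->
<cfset data = result.fileContent.toString()>

<!--- statusCode looks like "200 OK"; val() reads the leading number --->
<cfif val(result.statusCode) neq 200>
    <cfthrow message="YQL request failed" detail="#result.statusCode#">
</cfif>

<!--- only deserialize when the body really is JSON --->
<cfif not isJSON(data)>
    <cfthrow message="YQL response was not valid JSON">
</cfif>

<cfset data = deserializeJSON(data)>
```

Throwing here means the calling page sees a clear failure rather than an obscure struct error later in the tag.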

After that it was just a matter of parsing the results into a query. You can download the attached code if you want to see all the details. This hasn't been heavily tested, but it's certainly fun. I'll look into adding it to the CFYahoo package if folks actually think this is useful.
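For readers who don't want to open the zip, the parsing step might look roughly like the following. This is a hedged sketch, not the attached code. It assumes the deserialized response keeps its rows under `data.query.results`, inside a single key such as "photo", and the variable names `yqlQuery`, `rows`, `row`, and `col` are made up for illustration. It also handles the case reported in the comments, where a single match arrives as a struct instead of an array.

```cfm
<!--- sketch only: "data" is the deserialized YQL response from above --->
<cfset yqlQuery = queryNew("")>

<cfif structKeyExists(data.query, "results") and isStruct(data.query.results)>
    <!--- assumption: results holds one key (e.g. "photo") containing the row data --->
    <cfset key = listFirst(structKeyList(data.query.results))>
    <cfset rows = data.query.results[key]>

    <!--- a single match arrives as a struct rather than an array --->
    <cfif isStruct(rows)>
        <cfset rows = [rows]>
    </cfif>

    <cfif isArray(rows) and arrayLen(rows) and isStruct(rows[1])>
        <!--- use the first row's keys as the column list --->
        <cfset yqlQuery = queryNew(structKeyList(rows[1]))>
        <cfloop array="#rows#" index="row">
            <cfset queryAddRow(yqlQuery)>
            <cfloop list="#yqlQuery.columnList#" index="col">
                <!--- skip missing keys and nested values --->
                <cfif structKeyExists(row, col) and isSimpleValue(row[col])>
                    <cfset querySetCell(yqlQuery, col, row[col])>
                </cfif>
            </cfloop>
        </cfloop>
    </cfif>
</cfif>
```

The sketch is deliberately conservative: rows that are plain strings, nested structs inside a row, and keys that are not legal query column names are all left unhandled, so it returns an empty or partial query in those cases.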

p.s. Rant time. Some time ago I worked with the Yahoo Developer Network to get ColdFusion included in their Search API kit and to write up a ColdFusion Developer Center. Unfortunately, Yahoo has decided to drop the ColdFusion Developer Center. This is what I got on Twitter:

ydn @cfjedimaster we didn't get a renewed interest in #coldfusion from our community and we've been trying to streamline our site/offerings

I protested this (well, on Twitter) but have yet to get a response. I also posted my query to their forums. I'd really appreciate it if folks pinged them - either via Twitter or their forums - and let them know that you want to see ColdFusion represented. Crap - the content was written. They didn't have to pay anything for it (and I'd gladly keep it updated) so this should be a no-brainer for them. Shoot - they have a Silverlight developer center. You can't tell me there are more SL devs than ColdFusion devs.

Download attached file.

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Ryan LeTulle posted on 9/11/2009 at 2:34 AM

Very neat stuff. I'll look into this with more detail soon.

Silverlight doesn't surprise me now especially with their Microsoft relationship.

Comment 2 by Francois Levesque posted on 9/11/2009 at 3:10 AM

That's slick. I had no idea that YQL offered that type of functionality. I'll have to take a closer look into that.

Nice find, Ray!

Comment 3 by Dan G. Switzer, II posted on 9/11/2009 at 3:37 AM

The problem is, if they haven't been getting the traffic, then I doubt any amount of protesting is going to matter.

Comment 4 by Todd Rafferty posted on 9/11/2009 at 5:26 AM

Ray, I tried this on Railo - kept getting a 400 error as a response from Yahoo. Blair McKenzie on the Railo list suggested I change htmlEditFormat() to urlencodedformat(). It then started working.

Comment 5 by thinman posted on 9/11/2009 at 5:27 AM

I know! I've been looking all over for the CF section over at YDN. Bastards! It was probably one of the terms of the deal with Bing. [sigh]

It looks like Pipes may also go away too, they're sure sounding like they're looking for stuff to get rid of that would give MS competition.

Comment 6 by Raymond Camden posted on 9/11/2009 at 5:48 AM

@Todd: Wow, the htmlEditFormat was a total brain fart. It should definitely be urlEncodedFormat. Will fix in 2 minutes and replace the zip.

@Thinman: Worse comes to worse, I'll post my copy of the files.

Comment 7 by Raymond Camden posted on 9/11/2009 at 5:50 AM

Zip fixed. Thanks again Todd.

Comment 8 by Jonathan posted on 9/11/2009 at 5:51 AM

The YQL and Pipes products are developed by the same team and are not related to the search teams/products. We're very close to a big update to Pipes which builds upon the new YQL technology - I'm glad you're enjoying both!

Comment 9 by felix tjandrawibawa posted on 9/11/2009 at 6:10 AM

I really can't believe this YQL hasn't taken the world by storm :)
It's really awesome!
I played around w/ it few months ago:
http://tech.tjandrawibawa.o...

Comment 10 by Luke posted on 9/11/2009 at 12:32 PM

hi Ray, fantastic - thanks very much for taking the time to do a YQL post :-) it's definitely the kinda thing that will come in useful for folks at some point.

as for YDN - *grrrrr*, shall ping.

Comment 11 by Rudi posted on 9/11/2009 at 2:11 PM

Ray,

Good Stuff. I've been using the pipes feed from Yahoo for a while and have only scratched the surface of what is possible. I would be very interested in the cf_yahoo package.

-Rudi

Comment 12 by Todd Rafferty posted on 9/11/2009 at 2:22 PM

Per Ray's request to drop a link here. I followed up Ray's post with how to create a Custom Tag CFC (only available in Railo) and how you can actually make this a built-in-tag in Railo. By built in, I mean: <cfyql>

http://www.railo.ch/blog/in...

Comment 13 by Raymond Camden posted on 9/11/2009 at 3:53 PM

@Rudi: Not sure what you mean by your last statement. There _is_ a CFYahoo package at RIAForge. It covers many of Yahoo's APIs. Are you saying it makes sense to put this _in_ there?

Comment 14 by Lola LB posted on 9/11/2009 at 6:24 PM

YQL looks very interesting . . .

Shame they removed the CF Developer content. I've left a comment on their forum.

Comment 15 by Rudi posted on 9/11/2009 at 9:15 PM

@ray

Yes, that's what I meant. That's what I get for posting before the caffeine kicks in.

Comment 16 by Ben Nadel posted on 9/16/2009 at 5:03 PM

Ray, really awesome post. That's way cool that you can use YQL to parse web pages. Awesomeness.

Comment 17 by Raymond Camden posted on 9/16/2009 at 8:10 PM

Just an FYI. Spoke to YDN, and the ColdFusion Dev Center WILL be returning!!

Comment 18 by Ben Nadel posted on 9/16/2009 at 8:11 PM

@Ray,

Are you so badass that you actually got them to change their mind :D

Comment 19 by Raymond Camden posted on 9/16/2009 at 8:17 PM

Heh, no, I think it was all the people who posted/complained on the forums.

The ColdFusion Army is strong!

Comment 20 by Ben Nadel posted on 9/16/2009 at 8:18 PM

Sure sure... you can deflect the greatness all you want, but I think we know what's going on here :)

Comment 21 by cf_Ray posted on 9/24/2009 at 2:31 PM

How did Ray get Yahoo to continue the CF section? ColdFusionJedi mind trick!! (someone had to say it!)

Awesome post MrCamden!

Comment 22 by Raymond Camden posted on 9/24/2009 at 3:47 PM

I whined. ;) Nah, not really. I hooked up with the person in charge, and with Adam Lehman, and we discussed the center and what we could do to ensure it stays up to date. (That was Yahoo's primary concern.) Now that it is back, Adam and I need to come up with a plan to help update it.

Comment 23 by TomasF posted on 9/28/2009 at 3:27 PM

Thanks Ray, this is just excellent (both YQL and cf_yql), just what I needed for a project I'm playing with. I do notice it seems to have some problems with a little more complex data, like the result of select * from flickr.photos.exif where photo_id='3889005905' which returns:
Element 1 is undefined in a CFML structure referenced as part of an expression.
47 : <cfif isSimpleValue(data.query.results[key][1])>

Trying to figure it out, but I'll probably never get there, I'm a sysadmin and shouldn't be doing this in the first place ;)

Comment 24 by Raymond Camden posted on 9/29/2009 at 9:41 PM

I'm very unsure about this fix - it works, but will probably need to be redone.

Grab the bits here: http://www.coldfusionjedi.c...

Comment 25 by Raymond Camden posted on 9/29/2009 at 9:43 PM

Scratch that. It breaks other Flickr searches.

Comment 26 by Raymond Camden posted on 9/29/2009 at 9:45 PM

This one should be better:

http://www.coldfusionjedi.c...

Comment 27 by Misty posted on 9/30/2009 at 5:03 AM

Hi Ray! Adding it to the Yahoo package would be nice, so we can play around with this tag properly. Have you looked at Yahoo BOSS? It is quite cool too.

Comment 28 by William Greenly posted on 10/2/2009 at 4:07 PM

I take it this is dependent on anticipating well formed xml at the end of the url?

Another alternative is to provide an XQuery parser, e.g.

<cf_XQuery name="results">
for $i in document("http://www.dcs.gla.ac.uk/~j...")
return $i//ul/li/p
</CF_XQuery>

This provides more expressiveness than SQL and xpath combined.

I have developed a custom tag to provide this functionality at:

http://www.cfxquery.co.uk/C...

Many Thanks,

Will

Comment 29 by Raymond Camden posted on 10/2/2009 at 5:23 PM

No - the YQL isn't XML at all. Where did you get that?

Comment 30 by Joey Krabacher posted on 9/10/2011 at 8:41 AM

Found a bug when trying to run a query that grabs one record and returns the results as a struct instead of an array.

query : SELECT * FROM upcoming.events WHERE event_id="8452132"

fix(added this to the cfif block where the output query is created) :
<code>
<cfelseif totalResults eq 1 and isStruct(data.query.results[key])>
<cfset query = queryNew(structKeyList(data.query.results[key]))>
<cfset queryAddRow(query)>
<cfloop index="col" list="#query.columnList#">
<cfset querySetCell(query, col, data.query.results[key][col])>
</cfloop>
</code>

Comment 31 by Raymond Camden posted on 9/10/2011 at 4:41 PM

Thanks Joey.

Comment 32 by Chris posted on 9/3/2014 at 6:49 PM

I realize this thread is 3 years old, but is there still a place I can get the version 3 yql3.zip?

Comment 33 by Raymond Camden posted on 9/3/2014 at 7:02 PM

I think I posted it up on RIAForge.