June 5, 2007

ColdFusion 8: RSS Aggregator UDF

As you know (or hopefully know), ColdFusion 8 supports both parsing and creating RSS feeds. I was curious how difficult it would be to create an RSS aggregator in ColdFusion. It turns out it was rather easy - and with the use of CFTHREAD it actually performs quite well. I'll show the code, then talk about the parts of it.


<cffunction name="aggregate" returnType="query" output="false">
	<cfargument name="feeds" type="string" required="true" hint="List of RSS urls.">
	<cfset var results = structNew()>
	<cfset var result = "">
	<cfset var entries = "">
	<cfset var x = "">
	<cfset var totalentries = "">
	<!--- Use this column list since not all feeds return the same cols. --->
	<cfset var collist = "authoremail ,authorname ,authoruri ,categorylabel ,categoryscheme ,categoryterm ,comments ,content ,contentmode ,contentsrc ,contenttype ,contributoremail ,contributorname ,contributoruri ,createddate ,expirationdate ,id ,idpermalink ,linkhref ,linkhreflang ,linklength ,linkrel ,linktitle ,publisheddate ,rights ,rsslink ,source ,sourceurl ,summary ,summarymode ,summarysrc ,summarytype ,title ,updateddate ,uri ,xmlbase">
	<cfset var tlist = "">
	<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
		<cfthread action="run" name="thread_#x#" url="#listGetAt(arguments.feeds,x)#">
			<cffeed source="#attributes.url#" query="thread.entries">
		</cfthread>
		<cfset tlist = listAppend(tlist, "thread_#x#")>
	</cfloop>

	<cfthread action="join" name="#tlist#" />

	<!--- copy out just for ease of use --->
	<cfloop index="x" list="#tlist#">
		<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
	</cfloop>

	<cfquery name="totalentries" dbtype="query">
	<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
	select #collist#
	from results.result_#x#
	<cfif x is not listLen(arguments.feeds)>
	union
	</cfif>
	</cfloop>
	</cfquery>

	<!--- sort --->
	<cfquery name="totalentries" dbtype="query">
	select #collist#
	from totalentries
	order by publisheddate desc
	</cfquery>

	<cfreturn totalentries>
</cffunction>

So first off - the point of the aggregator is to take a list of RSS feeds and return one simple query. The UDF takes one argument - feeds.

I loop through each of the feeds and while inside of a thread, I download and create a query for the feed:


<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
	<cfthread action="run" name="thread_#x#" url="#listGetAt(arguments.feeds,x)#">
		<cffeed source="#attributes.url#" query="thread.entries">
	</cfthread>
	<cfset tlist = listAppend(tlist, "thread_#x#")>
</cfloop>

Next I join the threads together. This makes my function wait till all the threads are done:


<cfthread action="join" name="#tlist#" />
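One thing worth knowing: the join action also accepts a timeout attribute, in milliseconds, so one slow feed does not have to stall the whole request. This is only a sketch. The 10 second value is an arbitrary assumption on my part, and it reuses the tlist variable from above:

```cfml
<!--- Wait for the threads in tlist, but give up after 10 seconds.
      The 10000 ms value is an arbitrary choice for illustration. --->
<cfthread action="join" name="#tlist#" timeout="10000" />
```

If the timeout is hit, the function simply carries on, and any thread that has not finished yet will not have its entries query ready.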

Now for the fun part. I need to get the data out of the threads. I named each thread "thread_#x#" where x was a number. I stored the list of thread names in a variable called tlist. So I can loop over each of them and use evaluate to fetch the thread. I stored the data in a variable named entries, so this is the code I ended up with:


<!--- copy out just for ease of use --->
<cfloop index="x" list="#tlist#">
	<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
</cfloop>
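If evaluate makes you uneasy, here is a hedged alternative. As far as I know, the released version of ColdFusion 8 exposes a cfthread scope that lets you look up a thread by name with bracket notation, and each thread carries a status value. The sketch below assumes both of those, and only copies the query when the thread finished normally:

```cfml
<!--- Sketch: read thread data through the cfthread scope instead of evaluate(). --->
<cfloop index="x" list="#tlist#">
	<!--- COMPLETED means the thread body ran to the end, so thread.entries was set. --->
	<cfif cfthread[x].status is "COMPLETED">
		<cfset results["result_#replaceNoCase(x,'thread_','')#"] = cfthread[x].entries>
	</cfif>
</cfloop>
```

Note that if a thread is skipped this way, the union loop that runs from 1 to listLen(arguments.feeds) would need to loop over the keys that actually exist in results instead.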

In case you are wondering about the var scope - let me just say this. Adobe has done... well... "magic", to ensure that we do not have to var scope threaded data like I've used above. Don't ask me why - but I've been assured the code above is safe even when multiple requests run at the same time.

So now I have a results structure that contains a bunch of queries. I can then use query of query to join them all together.

You may wonder - why the select #collist# instead of select *? Some feeds may contain more columns than other feeds, specifically feeds containing Dublin Core or iTunes extensions. A union needs every query to have the same columns, so I created a "core" list of columns I can depend on.
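If you want to see the difference for yourself, a quick (and admittedly rough) check is to read a single feed and print the columns of the query you get back. The feed URL here is just my own blog's feed from the demo; swap in any feed you like:

```cfml
<!--- Read one feed into a query and show which columns it came back with. --->
<cffeed source="http://feeds.feedburner.com/RaymondCamdensColdfusionBlog" query="entries">
<cfoutput>#entries.columnList#</cfoutput>
```

Run it against a feed with extensions and compare the list to collist above.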

Lastly - I sort the joined query by the published date value. This will give me one final query that contains a sorted list of blog entries. Now for a quick demo:


<cfset feeds = "http://feeds.dzone.com/dzone/frontpage?abc76aaAw7kar7IKy69lr, http://www.riaforge.org/index.cfm?event=page.rss, http://feeds.feedburner.com/RaymondCamdensColdfusionBlog">
<cfset aggregation = aggregate(feeds)>

<cfoutput query="aggregation">
	<cftooltip tooltip="#content#">
		<a href="#rsslink#">#title#</a><cfif len(publisheddate)>(#dateFormat(publisheddate)#)</cfif><br />
	</cftooltip>
</cfoutput>

I used three feeds - DZone, RIAForge, and my own blog. I then loop over the results and use a bit of AJAX-UI candy (cftooltip) to display the results. And guess what?? I finally have a live demo for you!

Tomorrow I'll show an alternative to this that lets you search RSS feeds. (And yes, I do plan on resurrecting RSSWatcher.com with CF8 code. I promise.)

p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code.

EDIT: Folks, due to IE being a sucky browser and all, there was a huge rendering issue in the code. I added spaces to all of the columns in the "collist" variable. If you cut and paste the code, you will need to trim the values. Also note I have a new version which fixes some bugs. This will be posted around lunch time.

Archived Comments

Comment 1 by Michael Long posted on 6/6/2007 at 2:50 AM

"No external libraries allowed - it must be all "baked in" code."

Personally, I'd prefer that CF had a "plugins" directory and API where PDF support or reporting or threading tag/function modules could be dropped into the system--or removed or replaced--as needed.

And as opposed to CF turning into a PHP analogue with "a half million functions served". And counting.

There are advantages to NOT having a bloated core with everything "baked in" as well.

Comment 2 by Scott P posted on 6/6/2007 at 2:55 AM

coldfusionjedi.com - should have guessed but surprised that wasn't taken already.

Slightly OT, curious to see whether MXNA picks up your old posts again when you move your blog there. Got any tips on that or just live with it?

Please keep the cf8 tips coming, tides us over until the new cfwack comes out - mid-year right?

Comment 3 by Sean posted on 6/6/2007 at 4:11 AM

Very cool. One suggestion. Make the feeds argument to the UDF an array instead of a list. If the URL has a comma in it, the UDF will break.

Comment 4 by php_dude posted on 6/6/2007 at 4:11 AM

[quote]
p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code."
[/quote]

cf 8 is not even out and already feeling insecure :) or getting too ahead of urself :)

try this -- http://magpierss.sourceforg...

and whats with "it must be all baked in code" ... take a look at the lib folder in the CF install directory and see if you can count the # of commons-*.jar files there.

you are way above all this!

Comment 5 by Raymond Camden posted on 6/6/2007 at 5:12 AM

php_dude: I want to argue soooo bad... but I won't. It was a dumb, flippant comment, and not even on topic.

michael: CF does have a 'plugin' architecture - custom tags, CFXs, and CFCs. Folks _have_ done PDF libraries. So it's there. Personally I'd rather have it in.

scott p: I hope it doesn't reseed, but it may. I'm thinking it will only reseed my current RSS feed though. I'm still waiting to move the blog though. I got it upgraded to 5.8.00whatever the heck was last, but I can't get the damn SQL Server migrated.

sean: Excellent point. When I post this to CFLib, I'll update it, and tomorrow's "search" post will use an array.

Comment 6 by php_dude posted on 6/6/2007 at 6:40 AM

[quote]
php_dude: I want to argue soooo bad... but I won't. It was a dumb, flippant comment, and not even on topic.
[/quote]

ray,

i didn't mean to piss you off (u sound like i did :)). my point was that as a prominent person of the cf community you should be cautious at rubbing it in other people's faces how good the new version (or past/existing version) of cf is...

IMHO, CF, PHP or any other development platform is not without its faults... but comments like these don't help propagate the correct message.

Comment 7 by 7079 posted on 6/6/2007 at 6:47 AM

CF is not bloated, and CF8 is even better, here's why:

You can run CF code, .NET, Java, JSP, PHP, and Ruby all in the same script.

You should check out CF, you will never look back!

Comment 8 by Raymond Camden posted on 6/6/2007 at 6:49 AM

@php_dude - Nope, not at all.

Comment 9 by Adam posted on 6/6/2007 at 6:53 AM

Not to be pretentious, but you're still a few steps away from having true "rss aggregator" functionality accomplished. A good aggregator will remember which posts you've already read. There are 100 ways to skin that cat, though.

Comment 10 by Michael Long posted on 6/6/2007 at 10:10 AM

From my post on the subject: "I know about CFXs (ColdFusion Extensions). But they have to be registered, can't just be dropped in, have a limited API and are basically second-class citizens. Plus there's no good way to create, say, an image tag library with an integral set of tags and functions (cfimage, isImage()) that seamlessly extends the language and environment."

http://www.cfinternals.org/...

Comment 11 by Michael Long posted on 6/6/2007 at 10:16 AM

BTW, if you reread my comment regarding PDF, you'll note I said, "modules could be dropped into the system--or removed or replaced--as needed".

Yes, someone can create a XYZ CFX, but can they remove XYZ support if it's not wanted? Can I swap out the baked-in XYZ support and replace the tags with a better implementation?

Comment 12 by Todd posted on 6/6/2007 at 4:06 PM

Silly question: You're attempting to join the threads before you know they're completely finished? What happens when you prematurely join things together?

Comment 13 by Brian Swartzfager posted on 6/6/2007 at 4:06 PM

Using threads to read and parse the RSS feeds in parallel is a great idea. I'll probably incorporate that into the RSS aggregator portlet we have in our portal system.

I wonder, though, what cffeed does when it tries to retrieve a feed that isn't currently available or parses one with malformed XML? I had to manually code a number of safety checks in our portlet to try and handle those situations gracefully.

I'm hoping to have time this weekend to start messing around with the beta.

Comment 14 by Tom K posted on 6/6/2007 at 4:17 PM

I created an RSS Aggregator in CF7 using your RSS.cfc (which I now presume will be obsolete *sob*)

All I did was loop in the URLS of XML feeds into RSS.cfc and joined the Queries at the end much like this - however, I ended up writing the output to a static file and then scheduling the cfc to run once an hour: that way at least visitors to the site won't be waiting on 10 sites RSS feeds.

The multi threaded approach is interesting though !

Comment 15 by Raymond Camden posted on 6/6/2007 at 5:04 PM

@Todd - no - the point of the join action is "Join these guys together when they are done." It basically means - wait here till this list is done.

@Brian - my UDF fails on that. I've fixed that, and another bug, and will post a new version to blog after I eat some eggs. Get it - aggregator - eggs - seriously though - I need breakfast.

@Tom - Certainly not. I will continue to work on it - when time permits - or as folks find bugs. We will have folks on MX6 and 7 for a long time. I'm not going to abandon them.

Comment 16 by Lola LB posted on 6/6/2007 at 6:25 PM

Umm . . . got a .zip file for this?

And, any way to size the tooltip popup window? I find that if there is a lot of text in one of the popup window, the right end of the window gets chopped off by about 3 or 4 pixels on the left edge, when I'm viewing the demo page in Safari. Also happens in FF, but doesn't seem to be as bad.

Comment 17 by Raymond Camden posted on 6/6/2007 at 6:30 PM

Lola, let's wait for a zip until I post the new version today.

I should have put a size limit on the tooltip text. That's my fault. As for sizing it - I know you can edit the CSS for it (the reference talks about this), but I haven't tried it myself.

Comment 18 by peter posted on 6/6/2007 at 11:41 PM

well if you guys are looking for a good web based RSS aggregator i use itsmynews, it has a very simple layout and tons of feeds. I think you should all go and check it out, http://www.itsmynews.com

Comment 19 by Ian posted on 6/7/2007 at 6:48 AM

Is there a functional version for CF7 around somewhere? This is exactly what we need for a project I'm working on.

Comment 20 by Raymond Camden posted on 6/7/2007 at 7:00 AM

I have my RSS.cfc that you can use. It doesn't include threading, but it lets you parse RSS feeds into CF queries. You can use it for the low price of 999.999 per byte translated. Or Free. ;)

Comment 21 by Michael Sandoz posted on 10/2/2007 at 2:53 AM

How do you return content:encoded node? It errors when I add it to the list. Please help.

Comment 22 by Raymond Camden posted on 10/4/2007 at 2:28 AM

Can you please describe your problem more?

Comment 23 by Paul Day posted on 12/19/2008 at 1:14 AM

Michael,

I know this post is a little old, but I discovered the answer to your question, and thought I would post it here. You need to access it using the element name as the identifier. For example:

XML.rss.channel.item[i]["content:encoded"].XmlText

Comment 24 by lucaspedo posted on 10/12/2009 at 7:18 PM

Hi Ray, I have used your script as shown, but it sends me an error: "Element ENTRIES is undefined in a Java object of type class coldfusion.thread.ThreadScope."
I am trying to use it to parse RSS from several different newspapers and show them all together, ordered by time. Your code seems to be perfect, and also unique in reading the sources into a query.
Can you tell a complete RSS newbie where I'm going wrong?

Comment 25 by lucaspedo posted on 10/12/2009 at 7:39 PM

Partial correction: it works now, until (I think) one of the feeds is not present or answers with no data found, and then the error comes...
Is it possible to scan only the feeds that work?

Comment 26 by Raymond Camden posted on 10/12/2009 at 7:41 PM

Well, you won't know it doesn't work until you try to scan it. ;) You may want to look at Paragator (http://paragator.riaforge.org). It is the newest form of this code. This code here is rather old (2+ years).

Comment 27 by lucaspedo posted on 10/13/2009 at 2:34 PM

tnx Ray, go to see and try it.
You are a lighthouse of my CF apps like Ben that i know, if you come in Italy for a cfday in future don't forget you have a great pizza payed.

Comment 28 by Raymond Camden posted on 10/13/2009 at 4:02 PM

You are most welcome.

Comment 29 by lucaspedo posted on 10/14/2009 at 11:30 AM

Hi Ray, have try to use paragator and work very fine, nice job, really. Let me ask why when parse more than one feed, their comes ordered by url and not by date as is determined in the last QdQ, so comes 7/8 feed of same url also many days old, and after another url with tomorrow news. Its pssible and where limit date only in past 24 hours, and or mix feed by date ?
this is sample of the test page that use paragator : http://www.informagorizia.i... .
I think you must sign on notes another pizza with great wine this time, next cfday in italy hope you come with ben

Comment 30 by Raymond Camden posted on 10/15/2009 at 6:14 PM

Not sure why. If you dump the query and it shows that it is ordered by [date], then it should be right. I ran the test from the download and it's working right.
