As you know (or hopefully know), ColdFusion 8 supports both parsing and creating RSS feeds. I was curious how difficult it would be to build an RSS aggregator in ColdFusion. Turns out it was rather easy - and with the use of CFTHREAD it actually performs quite well. I'll show the code, then talk about its parts.
<cffunction name="aggregate" returnType="query" output="false">
<cfargument name="feeds" type="string" required="true" hint="List of RSS urls.">
<cfset var results = structNew()>
<cfset var result = "">
<cfset var entries = "">
<cfset var x = "">
<cfset var totalentries = "">
<!--- Use this column list since not all feeds return the same cols. --->
<cfset var collist = "authoremail ,authorname ,authoruri ,categorylabel ,categoryscheme ,categoryterm ,comments ,content ,contentmode ,contentsrc ,contenttype ,contributoremail ,contributorname ,contributoruri ,createddate ,expirationdate ,id ,idpermalink ,linkhref ,linkhreflang ,linklength ,linkrel ,linktitle ,publisheddate ,rights ,rsslink ,source ,sourceurl ,summary ,summarymode ,summarysrc ,summarytype ,title ,updateddate ,uri ,xmlbase">
<cfset var tlist = "">
<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
<!--- one thread per feed; trim() drops any spaces that follow the commas in the list --->
<cfthread action="run" name="thread_#x#" url="#trim(listGetAt(arguments.feeds,x))#">
<cffeed source="#attributes.url#" query="thread.entries">
</cfthread>
<cfset tlist = listAppend(tlist, "thread_#x#")>
</cfloop>
<cfthread action="join" name="#tlist#" />
<!--- copy out just for ease of use --->
<cfloop index="x" list="#tlist#">
<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
</cfloop>
<!--- union the per-feed queries, selecting only the shared core columns --->
<cfquery name="totalentries" dbtype="query">
<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
select
#collist#
from results.result_#x#
<cfif x is not listLen(arguments.feeds)>
union
</cfif>
</cfloop>
</cfquery>
<!--- sort --->
<cfquery name="totalentries" dbtype="query">
select #collist#
from totalentries
order by publisheddate desc
</cfquery>
<cfreturn totalentries>
</cffunction>
So first off - the point of the aggregator is to take a list of RSS feeds and return one simple query. The UDF takes one argument - feeds.
I loop through the feeds and, inside a separate thread for each one, download the feed and convert it into a query:
<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
<cfthread action="run" name="thread_#x#" url="#trim(listGetAt(arguments.feeds,x))#">
<cffeed source="#attributes.url#" query="thread.entries">
</cfthread>
<cfset tlist = listAppend(tlist, "thread_#x#")>
</cfloop>
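This is a rough Python analogue of the fan-out loop for readers who think in other languages. It is a sketch, not a port of the code above.

* `fetch_feed` is a hypothetical stand-in for `<cffeed>` and never touches the network.
* Each worker writes only to its own key, much as each CF thread writes only to its own thread scope.
* The `strip()` call guards against the spaces after the commas in the demo's feed list further down.

```python
import threading

def fetch_feed(url):
    # Hypothetical stand-in for <cffeed>: returns a list of entry dicts.
    return [{"title": "Entry from " + url, "rsslink": url}]

def start_feed_threads(feeds):
    """Start one thread per feed URL, named thread_1, thread_2, ..."""
    results = {}  # thread name -> entries; each thread writes only its own key
    threads = []
    for i, url in enumerate(feeds, start=1):
        name = "thread_%d" % i

        # Default arguments bind this iteration's name and url to the worker.
        def work(name=name, url=url.strip()):
            results[name] = fetch_feed(url)

        t = threading.Thread(target=work, name=name)
        t.start()
        threads.append(t)
    return threads, results

threads, results = start_feed_threads(["http://a.example/rss", " http://b.example/rss"])
for t in threads:
    t.join()  # do not read results until every thread has finished
print(sorted(results))  # ['thread_1', 'thread_2']
```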
Next I join the threads together. This makes my function wait till all the threads are done:
<cfthread action="join" name="#tlist#" />
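In Python the same wait is `Thread.join()`, which blocks the caller until the thread ends. The sketch below illustrates the idea only; it is not how CF implements it.

* `slow_fetch` and its delays are made up.
* `join_all` accepts an optional overall timeout in seconds.
* CF8's join action has a `timeout` attribute (in milliseconds) that serves a similar purpose: the page can move on instead of hanging on one slow feed.

```python
import threading
import time

def join_all(threads, timeout=None):
    """Wait for every thread, giving up after `timeout` seconds overall.

    timeout=None waits forever. Returns the names of threads still running.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    for t in threads:
        remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
        t.join(remaining)
    return [t.name for t in threads if t.is_alive()]

results = {}

def slow_fetch(name, delay):
    # Hypothetical fetch that only sleeps for `delay` seconds.
    time.sleep(delay)
    results[name] = "done"

threads = []
for i, delay in enumerate([0.05, 0.1], start=1):
    name = "thread_%d" % i
    t = threading.Thread(target=slow_fetch, args=(name, delay), name=name)
    t.start()
    threads.append(t)

print(join_all(threads))  # [] (both threads finished)
print(sorted(results))    # ['thread_1', 'thread_2']
```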
Now for the fun part. I need to get the data out of the threads. I named each thread "thread_#x#", where x was a number, and stored the list of thread names in a variable called tlist. So I can loop over that list and use evaluate() to fetch each thread by name. Inside each thread, cffeed stored its query in thread.entries, so this is the code I ended up with:
<!--- copy out just for ease of use --->
<cfloop index="x" list="#tlist#">
<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
</cfloop>
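Python needs no `evaluate()` for this step. Keeping the thread objects in a dictionary keyed by name gives the same lookup by string. In this hedged sketch, `FeedThread` and its pre-supplied rows are hypothetical: they stand in for a thread that ran `<cffeed>` and stored `thread.entries`.

```python
import threading

class FeedThread(threading.Thread):
    """Thread that keeps its own result, like a CF thread scope's 'entries'."""

    def __init__(self, name, rows):
        super().__init__(name=name)
        self._rows = rows    # hypothetical pre-fetched rows, in place of <cffeed>
        self.entries = None

    def run(self):
        self.entries = list(self._rows)

threads = {name: FeedThread(name, rows) for name, rows in
           [("thread_1", [{"title": "a"}]), ("thread_2", [{"title": "b"}])]}
for t in threads.values():
    t.start()
for t in threads.values():
    t.join()

# Copy out: look each thread up by name and re-key thread_N as result_N.
results = {"result_" + name[len("thread_"):]: t.entries for name, t in threads.items()}
print(results)  # {'result_1': [{'title': 'a'}], 'result_2': [{'title': 'b'}]}
```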
In case you are wondering about the var scope - let me just say this. Adobe has done... well... "magic", to ensure that we do not have to var scope threaded data like I've used above. Don't ask me why - but I've been assured the code above is safe even when multiple requests run it at the same time.
So now I have a results structure that contains one query per feed. I can then use a query of queries to UNION them all together into a single query.
You may wonder - why select #collist# instead of select *? Some feeds may contain more columns than other feeds, specifically feeds containing Dublin Core or iTunes extensions. So I created a "core" list of columns I can depend on.
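A rough Python analogue of combining the feeds, plus the date sort described next, might look like this. `CORE_COLUMNS` is a hypothetical three-column subset of collist, and the extension column `itunesduration` is invented for the example.

Some behaviours are worth noting:

* A plain SQL UNION removes duplicate rows (UNION ALL keeps them).
* This sketch goes a step beyond the CF version by filling a missing core column with `None` rather than failing.
* It assumes dates are already parsed. Sorting date strings would order them alphabetically, not chronologically.

```python
from datetime import datetime

# Hypothetical subset of collist, for illustration only.
CORE_COLUMNS = ["title", "rsslink", "publisheddate"]

def union_core(queries, columns=CORE_COLUMNS):
    """Combine feeds, keeping only the shared core columns.

    A missing column becomes None instead of raising an error, and
    duplicate rows are dropped the way SQL UNION does.
    """
    seen = set()
    combined = []
    for rows in queries:
        for row in rows:
            projected = tuple(row.get(col) for col in columns)
            if projected not in seen:
                seen.add(projected)
                combined.append(dict(zip(columns, projected)))
    return combined

def newest_first(rows):
    """Order by publisheddate descending; rows without a date go last."""
    dated = [r for r in rows if r["publisheddate"] is not None]
    undated = [r for r in rows if r["publisheddate"] is None]
    dated.sort(key=lambda r: r["publisheddate"], reverse=True)
    return dated + undated

blog = [{"title": "CF8 post", "rsslink": "u1", "publisheddate": datetime(2007, 6, 1),
         "itunesduration": "3:00"}]  # extra extension column is ignored
forge = [{"title": "New project", "rsslink": "u2", "publisheddate": datetime(2007, 6, 3)},
         {"title": "New project", "rsslink": "u2", "publisheddate": datetime(2007, 6, 3)},
         {"title": "Undated", "rsslink": "u3"}]
merged = newest_first(union_core([blog, forge]))
print([r["title"] for r in merged])  # ['New project', 'CF8 post', 'Undated']
```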
Lastly - I sort the joined query by the published date value. This will give me one final query that contains a sorted list of blog entries. Now for a quick demo:
<cfset feeds = "http://feeds.dzone.com/dzone/frontpage?abc76aaAw7kar7IKy69lr, http://www.riaforge.org/index.cfm?event=page.rss, http://feeds.feedburner.com/RaymondCamdensColdfusionBlog">
<cfset aggregation = aggregate(feeds)>
<cfoutput query="aggregation">
<cftooltip tooltip="#content#">
<a href="#rsslink#">#title#</a><cfif len(publisheddate)> (#dateFormat(publisheddate)#)</cfif><br />
</cftooltip>
</cfoutput>
I used three feeds - DZone, RIAForge, and my own blog. I then loop over the results and use a bit of AJAX-UI candy (cftooltip) to display the results. And guess what?? I finally have a live demo for you!
Tomorrow I'll show an alternative to this that lets you search RSS feeds. (And yes, I do plan on resurrecting RSSWatcher.com with CF8 code. I promise.)
p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code.
EDIT: Folks, due to IE being a sucky browser and all, there was a huge rendering issue in the code. I added spaces to all of the columns in the "collist" variable. If you cut and paste the code, you will need to trim the values. Also note I have a new version which fixes some bugs. This will be posted around lunch time.
Archived Comments
"No external libraries allowed - it must be all "baked in" code."
Personally, I'd prefer that CF had a "plugins" directory and API where PDF support or reporting or threading tag/function modules could be dropped into the system--or removed or replaced--as needed.
And as opposed to CF turning into a PHP analogue with "a half million functions served". And counting.
There are advantages to NOT having a bloated core with everything "baked in" as well.
coldfusionjedi.com - should have guessed but surprised that wasn't taken already.
Slightly OT: I'm curious whether MXNA will pick up your old posts again when you move your blog there. Got any tips on that, or do you just live with it?
Please keep the CF8 tips coming; they tide us over until the new CFWACK comes out - mid-year, right?
Very cool. One suggestion. Make the feeds argument to the UDF an array instead of a list. If the URL has a comma in it, the UDF will break.
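Here is a quick Python illustration of Sean's point, using made-up example.com URLs. A comma inside a query string is indistinguishable from a list delimiter, while an array keeps each URL whole. The split helper skips empty elements, as CF's list functions do, and also trims spaces around each item.

```python
def split_feed_list(feeds):
    """Split a comma-delimited list, trimming items and skipping empty ones."""
    return [item.strip() for item in feeds.split(",") if item.strip()]

as_list = "http://example.com/rss?tags=cf,ajax, http://example.org/feed"
print(split_feed_list(as_list))
# ['http://example.com/rss?tags=cf', 'ajax', 'http://example.org/feed']  (first URL broken in two)

as_array = ["http://example.com/rss?tags=cf,ajax", "http://example.org/feed"]
print(len(as_array))  # 2, each URL intact
```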
[quote]
p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code."
[/quote]
cf 8 is not even out and already feeling insecure :) or getting too ahead of urself :)
try this -- http://magpierss.sourceforg...
and what's with "it must be all baked in code"? Take a look at the lib folder in the CF install directory and see if you can count the number of commons-*.jar files there.
you are way above all this!
php_dude: I want to argue soooo bad... but I won't. It was a dumb, flippant comment, and not even on topic.
michael: CF does have a 'plugin' architecture - custom tags, CFXs, and CFCs. Folks _have_ done PDF libraries. So it's there. Personally I'd rather have it in.
scott p: I hope it doesn't reseed, but it may. I'm thinking it will only reseed my current RSS feed though. I'm still waiting to move the blog though. I got it upgraded to 5.8.00-whatever-the-heck-was-last, but I can't get the damn SQL Server database migrated.
sean: Excellent point. When I post this to CFLib, I'll update it, and tomorrow's "search" post will use an array.
[quote]
php_dude: I want to argue soooo bad... but I won't. It was a dumb, flippant comment, and not even on topic.
[/quote]
ray,
i didn't mean to piss you off (u sound like i did :)). my point was that as a prominent person in the cf community you should be cautious about rubbing it in other people's faces how good the new version (or past/existing version) of cf is...
IMHO, CF, PHP, or any other development platform is not without its faults... but comments like these don't help propagate the correct message.
CF is not bloated, and CF8 is even better, here's why:
You can run CF code, .NET, Java, JSP, PHP, and Ruby all in the same script.
You should check out CF, you will never look back!
@php_dude - Nope, not at all.
Not to be pretentious, but you're still a few steps away from having true "rss aggregator" functionality accomplished. A good aggregator will remember which posts you've already read. There are 100 ways to skin that cat, though.
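One simple way to skin that cat, sketched in Python, is to remember an identifier for every entry a reader has seen and filter on it. This sketch makes two assumptions:

* It keys entries on the `id` column from collist, falling back to `rsslink`.
* The set lives in memory here; in a real app it would live in a database or the user's session.

```python
def entry_key(entry):
    # Assumed identifier: prefer the feed's id, fall back to the link.
    return entry.get("id") or entry.get("rsslink")

def unread(entries, read_keys):
    """Return only the entries whose key is not in read_keys."""
    return [e for e in entries if entry_key(e) not in read_keys]

def mark_read(entries, read_keys):
    """Record every entry in `entries` as read."""
    for e in entries:
        read_keys.add(entry_key(e))

read_keys = set()
first = [{"id": "a1", "title": "First"},
         {"id": "", "rsslink": "http://x.example/2", "title": "Second"}]
mark_read(first[:1], read_keys)
print([e["title"] for e in unread(first, read_keys)])  # ['Second']
```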
From my post on the subject: "I know about CFXs (ColdFusion Extensions). But they have to be registered, can't just be dropped in, have a limited API and are basically second-class citizens. Plus there's no good way to create, say, an image tag library with an integral set of tags and functions (cfimage, isImage()) that seamlessly extends the language and environment."
http://www.cfinternals.org/...
BTW, if you reread my comment regarding PDF, you'll note I said, "modules could be dropped into the system--or removed or replaced--as needed".
Yes, someone can create a XYZ CFX, but can they remove XYZ support if it's not wanted? Can I swap out the baked-in XYZ support and replace the tags with a better implementation?
Silly question: You're attempting to join the threads before you know they're completely finished? What happens when you prematurely join things together?
Using threads to read and parse the RSS feeds in parallel is a great idea. I'll probably incorporate that into the RSS aggregator portlet we have in our portal system.
I wonder, though, what cffeed does when it tries to retrieve a feed that isn't currently available or parses one with malformed XML? I had to manually code a number of safety checks in our portlet to try and handle those situations gracefully.
I'm hoping to have time this weekend to start messing around with the beta.
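In CF terms, one option is wrapping the `<cffeed>` call inside each thread in cftry/cfcatch. As I understand CF8, an uncaught error in a thread ends that thread and is recorded in the thread's error metadata rather than reaching the page. `thread.entries` is then never set, which is one plausible cause of the "Element ENTRIES is undefined" error a later commenter hit.

The Python sketch below shows the same shape. Each worker catches its own exception, and failed feeds are reported separately so the good ones can still be merged. `fake_fetch` is hypothetical.

```python
import threading

def aggregate_safely(feeds, fetch):
    """Fetch each feed in its own thread; failures are recorded, not fatal."""
    entries, errors = {}, {}
    lock = threading.Lock()

    def work(url):
        try:
            rows = fetch(url)
        except Exception as exc:  # unreachable host, malformed XML, ...
            with lock:
                errors[url] = str(exc)
        else:
            with lock:
                entries[url] = rows

    threads = [threading.Thread(target=work, args=(u,)) for u in feeds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entries, errors

def fake_fetch(url):
    # Hypothetical fetcher: pretends any URL containing "broken" is unparsable.
    if "broken" in url:
        raise ValueError("malformed XML")
    return [{"title": "ok", "rsslink": url}]

good, bad = aggregate_safely(["http://a.example/rss", "http://broken.example/rss"], fake_fetch)
print(sorted(good))  # ['http://a.example/rss']
print(bad)           # {'http://broken.example/rss': 'malformed XML'}
```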
I created an RSS aggregator in CF7 using your RSS.cfc (which I now presume will be obsolete *sob*).
All I did was loop the URLs of the XML feeds into RSS.cfc and join the queries at the end, much like this. However, I ended up writing the output to a static file and scheduling the CFC to run once an hour: that way, visitors to the site at least won't be waiting on 10 sites' RSS feeds.
The multithreaded approach is interesting though!
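Tom's approach of writing the merged result to a file is easy to sketch. This Python variation is not his code; it rebuilds lazily when the cache file is older than an hour, instead of running on a schedule. `build` is a hypothetical function that does the slow fetch-and-merge work. Nothing here stops two simultaneous requests from both rebuilding.

```python
import json
import os
import tempfile
import time

def cached_aggregate(path, build, max_age=3600):
    """Return entries cached in `path` if younger than max_age seconds, else rebuild."""
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as f:
                return json.load(f)
    except OSError:
        pass  # no cache file yet
    entries = build()
    with open(path, "w") as f:
        json.dump(entries, f)
    return entries

calls = []

def build():
    # Hypothetical slow aggregation step; counts how often it runs.
    calls.append(1)
    return [{"title": "entry"}]

path = os.path.join(tempfile.mkdtemp(), "feeds.json")
cached_aggregate(path, build)
cached_aggregate(path, build)
print(len(calls))  # 1: the second call was served from the file
```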
@Todd - no - the point of the join action is "Join these guys together when they are done." It basically means - wait here till this list is done.
@Brian - my UDF fails on that. I've fixed that, and another bug, and will post a new version to blog after I eat some eggs. Get it - aggregator - eggs - seriously though - I need breakfast.
@Tom - Certainly not. I will continue to work on it - when time permits - or as folks find bugs. We will have folks on MX6 and 7 for a long time. I'm not going to abandon them.
Umm . . . got a .zip file for this?
And, any way to size the tooltip popup window? I find that if there is a lot of text in one of the popup windows, the right end of the window gets chopped off by about 3 or 4 pixels on the left edge, when I'm viewing the demo page in Safari. Also happens in FF, but doesn't seem to be as bad.
Lola, let's wait for a zip until I post the new version today.
I should have put a size limit on the tooltip text. That's my fault. As for sizing it - I know you can edit the CSS for it (the reference talks about this), but I haven't tried it myself.
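A size limit like the one Ray mentions could look like this Python sketch. The 300-character default is arbitrary. Because cutting raw HTML can leave a tag half-open, it strips tags first with a crude regular expression; that is good enough for a tooltip but is not a real HTML parser.

```python
import re

def tooltip_text(content, limit=300):
    """Strip tags, collapse whitespace, and cap the text at `limit` characters."""
    text = re.sub(r"<[^>]+>", "", content or "")
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."

print(tooltip_text("<p>Hello   <b>world</b></p>"))  # Hello world
print(len(tooltip_text("x" * 500, limit=100)))      # 100
```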
Well, if you guys are looking for a good web-based RSS aggregator, I use itsmynews. It has a very simple layout and tons of feeds. I think you should all go check it out, http://www.itsmynews.com
Is there a functional version for CF7 around somewhere? This is exactly what we need for a project I'm working on.
I have my RSS.cfc that you can use. It doesn't include threading, but it lets you parse RSS feeds into CF queries. You can use it for the low price of 999.999 per byte translated. Or Free. ;)
How do you return the content:encoded node? It errors when I add it to the list. Please help.
Can you please describe your problem more?
Michael,
I know this post is a little old, but I discovered the answer to your question, and thought I would post it here. You need to access it using the element name as the identifier. For example:
XML.rss.channel.item[i]["content:encoded"].XmlText
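The bracket syntax is needed because content:encoded is a namespaced element name, which dot notation cannot express. For comparison, this Python sketch uses the standard library's ElementTree and looks the element up by its namespace URI instead of the prefix. The URI is the one defined by the RSS 1.0 content module, and the sample feed is made up.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

def encoded_bodies(rss_xml):
    """Return the content:encoded text of each <item> ("" when absent)."""
    root = ET.fromstring(rss_xml)
    bodies = []
    for item in root.iter("item"):
        node = item.find("{%s}encoded" % CONTENT_NS)
        bodies.append((node.text or "") if node is not None else "")
    return bodies

SAMPLE = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><title>Demo</title>
<item><title>One</title><content:encoded><![CDATA[<p>Full text</p>]]></content:encoded></item>
<item><title>Two</title></item>
</channel></rss>"""

print(encoded_bodies(SAMPLE))  # ['<p>Full text</p>', '']
```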
Hi Ray, I have used your script as shown, but it gives me an error: "Element ENTRIES is undefined in a Java object of type class coldfusion.thread.ThreadScope."
I'm trying to use it to parse the RSS feeds of several different newspapers and show them all together, ordered by time. Your code seems perfect, and unique in reading the feeds into a query.
Can you tell a complete RSS newbie where I'm going wrong?
Partial correction: it works now, until (I think) one of the feeds is unavailable or returns no data, and then the error comes back...
Is it possible to scan only the feeds that work?
Well, you won't know it doesn't work until you try to scan it. ;) You may want to look at Paragator (http://paragator.riaforge.org) It is the newest form of this code. This code here is rather old (2+ years).
Thanks Ray, I'll go take a look and try it.
You are a lighthouse for my CF apps, like Ben. If you come to Italy for a cfday in the future, don't forget there's a great pizza waiting for you, on me.
You are most welcome.
Hi Ray, I've tried Paragator and it works very well. Nice job, really. May I ask why, when it parses more than one feed, the entries come out ordered by URL and not by date as specified in the last QoQ? I get 7 or 8 entries from the same URL, some of them many days old, and only after them another URL with more recent news. Is it possible to limit the dates to the past 24 hours, and where would I do that? Or at least mix the feeds by date?
This is a sample of the test page that uses Paragator: http://www.informagorizia.i...
I think you can note down another pizza, with great wine this time. At the next cfday in Italy, I hope you come with Ben.
Not sure why. If you dump the query and it shows that it is ordered by [date], then it should be right. I ran the test from the download and it's working right.
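For the 24-hour question, here is a Python sketch of the filter-then-sort idea. It parses each date first, because sorting dates held as strings would order them alphabetically rather than chronologically. That is worth ruling out, but it is not a diagnosis of the problem above.

The `pubdate` field name and the sample entries are hypothetical. `email.utils.parsedate_to_datetime` handles the RFC 822 date format that RSS 2.0 uses for pubDate.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def last_24_hours(entries, now):
    """Keep entries published within 24 hours of `now`, newest first."""
    cutoff = now - timedelta(hours=24)
    recent = []
    for e in entries:
        published = parsedate_to_datetime(e["pubdate"])  # timezone-aware datetime
        if published >= cutoff:
            recent.append((published, e))
    recent.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in recent]

entries = [
    {"title": "old", "pubdate": "Wed, 30 May 2007 09:00:00 +0000"},
    {"title": "morning", "pubdate": "Sat, 02 Jun 2007 08:00:00 +0000"},
    {"title": "late", "pubdate": "Sat, 02 Jun 2007 11:30:00 +0000"},
]
now = datetime(2007, 6, 2, 12, 0, tzinfo=timezone.utc)
print([e["title"] for e in last_24_hours(entries, now)])  # ['late', 'morning']
```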