Posted in ColdFusion | Posted on 06-05-2007 | 8,225 views
As you know (or hopefully know), ColdFusion 8 supports both parsing and creating RSS feeds. I was curious how difficult it would be to create an RSS aggregator in ColdFusion. Turns out it was rather easy - and with the use of CFTHREAD it actually performs quite well. I'll show the code, then talk through its parts.
<cffunction name="aggregate" returntype="query">
	<cfargument name="feeds" type="string" required="true" hint="List of RSS urls.">
	<cfset var results = structNew()>
	<cfset var result = "">
	<cfset var entries = "">
	<cfset var x = "">
	<cfset var totalentries = "">
	<!--- Use this column list since not all feeds return the same cols. --->
	<cfset var collist = "authoremail ,authorname ,authoruri ,categorylabel ,categoryscheme ,categoryterm ,comments ,content ,contentmode ,contentsrc ,contenttype ,contributoremail ,contributorname ,contributoruri ,createddate ,expirationdate ,id ,idpermalink ,linkhref ,linkhreflang ,linklength ,linkrel ,linktitle ,publisheddate ,rights ,rsslink ,source ,sourceurl ,summary ,summarymode ,summarysrc ,summarytype ,title ,updateddate ,uri ,xmlbase">
	<cfset var tlist = "">

	<!--- fetch and parse each feed in its own thread --->
	<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
		<cfthread action="run" name="thread_#x#" url="#listGetAt(arguments.feeds,x)#">
			<cffeed source="#attributes.url#" query="thread.entries">
		</cfthread>
		<cfset tlist = listAppend(tlist, "thread_#x#")>
	</cfloop>

	<!--- wait for every thread to finish --->
	<cfthread action="join" name="#tlist#" />

	<!--- copy out just for ease of use --->
	<cfloop index="x" list="#tlist#">
		<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
	</cfloop>

	<!--- union the per-feed queries into one --->
	<cfquery name="totalentries" dbtype="query">
	<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
	select
	#collist#
	from results.result_#x#
	<cfif x is not listLen(arguments.feeds)>
	union
	</cfif>
	</cfloop>
	</cfquery>

	<!--- sort --->
	<cfquery name="totalentries" dbtype="query">
	select #collist#
	from totalentries
	order by publisheddate desc
	</cfquery>

	<cfreturn totalentries>
</cffunction>
So first off - the point of the aggregator is to take a list of RSS feeds and return one simple query. The UDF takes one argument - feeds.
I loop over the feeds and, inside a separate thread for each one, download the feed and parse it into a query:
<cfloop index="x" from="1" to="#listLen(arguments.feeds)#">
	<cfthread action="run" name="thread_#x#" url="#listGetAt(arguments.feeds,x)#">
		<cffeed source="#attributes.url#" query="thread.entries">
	</cfthread>
	<cfset tlist = listAppend(tlist, "thread_#x#")>
</cfloop>
Next I join the threads together. This makes my function wait till all the threads are done:
<cfthread action="join" name="#tlist#" />
Now for the fun part. I need to get the data out of the threads. I named each thread "thread_#x#" where x was a number. I stored the list of thread names in a variable called tlist. So I can loop over each of them and use evaluate to fetch the thread. I stored the data in a variable named entries, so this is the code I ended up with:
<!--- copy out just for ease of use --->
<cfloop index="x" list="#tlist#">
	<cfset results["result_#replaceNoCase(x,'thread_','')#"] = evaluate("#x#").entries>
</cfloop>
In case you are wondering about the var scope - let me just say this. Adobe has done... well... "magic", to ensure that we do not have to var scope threaded data like I've used above. Don't ask me why - but I've been assured the code above is safe even in a multiple request format.
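As an aside, evaluate is not the only way to reach the thread data. Here is a minimal sketch of an alternative, assuming the cfthread scope (which ColdFusion 8 documents as holding each thread's metadata and Thread-scope variables, keyed by thread name) is available at this point in the function. It is not the version used in this post, just a possible drop-in for the copy-out loop:

```cfm
<!--- Alternative sketch: look each thread up by name in the cfthread scope
      instead of calling evaluate(). Assumes tlist and results are set up
      exactly as in the function above. --->
<cfloop index="x" list="#tlist#">
	<cfset results["result_#replaceNoCase(x,'thread_','')#"] = cfthread[x].entries>
</cfloop>
```

The result should be the same structure of queries, without building a variable name as a string.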
So now I have a results structure that contains a bunch of queries. I can then use query of query to join them all together.
You may wonder - why the select #collist# instead of select *? Some feeds may contain more columns than other feeds, specifically feeds containing Dublin Core or iTunes extensions. So I created a "core" list of columns I can depend on.
Lastly - I sort the joined query by the published date value. This will give me one final query that contains a sorted list of blog entries. Now for a quick demo:
<cfset aggregation = aggregate(feeds)>

<cfoutput query="aggregation">
<cftooltip tooltip="#content#">
<a href="#rsslink#">#title#</a><cfif len(publisheddate)>(#dateFormat(publisheddate)#)</cfif><br />
</cftooltip>
</cfoutput>
I used three feeds - DZone, RIAForge, and my own blog. I then loop over the results and use a bit of AJAX-UI candy (cftooltip) to display the results. And guess what?? I finally have a live demo for you!
Tomorrow I'll show an alternative to this that lets you search RSS feeds. (And yes, I do plan on resurrecting RSSWatcher.com with CF8 code. I promise.)
p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code.
EDIT: Folks, due to IE being a sucky browser and all, there was a huge rendering issue in the code. I added spaces to all of the columns in the "collist" variable. If you cut and paste the code, you will need to trim the values. Also note I have a new version which fixes some bugs. This will be posted around lunch time.
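One way to do that trimming in place, as a sketch rather than the fix from the new version: strip every space from the list. Inside the function, this line would have to go after the last var declaration, since ColdFusion 8 requires all var statements to come first. It assumes none of the column names themselves contain spaces, which holds for the list above.

```cfm
<!--- Remove the padding spaces that were added to collist for display --->
<cfset collist = replace(collist, " ", "", "all")>
```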


Personally, I'd prefer that CF had a "plugins" directory and API where PDF support or reporting or threading tag/function modules could be dropped into the system--or removed or replaced--as needed.
And as opposed to CF turning into a PHP analogue with "a half million functions served". And counting.
There are advantages to NOT having a bloated core with everything "baked in" as well.
Slightly OT: I'm curious to see whether MXNA picks up your old posts again when you move your blog there. Got any tips on that, or do you just live with it?
Please keep the cf8 tips coming, tides us over until the new cfwack comes out - mid-year right?
[quote]p.s. Ok, not that I want to start a flame war - but I'd love to see the same code written in PHP. No external libraries allowed - it must be all "baked in" code.
[/quote]
cf 8 is not even out and already feeling insecure :) or getting too ahead of urself :)
try this -- http://magpierss.sourceforge.net/#features
and whats with "it must be all baked in code" ... take a look at the lib folder in the CF install directory and see if you can count the # of commons-*.jar files there.
you are way above all this!
michael: CF does have a 'plugin' architecture - custom tags, CFXs, and CFCs. Folks _have_ done PDF libraries. So it's there. Personally I'd rather have it in.
scott p: I hope it doesn't reseed, but it may. I'm thinking it will only reseed my current RSS feed though. I'm still waiting to move the blog though. I got it upgraded to 5.8.00whatever the heck was last, but I can't get the damn SQL Server migrated.
sean: Excellent point. When I post this to CFLib, I'll update it, and tomorrow's "search" post will use an array.
php_dude: I want to argue soooo bad... but I won't. It was a dumb, flippant comment, and not even on topic.
[/quote]
ray,
i didn't mean to piss you off (u sound like i did :)). my point was that as a prominent person of the cf community you should be cautious at rubbing it in other people's faces how good the new version (or past/existing version) of cf is...
IMHO, CF, PHP or any other development platform is not without its faults... but comments like these don't help propagate the correct message.
You can run CF code, .NET, Java, JSP, PHP, and Ruby all in the same script.
You should check out CF, you will never look back!
http://www.cfinternals.org/blog/2007/06/should_eve...
Yes, someone can create a XYZ CFX, but can they remove XYZ support if it's not wanted? Can I swap out the baked-in XYZ support and replace the tags with a better implementation?
I wonder, though, what cffeed does when it tries to retrieve a feed that isn't currently available or parses one with malformed XML? I had to manually code a number of safety checks in our portlet to try and handle those situations gracefully.
I'm hoping to have time this weekend to start messing around with the beta.
All I did was loop the URLs of the XML feeds into RSS.cfc and join the queries at the end, much like this. However, I ended up writing the output to a static file and then scheduling the CFC to run once an hour: that way at least visitors to the site won't be waiting on 10 sites' RSS feeds.
The multi threaded approach is interesting though !
@Brian - my UDF fails on that. I've fixed that, and another bug, and will post a new version to blog after I eat some eggs. Get it - aggregator - eggs - seriously though - I need breakfast.
@Tom - Certainly not. I will continue to work on it - when time permits - or as folks find bugs. We will have folks on MX6 and 7 for a long time. I'm not going to abandon them.
And, any way to size the tooltip popup window? I find that if there is a lot of text in one of the popup window, the right end of the window gets chopped off by about 3 or 4 pixels on the left edge, when I'm viewing the demo page in Safari. Also happens in FF, but doesn't seem to be as bad.
I should have put a size limit on the tooltip text. That's my fault. As for sizing it - I know you can edit the CSS for it (the reference talks about this), but I haven't tried it myself.
I know this post is a little old, but I discovered the answer to your question, and thought I would post it here. You need to access it using the element name as the identifier. For example:
XML.rss.channel.item[i]["content:encoded"].XmlText
I am trying to use it to parse the RSS feeds of several different newspapers and show them all together, ordered by time. Your code seems perfect, and also unique in reading its sources from a query.
Can you tell a complete RSS newbie where I'm going wrong?
Is it possible to scan only the feeds that work?
You are a lighthouse for my CF apps, like Ben. If you come to Italy for a cfday in the future, don't forget you have a great pizza paid for.
This is a sample of the test page that uses paragator: http://www.informagorizia.it/newsfeed.cfm .
I think you should note down another pizza, with great wine this time. I hope you come with Ben to the next cfday in Italy.