Many web sites provide what's called an RSS feed. RSS stands for Really Simple Syndication. In simpler terms, it's a way to publish a list of articles in a common XML format. So for example, a blog, like this one, can share its last 10 articles. RSS typically isn't read by humans. Normally another program reads in the feed and works with it. In this blog entry, I'm going to discuss how ColdFusion can work with RSS feeds, specifically how to read them and create usable data from them. ColdFusion also provides a simple way to create feeds. I'll discuss that later.
To begin, let's select an RSS feed. I went to CNN and discovered they had a nice list of feeds. The first one was for their top stories and had this URL: http://rss.cnn.com/rss/cnn_topstories.rss Go ahead and click that link to see what the XML looks like.
ColdFusion provides a tag that both reads and creates RSS feeds: cffeed. At its simplest, you can point cffeed at a URL and have it create a query from the feed. You can also ask cffeed to return information about the feed in general. RSS actually covers multiple formats: two 'core' types (RSS and Atom), each with multiple versions. cffeed will work with all of them, but the type of feed read does impact what's returned. For now, let's just dump all the data from the feed:
<!--- Define the feed URL before cffeed uses it --->
<cfset rssUrl = "http://rss.cnn.com/rss/cnn_topstories.rss">
<cffeed action="read" source="#rssUrl#" query="entries" properties="info">
<cfdump var="#info#">
<cfdump var="#entries#">
The info structure contains metadata about the RSS feed. What you see there will depend on the type and how descriptive the feed is. A remote site may choose to include optional data that another feed does not. To be honest, this information is not going to be very useful if you are only parsing one feed. Let's look at that dump anyway:
Lots of info, right? Again though - if your intent is just to put CNN's news on your web site, this metadata really isn't helpful to you. If you are building an aggregator with random RSS URLs being added, then it becomes more important. But for our needs, we are done with it. Now let's look at the query created:
It's a huge query. Do you need all those columns? Heck no. When cffeed parses an RSS feed, it has to support multiple types of feeds. Because of this, the query has a large number of columns that may or may not be used depending on the type. In general, the value and usefulness of a column depends on whether the base type was RSS or Atom. If you read the reference for cffeed you can see the list of columns. Based on the first dump we saw, we know CNN is using an RSS type feed. (Confused? RSS is a generic term as well as a particular type. So we may say a site has an RSS feed, and the type itself is RSS. It may also be an Atom feed, but most likely it would still be linked to as an RSS feed.)
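If the full dump is too much to scan, a smaller check can tell you which flavor of feed you have and which columns came back. This is a minimal sketch, assuming the properties struct exposes a version key (hence the structKeyExists guard); columnList and recordCount are standard on any ColdFusion query.

```cfml
<cfset rssUrl = "http://rss.cnn.com/rss/cnn_topstories.rss">
<cffeed action="read" source="#rssUrl#" query="entries" properties="info">

<cfoutput>
<!--- The properties struct reports the feed flavor when the key is present --->
<cfif structKeyExists(info, "version")>
<p>Feed version: #info.version#</p>
</cfif>
<!--- columnList and recordCount are available on every ColdFusion query --->
<p>Columns: #entries.columnList#</p>
<p>Entries returned: #entries.recordCount#</p>
</cfoutput>
```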
So given that we have a large set of columns and we can use the reference to figure out what means what - what's the next step? At this point it depends on your needs. How exactly are you planning on using the feed? Most folks want to create a list of the entries on their site with a link to the full article. From the docs we see that the column "RSSLINK" provides the link and "TITLE" provides a title. Let's just try them for now.
<cfoutput query="entries">
<a href="#rsslink#">#title#</a><br/>
</cfoutput>
This returns:
Pretty simple, right? In theory, we could stop there. But let's look at two other columns we may want to display. The first is "PUBLISHEDDATE", which as you can imagine is the story's publication date. This could be useful for sites that update content a bit less frequently than CNN. The other is "CONTENT". Some RSS feeds will provide all of their story text within the feed. I think most though provide a summary, the idea being that you want your RSS feed consumers to be tempted to actually go to the site instead. Let's add both to our code:
<cfoutput query="entries">
<p>
<a href="#rsslink#">#title#</a> #publisheddate#<br/>
#content#
</p>
</cfoutput>
And the result:
Those links at the bottom come from CNN, not my code. It's something you have to keep in mind when using the content from RSS feeds. The HTML may or may not work well within your own site. That's it for now - I've included the full template below. After the template I've got a note you may want to read.
<cfset rssUrl = "http://rss.cnn.com/rss/cnn_topstories.rss">
<cffeed action="read" source="#rssUrl#" query="entries" properties="info">
<!---
<cfdump var="#info#">
--->
<!---
<cfdump var="#entries#">
--->
<cfoutput query="entries">
<p>
<a href="#rsslink#">#title#</a> #publisheddate#<br/>
#content#
</p>
</cfoutput>
Notes:
As I said, working with a specific RSS feed isn't too difficult. You can do exactly what I did: cfdump the info and the data and see which columns you care about. If you do intend to work with multiple feeds that are unknown at creation time, you may want to consider something like Paragator. That's a ColdFusion component I created that parses multiple feeds in multiple threads and also does a bit of normalization.
It goes without saying, every time you use a remote source in your code you should both cache it and prepare for an error. CNN could go down. Can you imagine (or even write and share) an example of the code above that both adds caching as well as exception handling?
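One possible answer to that challenge, as a minimal sketch rather than a definitive implementation. It assumes a ColdFusion version that has cacheGet, cachePut, and isNull (ColdFusion 9 or later). The cache key name and the 15 minute lifetime are arbitrary choices made for illustration.

```cfml
<cfset rssUrl = "http://rss.cnn.com/rss/cnn_topstories.rss">
<!--- cacheKey is a hypothetical name; pick anything unique within your application --->
<cfset cacheKey = "cnnTopStories">

<!--- cacheGet returns null when nothing is stored under the key --->
<cfset cached = cacheGet(cacheKey)>

<cfif isNull(cached)>
    <cftry>
        <cffeed action="read" source="#rssUrl#" query="entries">
        <!--- Keep the parsed query for 15 minutes (an arbitrary choice) --->
        <cfset cachePut(cacheKey, entries, createTimeSpan(0, 0, 15, 0))>
        <cfcatch type="any">
            <!--- Feed unreachable or unparsable: fall back to an empty query --->
            <cfset entries = queryNew("rsslink,title")>
        </cfcatch>
    </cftry>
<cfelse>
    <cfset entries = cached>
</cfif>

<cfif entries.recordCount>
    <cfoutput query="entries">
    <a href="#rsslink#">#title#</a><br/>
    </cfoutput>
<cfelse>
    <p>The news feed is unavailable right now.</p>
</cfif>
```

Because the empty fallback query is never cached, the next request tries the remote feed again. You may prefer to log the error inside cfcatch as well.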
Archived Comments
Very nice! I've played around with these commands, tricks, and codes myself, but your presentation pulled it together for me. Thanks!
How about pulling stuff from a feed? For example, National Geographic has a feed of photo of the day <http://feeds.nationalgeogra.... Might be cool to have a CF script that automatically goes there, pulls the image down, and stores it where one places the desktop image for your box.
I have much much love for CFFEED. I still remember having to hand-write a scraper for RSS feeds. *shudder* I've started working on using CFFEED to pull feeds from Flickr. Like the CNN one, they add in their own undesired HTML, so I've added an extra two lines to my cfoutput area that clean out the junk and leave me with just the image and its href. :-)
@Robert: I'll do a follow up showing an example of that.
@Collectonian (dude - is that your name? Is it Spanish? It's cool!) That's an issue you run into with feeds - extra junk. I'd imagine _most_ of the time though folks just use the titles and don't bother with the content.
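The kind of cleanup discussed above could look something like the following. This is only a rough sketch and not the commenter's actual code: it strips anything that looks like an HTML tag from the content column with a simple regular expression, which is naive and will not cope with every piece of markup a feed might contain. The variable name plainText is made up for the example.

```cfml
<cfset rssUrl = "http://rss.cnn.com/rss/cnn_topstories.rss">
<cffeed action="read" source="#rssUrl#" query="entries">

<cfoutput query="entries">
    <!--- Remove anything that looks like a tag, then trim leftover whitespace --->
    <cfset plainText = trim(reReplace(content, "<[^>]*>", "", "all"))>
    <p>
    <a href="#rsslink#">#title#</a><br/>
    #plainText#
    </p>
</cfoutput>
```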
@Robert: I think this warrants its own blog post with some explanation, but this code will save the images to a subdirectory called ngg.
http://pastebin.com/efaPYdE2
Thanks Ray. I've started to have to deal with this and it'll come in handy as a resource.
Robert, I modified the code a bit and blogged it: http://www.coldfusionjedi.c...
@Ray - just my online nick, and a totally made up word from my Star Trek geek days when I was working out the plot for a novel before discovering the insane publishing guidelines for Trek novels. :-)
As always Ray, when I need help with something, you have a blog post on it. Amazing that you can read my mind like that.
Anyway, I used your post, and another post I found on CFFEED and blogged about my adventure. http://goo.gl/TQpqQ
Brian, I wanted to post a comment on your blog, but I had to login. Is that really what you want? Seems like it would limit comments. Just my 2 cents.
Anyway - I wanted to point out something that may be helpful. Your blog entry was on combining multiple feeds. Check out my Paragator project (paragator.riaforge.org). It's specifically for joining N feeds together.
I am trying to read RSS Feed XML using CFFEED.
The code you have explained works perfectly. However, when I change the URL to 'http://www.schneider-electr... , it displays the error "The value of the attribute query, which is currently entries, is invalid."
What am I missing?
I don't know. This worked fine for me:
<cffeed source="http://www.schneider-electr..." query="entries">
<cfdump var="#entries#">
Thanks for checking. Not sure what the problem is. Will try it again...
Still getting the same error. (http://mobile.schneider-ele...
Any other suggestions to try?
You aren't getting the same error. Your error is:
Variable INFO is undefined.
Hi Ray,
I want to show images from the RSS feed in dynamic td's. My feed info is coming from a database. How can that be done?
Um, well, that's a big set of questions. Is there a particular part that is troubling you?
Hi Ray, let me share the code for reading the simple RSS. What should I change to view images? I am getting a bit distracted.
Making use of cffeed is nice, but it does not seem to let me add any further header information at the top of the feed. The problem I keep running into is that every time I want to update the feed, the output overwrites the entire .xml.
I want to add this to the output from the use of the cffeed.
xmlns:wfw='http://wellformedweb.org/Co...
xmlns:atom='http://www.w3.org/2005/Atom'
xmlns:sy='http://purl.org/rss/1.0/mod...
xmlns:slash='http://purl.org/rss/1.0/mod...
xmlns:georss='http://www.georss.org/georss'
xmlns:geo='http://www.w3.org/2003/01/g...
If I manually enter this into the feed after it is created with cffeed and then update the feed with more or less the same cffeed usage, it overwrites the entire page. Is there any way to continue using the cffeed tag and have it add the extended namespace XML to the feed during the create feed process?
Now I already know I can create an RSS feed without the use of the cffeed tag and believe that to be the only possible way to do this, but if you know of a way to do it with the cffeed tag, of which I may have overlooked - that would be great.
CFFEED will only overwrite existing data if you tell it to. Just use CFFEED to generate an XML string, add it, and then save the data. You could add that stuff with one quick CF string function.
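The approach described above (generate a string, modify it, then save) might be sketched like this. Several things here are assumptions: the query is a hand-built stand-in for a database query, the feed metadata and example.com URLs are placeholders, feed.xml is a hypothetical output path, and the string replace assumes the generated root element starts with "<rss " followed by attributes. Dump feedXml first to confirm that before relying on it.

```cfml
<!--- Stand-in for a database query; column names follow cffeed's RSS column names --->
<cfset posts = queryNew("title,rsslink,content,publisheddate", "varchar,varchar,varchar,date")>
<cfset queryAddRow(posts)>
<cfset querySetCell(posts, "title", "First post")>
<cfset querySetCell(posts, "rsslink", "http://www.example.com/first-post")>
<cfset querySetCell(posts, "content", "A short summary of the post.")>
<cfset querySetCell(posts, "publisheddate", now())>

<!--- Placeholder feed metadata --->
<cfset meta = structNew()>
<cfset meta.version = "rss_2.0">
<cfset meta.title = "Example blog">
<cfset meta.link = "http://www.example.com/">
<cfset meta.description = "Latest posts">

<!--- xmlVar keeps the generated feed in a variable instead of writing a file --->
<cffeed action="create" query="#posts#" properties="#meta#" xmlVar="feedXml">

<!--- Add one extra namespace declaration to the root element (assumes it starts with "<rss ") --->
<cfset feedXml = replace(feedXml, "<rss ", "<rss xmlns:georss=""http://www.georss.org/georss"" ", "one")>

<!--- Save the modified string; feed.xml is a hypothetical file name --->
<cffile action="write" file="#expandPath('./feed.xml')#" output="#feedXml#" charset="utf-8">
```

Since the feed is rebuilt from the query on every run, the namespace is re-added each time and nothing has to be edited by hand.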
Right, I understand that it has a write attribute, but I do not want it to write every time I want to update the .xml file. Thinking I may be better off running on the read option, but the only area that affects that is when I actually manually enter my blog post within my cpanel for database entry which then outputs from the query to the .xml file and a page on the site. If I use the read option it will only be pulling from the blog feed, which is WordPress.
Have created a proprietary system on CF and want to not utilize the feed from the WP blog. Want to be able to update the .xml file but driven from the DB and not WP. This is why I have not used the read option.
Thinking this is helping me come to a solution just by discussing it, so bear with me here. First time using the cffeed tag and still believe the option is to generate an .xml file without cffeed and run a db query and generate the updated .xml - what do you think or am I just making this too difficult?
I'm confused. Why aren't you using CFFEED to create the RSS feed from your database query?
Actually the issue I am really trying to solve is with Chrome and FireFox concerning formatting. The format for Chrome outputs an entire string without formatting it. While FireFox only allows for the subscribe option when posting the page.
http://www.linkworxseo.com/...
Is there any way to make Chrome format it with the same look and feel of the other browsers? It will display http://www.linkworxseo.com/... data just fine and if I manually edit the feed.xml file it will appear the same as the rest of the browsers with the exception of FireFox that only allows for subscribing.
The extended .xml schema that it would not allow me to post here is what I am trying to make the write function produce. I suppose it is more of a question of how to style this feed by applying a style sheet or the markup provided above. If I write the .xml file once it is fine to add the code above manually until I try to update the .xml file again from the DB because I added a new post.
Well to be clear, this is XML data, and formatting is going to be browser specific. Normally humans don't even look at it. If Chrome is displaying it as one big string, you probably need to add the CFHEADER tag and use the correct type to tell the browser you are outputting XML.
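As a small illustration of telling the browser it is receiving XML: the sketch below uses cfcontent, which sets the Content-Type response header much like a cfheader call would, and its reset attribute also discards any whitespace already generated before the XML declaration. The file name feed.xml is an assumption; the XML string could come from anywhere.

```cfml
<!--- feed.xml is a hypothetical file holding a previously generated feed --->
<cfset feedXml = fileRead(expandPath("./feed.xml"), "utf-8")>

<!--- Declare the response as XML and clear any output generated so far --->
<cfcontent type="text/xml; charset=utf-8" reset="true"><cfoutput>#feedXml#</cfoutput>
```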
Hey Raymond. Excellent post as usual. When I look at the raw XML data from a feed I am trying to consume - I see a field called: content:encoded which is HTML containing a link to an image. I'd like to be able to 'see' this, but no sign of it in the cfdump from my cffeed. I tried #rsFeed.content:encoded# to see it anyway to no avail. How do I get to this field? BTW: The feed URL is: http://ultimateclassicrock....
You would need to use cfhttp and skip cffeed for this one. See http://www.raymondcamden.co...
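A rough sketch of the cfhttp route, not necessarily what the linked post does. The feed URL is a placeholder, and the code assumes an RSS style document whose item elements carry a content:encoded child. It has no error handling, so a failed request or invalid XML will throw.

```cfml
<!--- feedUrl is a placeholder; substitute the feed you want to inspect --->
<cfset feedUrl = "http://www.example.com/feed/">
<cfhttp url="#feedUrl#" method="get" result="resp" timeout="10">

<!--- Parse the raw response and collect every item element --->
<cfset doc = xmlParse(resp.fileContent)>
<cfset items = xmlSearch(doc, "//item")>

<cfoutput>
<cfloop array="#items#" index="item">
    <cfif structKeyExists(item, "content:encoded")>
        <!--- Bracket notation is needed because of the colon in the element name --->
        <div>#item["content:encoded"].xmlText#</div>
    </cfif>
</cfloop>
</cfoutput>
```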
It is very nice. :)
It works for me as i needed
thanks a lot
Umm, Raymond, i wanted to look at your modified code, but your site is no longer??
How are you posting here if the site is no longer? ;)
I am ColdFusion novice, but alas my feverish quest to find the answer has failed me.
How would I go about taking the contents of a output page, and saving it to a static HTML file?
In that, let's say I have my tutorial RSS feed gathering results. Instead of just re-running the rss feed and just chewing up bandwidth and CPU cycles,
how do I get the value of result into a new static file
Any hints or pointers would be appreciated
@Arnold: Well in theory, you can just use RAM based caching with cacheGet/cachePut. That would store the results in RAM and let you display the results quicker. I'd recommend this over saving it to a file as it would be much easier to implement.
Thanks for this sample Ray. With Yahoo Pipes being shut down yesterday, I needed a replacement and this was a huge help!
is there a way to use 24 hour time instead of military and to drop the GMT
The date is a standard format, so sure, you can parse it and then format it as you will.
Works great, but how can I transform the output of #publisheddate# to my needs? For example, how can I make "Thu, 17 Dec 2015 13:00:18 GMT" look like "17. Dezember 2015"?
Basically you have to parse it into a date object so you can format it as is. I *believe* I wrote code for this in my paragator project up on RIAForge. I haven't touched it in years, but check that.
Thanks a lot for taking a look into it. I've tried
#lsDateFormat(publisheddate, "mmmm yyyy")#
but got this:
"Thu, 17 Dec 2015 13:00:18 GMT is an invalid date format"
Like I said - you have to parse it first - you can't just format it. Find that Paragator project on RIAForge - I'm 99% sure it has the code.
I just used #DateFormat(publisheddate,"dd. mmmmm yyyy")# instead and then did a #ReplaceList# to convert the month names to my needs.
Btw. here is how: https://gist.github.com/ITf...
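For reference, the parse-then-format approach suggested in this thread can be sketched with Java's SimpleDateFormat, which ColdFusion can call directly. This is a swapped-in technique, not the Paragator code and not the gist above. It assumes publisheddate arrives as a string in the form shown in the question; the variable names are made up for the example.

```cfml
<!--- Sample value in the format quoted in the comment above --->
<cfset raw = "Thu, 17 Dec 2015 13:00:18 GMT">
<cfset locale = createObject("java", "java.util.Locale")>

<!--- Parse the string using English day and month names --->
<cfset parser = createObject("java", "java.text.SimpleDateFormat").init("EEE, dd MMM yyyy HH:mm:ss zzz", locale.ENGLISH)>
<cfset parsed = parser.parse(raw)>

<!--- Format with German month names, keeping the date in GMT --->
<cfset formatter = createObject("java", "java.text.SimpleDateFormat").init("d. MMMM yyyy", locale.GERMAN)>
<cfset formatter.setTimeZone(createObject("java", "java.util.TimeZone").getTimeZone("GMT"))>

<cfoutput>#formatter.format(parsed)#</cfoutput>
```

Pinning the formatter to GMT keeps the printed day from shifting with the server's time zone. Drop that line if you would rather display local dates.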