Saving images from an RSS feed with ColdFusion

This post is more than 2 years old.

Yesterday I blogged a simple example of using ColdFusion's RSS parsing feature with the cffeed tag. One of the readers who commented on the story mentioned that he was interested in using cffeed to parse an RSS feed from National Geographic. What made this feed a bit different is that it was specifically for pictures. He wanted to take those pictures and download them. What follows is a simple template that does just that. (I'll warn folks now though - I did not check what NG's requirements were for copyright notices. You will want to do that if you make use of this code.)

<cfset rssUrl = "http://feeds.nationalgeographic.com/ng/photography/photo-of-the-day/">

<cffeed action="read" source="#rssUrl#" query="entries">

<cfset dir = expandPath("./ngg")>
<cfif not directoryExists(dir)>
    <cfdirectory action="create" directory="#dir#">
</cfif>

<cfloop query="entries">
    <cfset localfile = listLast(linkhref,"/")>
    <cfoutput>checking #localfile#... </cfoutput>
    <cfif not fileExists(dir & "/" & localfile)>
        <cfhttp method="get" getAsBinary="yes" url="#linkhref#" path="#dir#" file="#localfile#">
        downloading!<br/>
    <cfelse>
        skipped<br/>
    </cfif>
</cfloop>

Done.

The first two lines of code are pretty similar to what I used yesterday. I got rid of the properties attribute since I don't need the metadata from the feed. I'm going to store the images in a subdirectory called ngg, so you can see the simple check I use there to ensure it actually exists. Now for the fun part.

As I loop through the RSS feed, I make use of the linkhref column for the image url. This is where NG stores the path to their image. I get just the filename from that and see if I have a copy already. If not, I just use cfhttp to fetch it down. And that's it. Really. Here's a quick screen shot from my local directory after running this once.


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Herman Potgieter posted on 6/6/2011 at 5:40 PM

That is very useful. I wonder if the same thing is possible with videos? I want to download videos that come out at night and watch them in the morning, but the site does not only feed video. This could be a fun experiment...

Comment 2 by Raymond Camden posted on 6/6/2011 at 5:43 PM

What's the RSS feed? I'll take a stab at it.

Comment 3 by todd sharp posted on 6/6/2011 at 6:09 PM

Another idea would be to use <cfimage> to grab the image. That way you have an image object right off the bat that you could resize, etc if you wanted to - then use cfimage to write it to disk.
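A minimal sketch of that cfimage idea, written as a replacement for the loop in the post. It assumes the `entries` query and the `dir` variable from the original template; the 800-pixel width is an arbitrary value chosen for illustration, not something from the post.

```cfml
<cfloop query="entries">
    <cfset localfile = listLast(entries.linkhref, "/")>
    <cfif not fileExists(dir & "/" & localfile)>
        <!--- Read the remote image straight into an image object --->
        <cfimage action="read" source="#entries.linkhref#" name="img">
        <!--- Assumed size: shrink anything wider than 800px, keeping the aspect ratio --->
        <cfif imageGetWidth(img) gt 800>
            <cfset imageScaleToFit(img, 800, "")>
        </cfif>
        <!--- Write the (possibly resized) image to disk --->
        <cfimage action="write" source="#img#" destination="#dir#/#localfile#" overwrite="yes">
    </cfif>
</cfloop>
```

One trade-off to keep in mind: cfimage picks the output format from the file extension, so this only suits links that end in a recognizable image extension, whereas cfhttp saves whatever bytes come back.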

Comment 4 by David Jennings posted on 6/7/2011 at 12:58 AM

If you were downloading large videos (or zip files, I'm thinking), is there any way to track the progress of downloads or output a message to say a file in the queue had downloaded? Surely it would time out with a list of big files.

Comment 5 by Raymond Camden posted on 6/7/2011 at 1:00 AM

You could switch to using threads. Since most feeds are just 10 items you wouldn't fill up all your threads doing this. But to be honest, if they were huge files, I'd consider maybe another method.
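A rough sketch of the threaded version, assuming the `entries` query and `dir` variable from the post. The thread names (`dl_1`, `dl_2`, ...), the attribute names passed into each thread, and the 60-second join timeout are all illustrative choices rather than anything from the original code.

```cfml
<cfset threadNames = "">

<cfloop query="entries">
    <cfset localfile = listLast(entries.linkhref, "/")>
    <cfif not fileExists(dir & "/" & localfile)>
        <cfset tname = "dl_" & entries.currentRow>
        <cfset threadNames = listAppend(threadNames, tname)>
        <!--- Values are passed in as attributes so each thread gets its own copy --->
        <cfthread action="run" name="#tname#" fileUrl="#entries.linkhref#" saveDir="#dir#" saveAs="#localfile#">
            <cfhttp method="get" getAsBinary="yes" url="#attributes.fileUrl#" path="#attributes.saveDir#" file="#attributes.saveAs#">
        </cfthread>
    </cfif>
</cfloop>

<cfif listLen(threadNames)>
    <!--- Wait up to 60 seconds (assumed limit) for the downloads to finish --->
    <cfthread action="join" name="#threadNames#" timeout="60000">
    <!--- Report how each download ended --->
    <cfloop list="#threadNames#" index="t">
        <cfoutput>#t#: #cfthread[t].status#<br/></cfoutput>
    </cfloop>
</cfif>
```

This gives a per-file status message once the join returns, which covers the "tell me a file finished" half of the question. It does not report progress within a single download.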

Comment 6 by David Jennings posted on 6/7/2011 at 1:11 AM

Thanks, think I'll stick to my AIR app then. Was just considering it as an alternative so that I could push the files around just in ColdFusion. Thanks :)

Comment 7 by Herman Potgieter posted on 6/7/2011 at 4:17 PM

http://www.channelfireball.... is the web feed. Mostly articles but sometimes video. I was just going to loop through the results and, if there is video in the feed, download it.

Comment 8 by Raymond Camden posted on 6/7/2011 at 4:23 PM

Looking at the feed right now, which entry, if any, has a video?

Comment 9 by Herman Potgieter posted on 6/7/2011 at 4:33 PM

Looking at it now it's even easier: every video entry in the feed has a name that starts with Channel, e.g. "Channel LSV: MIR VIS WTH Draft #2". If you only see the latest 10 there are two videos. Going back further you get "Channel Conley: MSS Draft #11", then even older there is one that has a different naming structure: "Magic TV Top 8 of the Week: Best Lands". I don't know if the video is in all of the feed; there is a permissions issue from the computer I am on right now.

Comment 10 by Raymond Camden posted on 6/7/2011 at 4:38 PM

I see titles like that, but I don't see a direct link to the video.

Note - I'm doing my testing with cfhttp, not cffeed. That ensures I see everything. They use some custom data in their feed.
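For readers who want to try the same kind of inspection, here is a small sketch that fetches a feed with cfhttp and lists every child element of each item, including namespaced custom ones that cffeed may not expose. It assumes an RSS 2.0 feed (items found at `//item`) and reuses the `rssUrl` variable from the post as a stand-in for whatever feed you are examining.

```cfml
<!--- Fetch the raw feed instead of letting cffeed interpret it --->
<cfhttp method="get" url="#rssUrl#" result="feedResult">
<cfset feedXml = xmlParse(feedResult.fileContent)>

<!--- Assumes RSS 2.0, where entries are <item> elements --->
<cfset items = xmlSearch(feedXml, "//item")>

<cfloop array="#items#" index="item">
    <cfoutput><b>#htmlEditFormat(item.title.xmlText)#</b><br/></cfoutput>
    <!--- List every child element, with any attributes it carries --->
    <cfloop array="#item.xmlChildren#" index="child">
        <cfoutput>#child.xmlName# [#structKeyList(child.xmlAttributes)#]<br/></cfoutput>
    </cfloop>
</cfloop>
```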

Comment 11 by Herman Potgieter posted on 6/7/2011 at 4:45 PM

OK, in the Magic TV one the video is embedded in the feed, but since the update they did recently the video is not in the feed as far as I can tell. From what I found it seems they changed from using a YouTube player to a Flash player that is embedded in the code. There are other sites as well that do the same thing, but most of their video is premium content and that might not be legal, so I won't even try.

Comment 12 by Raymond Camden posted on 6/7/2011 at 4:47 PM

Well, if you find a feed we can play with, just let me know. I do want to thank you though. This feed has an interesting property that I'm going to blog about. It's something that I've blogged before, but with my recent blog entries on RSS it bears repeating.

Comment 13 by Herman Potgieter posted on 6/7/2011 at 4:58 PM

You can use the "Magic TV Top 8 of the Week: Best Lands" entry; through Google Reader it plays the video. It's from May 30th. Then there are always webcomics with video like this: http://www.smbc-comics.com. Not sure what their feed address is though, but I know that there is a video on the feed for today. Google Reader is awesome....

Comment 14 by Raymond Camden posted on 6/7/2011 at 5:06 PM

But it's playing because of the embedded HTML in the feed. What we want is a 'proper' link to the video that we don't have to dig to get. (If I may speak for "we" ;)

Comment 15 by Herman Potgieter posted on 6/7/2011 at 5:14 PM

I have gone over all my other RSS feeds and nothing else has video in it. Well, for the same net effect, what I could do is link to the RSS and, in the case of Channelfireball, when the name sent through contains Channel, simply use some other way to go and download all the video from the site. How I will do that I am not sure yet, especially with the new Flash player they use. And you may speak for we :)
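The title check described here could be sketched as follows. The `feedUrl` value is a placeholder, since the real feed address is truncated above, and what to do with a matching entry is left open because the thread never finds a direct video link.

```cfml
<!--- Placeholder address: substitute the real feed URL --->
<cfset feedUrl = "http://www.example.com/feed">

<cffeed action="read" source="#feedUrl#" query="entries">

<cfloop query="entries">
    <!--- Treat entries whose title begins with "Channel" as video posts --->
    <cfif findNoCase("Channel", entries.title) eq 1>
        <cfoutput>video candidate: #htmlEditFormat(entries.title)# (#entries.linkhref#)<br/></cfoutput>
    </cfif>
</cfloop>
```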

Comment 16 by Misty posted on 1/24/2012 at 7:15 PM

Hi Ray, good post, just visited. I have one change, if you can guide me a bit: I have a page which lists images along with other information, and I just need to scrape all the images and store them on my local system.

Can you just guide me?

Comment 17 by Raymond Camden posted on 1/24/2012 at 7:41 PM

You can use CFHTTP to get the HTML. Ensure you have resolveURL set to true. It will convert images like this: src="images/foo.jpg" to src="http://www.full.com/images/..." (WARNING TO READERS - my blog auto converts URLs to clickable stuff, so if that link goes to something naughty, don't blame me.)

Then you would use a regular expression to find all image URLs. You then use cfhttp to fetch each one and save it. The only issue you may run into is name conflicts. So an HTML page can easily have these two images:

/misc/foo.jpg
/people/foo.jpg

When saving these you need a way to handle that. You can do a few things.

a) Simply rename all images to UUIDs. Simple, but you lose any context about what the images were.

b) Increment. So the first foo.jpg is foo.jpg, the next is foo_1.jpg.

c) Mimic the directory structure. Given my example above, you could also create a misc and people folder.
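Here is a minimal sketch of the whole flow using option (b), the increment approach. The page URL and the `scraped` folder name are made up for illustration, and the regular expressions are a simple approximation that only handles quoted src attributes; real-world HTML may need something more forgiving.

```cfml
<!--- Hypothetical page and target folder --->
<cfset pageUrl = "http://www.example.com/gallery.html">
<cfset dir = expandPath("./scraped")>
<cfif not directoryExists(dir)>
    <cfdirectory action="create" directory="#dir#">
</cfif>

<!--- resolveURL rewrites relative links in the response to absolute ones --->
<cfhttp method="get" url="#pageUrl#" resolveURL="yes" result="page">

<!--- Collect every <img ...> tag, then pull the src value out of each --->
<cfset imgTags = reMatchNoCase("<img[^>]+>", page.fileContent)>
<cfset seen = structNew()>

<cfloop array="#imgTags#" index="imgTag">
    <cfset m = reFindNoCase("src\s*=\s*[""']([^""']+)[""']", imgTag, 1, true)>
    <cfif arrayLen(m.pos) gte 2 and m.pos[2] gt 0>
        <cfset src = mid(imgTag, m.pos[2], m.len[2])>
        <!--- Drop any query string, then keep the last path segment as the name --->
        <cfset fileName = listLast(listFirst(src, "?"), "/")>
        <cfif len(fileName) and not structKeyExists(seen, src)>
            <cfset seen[src] = true>
            <!--- Option (b): foo.jpg, then foo_1.jpg, foo_2.jpg, ... --->
            <cfset target = fileName>
            <cfset counter = 0>
            <cfloop condition="fileExists(dir & '/' & target)">
                <cfset counter = counter + 1>
                <cfif listLen(fileName, ".") gt 1>
                    <cfset target = listDeleteAt(fileName, listLen(fileName, "."), ".") & "_" & counter & "." & listLast(fileName, ".")>
                <cfelse>
                    <cfset target = fileName & "_" & counter>
                </cfif>
            </cfloop>
            <cfhttp method="get" getAsBinary="yes" url="#src#" path="#dir#" file="#target#">
            <cfoutput>saved #htmlEditFormat(src)# as #htmlEditFormat(target)#<br/></cfoutput>
        </cfif>
    </cfif>
</cfloop>
```

Note that the increment scheme cannot tell a true name conflict from a rerun: running the template twice will save a second copy of every image with a numeric suffix. Mimicking the directory structure, option (c), avoids that at the cost of more path handling.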

Comment 18 by Misty posted on 6/29/2012 at 10:23 AM

Coming late to this question. I'm trying it like this:

<cfloop query="Recordset1">
<cfset querySetCell(Recordset1, "linkhref", "#request.siteURL#bigImage.cfm?imageID=#Val(imageID)#", currentRow)>
</cfloop>

That will show me a link, and yes it does. By the way, how do I change it to show an image? Same case with video.

Comment 19 by Raymond Camden posted on 6/29/2012 at 4:45 PM

I'm sorry, but I don't understand your English here.

Comment 20 by Misty posted on 6/29/2012 at 6:14 PM

OK, my bad.

Written in a hurry.

The above code just shows me a link like this:

http://www.abc.com/acs.jpg

I see a link, not the actual image. This is my question.

Same as with the image, I want to display a video too. The video format is basically FLV.

Comment 21 by Raymond Camden posted on 6/30/2012 at 12:56 AM

I may not be understanding you still. Are you saying you don't know how to output HTML for an image tag? Given X is the URL like you describe, you do know it's just <img src="#x#">, right?

Comment 22 by Misty posted on 6/30/2012 at 9:51 AM

is my english that bad :(

Well yes, I want to show an image in the feed.

The one I gave above just shows it as a link.

Comment 23 by Misty posted on 6/30/2012 at 10:25 AM

feed is showing like this

Gabriella-Demetriades-goes-bikini-for-FHM-123bolly-com-3.jpg
Zimbabwe

Copy of 4829c266bb9e6.jpg
Zimbabwe

The .jpg one is a link which, when clicked, opens an image, but I want the actual image to show in the feed itself instead of a link.

Comment 24 by Raymond Camden posted on 6/30/2012 at 4:22 PM

Again, if you can extract the name of the image, like you have, you can make an image show up by just using the <IMG> tag. If your name is just "foo.jpg", then you need to prepend it with the hostname of the site.

Comment 25 by Misty posted on 6/30/2012 at 4:45 PM

Here is how i am generating the feed

http://pastebin.com/kbeQgYA1

Comment 26 by Raymond Camden posted on 6/30/2012 at 5:08 PM

Now I'm more confused than ever. You're making a RSS feed. This blog entry was on getting images from a RSS feed you're reading. Can you please back up and explain again what you are trying to do?

Comment 27 by Misty posted on 6/30/2012 at 5:16 PM

Okay

Let me now explain what I am doing. You have seen my code, which is how I am building an RSS feed. Now the feed is working fine, as it is showing the image link and the description.

But there is one thing in my RSS feed: I see the link to the image, but I want the image itself to appear in the RSS feed.

I hope I am clear now.

Comment 28 by Raymond Camden posted on 6/30/2012 at 5:24 PM

An RSS feed is XML, not HTML. It can't render anything. If you are saying that you want RSS readers to render the image, then you need to include the HTML in the RSS content. So you don't just include the url, but the html. Ie, <img src=".."> Then the RSS reader would render it. But an RSS reader may choose to strip out html when displaying.

Comment 29 by Misty posted on 6/30/2012 at 5:28 PM

But it is failing. I have seen your RSS, and that seems to be working well and shows images, so why not mine?

Comment 30 by Raymond Camden posted on 6/30/2012 at 5:41 PM

Again - your RSS reader may strip out the HTML. Looking at your code, you are setting linkhref to the image. That is not right. linkhref is meant to be the url of the resource, ie, what the rss entry should link to. RSS Readers will link to it, nothing more. You want to use the "description" field to include your image there. This is a block of text that can include html.

Comment 31 by Misty posted on 6/30/2012 at 8:51 PM

Okay, I tried it like this

<cfset cmap.publisheddate = "addedon">
<cfset cmap.content = "origin">
<cfset cmap.title = "<img src=#request.siteURL#admin/Uploads/thumbnails/#Recordset1.origin#/#Recordset1.image#>">

I am getting an error. Now the image does appear, but I receive an error too:

There is a problem in the column mappings specified in the columnMap structure.

The query attribute input does not contain any column by the name of

Comment 32 by Raymond Camden posted on 6/30/2012 at 8:56 PM

Misty, you need to read the docs on CFFEED. cmap is used to associate a column in your query to known parts of a RSS feed. You want to:

a) Modify your original query to add a new column that will be your description for your rss items.

b) If you call the column description, then that _already_ maps to an RSS item, and when you generate the rss feed, it should work.

c) Not only is your cmap.title line wrong, even if it worked it would be more wrong. Sorry if that is blunt. But your title field is just that, a title, not a block of content. That is what your description is for.
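To make that advice concrete, here is a hedged sketch of generating a feed whose item body carries the image HTML. The sample row, the `itemhtml` column name, the feed title and link, and the `request.siteURL` value are all invented for illustration. The sketch also assumes, as the `cmap.content` line in the comment above suggests, that the `content` key of the column map is what supplies the item body; check the cffeed documentation for your version before relying on it.

```cfml
<!--- Hypothetical stand-ins for the real site URL and database query --->
<cfset request.siteURL = "http://www.example.com/">
<cfset items = queryNew("title,image,origin,addedon", "varchar,varchar,varchar,date")>
<cfset queryAddRow(items)>
<cfset querySetCell(items, "title", "Sample photo")>
<cfset querySetCell(items, "image", "sample.jpg")>
<cfset querySetCell(items, "origin", "Zimbabwe")>
<cfset querySetCell(items, "addedon", now())>

<!--- Step a: add a column that holds the HTML for each item --->
<cfset queryAddColumn(items, "itemhtml", "varchar", arrayNew(1))>
<cfloop query="items">
    <cfset html = '<img src="#request.siteURL#admin/Uploads/thumbnails/#items.origin#/#items.image#">'>
    <cfset querySetCell(items, "itemhtml", html, items.currentRow)>
</cfloop>

<!--- Feed-level metadata (placeholder values) --->
<cfset props = structNew()>
<cfset props.version = "rss_2.0">
<cfset props.title = "Photo feed">
<cfset props.link = request.siteURL>
<cfset props.description = "Latest photos">

<!--- The column map holds COLUMN NAMES, never content.
      Upper case is used only as a precaution; whether case matters
      on a given ColdFusion version is not verified here. --->
<cfset cmap = structNew()>
<cfset cmap.title = "TITLE">
<cfset cmap.content = "ITEMHTML">
<cfset cmap.publisheddate = "ADDEDON">

<cffeed action="create" query="#items#" properties="#props#" columnMap="#cmap#" xmlVar="rssXml">

<cfcontent type="text/xml" reset="true"><cfoutput>#rssXml#</cfoutput>
```

The error quoted earlier in the thread follows from the same rule: putting an `<img>` string into `cmap.title` makes cffeed look for a query column with that name, and no such column exists.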