Yesterday I blogged a simple example of using ColdFusion's RSS parsing feature with the cffeed tag. One of the readers who commented on the story mentioned that he was interested in using cffeed to parse an RSS feed from National Geographic. What made this feed a bit different is that it was specifically for pictures. He wanted to take those pictures and download them. What follows is a simple template that does just that. (I'll warn folks now though - I did not check what NG's requirements were for copyright notices. You will want to do that if you make use of this code.)
<cfset rssUrl = "http://feeds.nationalgeographic.com/ng/photography/photo-of-the-day/">
<cffeed action="read" source="#rssUrl#" query="entries">

<cfset dir = expandPath("./ngg")>
<cfif not directoryExists(dir)>
<cfdirectory action="create" directory="#dir#">
</cfif>

<cfloop query="entries">
<cfset localfile = listLast(linkhref,"/")>
<cfoutput>checking #localfile#... </cfoutput>
<cfif not fileExists(dir & "/" & localfile)>
<cfhttp method="get" getAsBinary="yes" url="#linkhref#" path="#dir#" file="#localfile#">
downloading!<br/>
<cfelse>
skipped<br/>
</cfif>
</cfloop>

Done.
The first two lines of code are pretty similar to what I used yesterday. I got rid of the properties attribute since I don't need the metadata from the feed. I'm going to store the images in a subdirectory called ngg, so you can see the simple check I use there to ensure it actually exists. Now for the fun part.
As I loop through the RSS feed, I make use of the linkhref column for the image URL. This is where NG stores the path to their image. I get just the filename from that and see if I have a copy already. If not, I just use cfhttp to fetch it down. And that's it. Really. Here's a quick screenshot from my local directory after running this once.
Archived Comments
That is very useful. I wonder if the same thing is possible with videos? I want to download videos that come out at night and watch them in the morning, but the site does not only feed video. This could be a fun experiment...
What's the RSS feed? I'll take a stab at it.
Another idea would be to use <cfimage> to grab the image. That way you have an image object right off the bat that you could resize, etc if you wanted to - then use cfimage to write it to disk.
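Here is a rough sketch of that cfimage idea, reusing the feed and folder from the post. The 800 pixel width is an arbitrary example value, and I'm assuming the linkhref values point directly at image files that cfimage can read.

```cfml
<cfset rssUrl = "http://feeds.nationalgeographic.com/ng/photography/photo-of-the-day/">
<cffeed action="read" source="#rssUrl#" query="entries">

<cfset dir = expandPath("./ngg")>
<cfif not directoryExists(dir)>
    <cfdirectory action="create" directory="#dir#">
</cfif>

<cfloop query="entries">
    <cfset localfile = listLast(entries.linkhref, "/")>
    <cfif not fileExists(dir & "/" & localfile)>
        <!--- Read the remote image straight into an image object --->
        <cfimage action="read" source="#entries.linkhref#" name="img">
        <!--- 800 is an example width; the empty height keeps the aspect ratio --->
        <cfset imageScaleToFit(img, 800, "")>
        <!--- Write the (resized) image object to disk --->
        <cfimage action="write" source="#img#" destination="#dir#/#localfile#" overwrite="yes">
    </cfif>
</cfloop>
```

The trade-off is that the image gets decoded and re-encoded on the way through, so this only makes sense if you actually want to manipulate it. For a straight copy, cfhttp is the simpler route.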
If you were downloading large videos (or zip files, I'm thinking), is there any way to track the progress of downloads or output a message to say a file in the queue had downloaded? Surely it would time out with a list of big files.
You could switch to using threads. Since most feeds are just 10 items you wouldn't fill up all your threads doing this. But to be honest, if they were huge files, I'd consider maybe another method.
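As a minimal sketch of the threaded approach, assuming the same feed and folder as the post: each missing file gets its own cfthread, and the page then waits for all of them. The thread names, the custom attribute names (fileUrl, saveDir, saveAs), and the two minute join timeout are all made up for this example.

```cfml
<cfset rssUrl = "http://feeds.nationalgeographic.com/ng/photography/photo-of-the-day/">
<cffeed action="read" source="#rssUrl#" query="entries">

<cfset dir = expandPath("./ngg")>
<cfif not directoryExists(dir)>
    <cfdirectory action="create" directory="#dir#">
</cfif>

<cfset threadNames = "">
<cfloop query="entries">
    <cfset localfile = listLast(entries.linkhref, "/")>
    <cfif not fileExists(dir & "/" & localfile)>
        <cfset tName = "download_#entries.currentRow#">
        <cfset threadNames = listAppend(threadNames, tName)>
        <!--- Values are passed in as attributes so each thread gets its own copy --->
        <cfthread action="run" name="#tName#" fileUrl="#entries.linkhref#" saveDir="#dir#" saveAs="#localfile#">
            <cfhttp method="get" getAsBinary="yes" url="#attributes.fileUrl#" path="#attributes.saveDir#" file="#attributes.saveAs#">
        </cfthread>
    </cfif>
</cfloop>

<!--- Wait for the downloads (timeout is in milliseconds), then report each thread's status --->
<cfif len(threadNames)>
    <cfthread action="join" name="#threadNames#" timeout="120000" />
</cfif>
<cfloop list="#threadNames#" index="tName">
    <cfoutput>#tName#: #cfthread[tName].status#<br/></cfoutput>
</cfloop>
```

This still ties the request to the slowest download, so it does not really answer the progress question for huge files. It only stops one slow file from holding up the rest.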
Thanks, think i'll stick to my Air app then. Was just considering it as an alternative then I could push the files around just in coldfusion. thanks :)
http://www.channelfireball.... is the web feed. mostly articles but sometimes video. I was just going to loop through the results and see if there is video in the feed download it.
Looking at the feed right now, which entry, if any, has a video?
Looking at it now it's even easier: everything where the feed name starts with Channel, so "Channel LSV: MIR VIS WTH Draft #2". If you only see the latest 10 there are two videos. Going back further you get "Channel Conley: MSS Draft #11", then even older there is one that has a different naming structure: "Magic TV Top 8 of the Week: Best Lands". I don't know if the video is in all the feed. Permissions issue from the computer I am on right now.
I see titles like that, but I don't see a direct link to the video.
Note - I'm doing my testing with cfhttp, not cffeed. That ensures I see everything. They use some custom data in their feed.
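For anyone who wants to do the same kind of inspection, this is roughly what I mean. The feedUrl value is a placeholder, and the //item XPath assumes an RSS 2.0 style feed (an Atom feed would need a different expression).

```cfml
<!--- feedUrl is a placeholder; substitute the feed you are inspecting --->
<cfset feedUrl = "http://www.example.com/feed.xml">

<!--- Fetch the raw XML so nothing is filtered out --->
<cfhttp method="get" url="#feedUrl#" result="raw">
<cfset feedXml = xmlParse(raw.fileContent)>

<!--- Dump the first item so every child element, including custom ones, is visible --->
<cfset items = xmlSearch(feedXml, "//item")>
<cfif arrayLen(items)>
    <cfdump var="#items[1]#">
</cfif>
```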
OK, in the Magic TV one the video is embedded in the feed, but since the update they did recently the video is not in the feed as far as I can tell. From what I found it seems they changed from using a YouTube player to a Flash player that is embedded in the code. There are other sites as well that do the same thing, but most of their video is premium content and that might not be legal so I won't even try.
Well, if you find a feed we can play with, just let me know. I do want to thank you though. This feed has an interesting property that I'm going to blog about. It's something that I've blogged before, but with my recent blog entries on RSS it bears repeating.
You can use the "Magic TV Top 8 of the Week: Best Lands" through google reader it plays the video. its from May 30th. then there is always webcomics with video like this: http://www.smbc-comics.com not sure what their feed address is though. but I know that there is a video on the feed for today. google reader is awesome....
But it's playing because of the embedded HTML in the feed. What we want is a 'proper' link to the video that we don't have to dig to get. (If I may speak for "we" ;)
I have gone over all my other RSS feeds and nothing else has video in it. Well, for the same net effect, what I could do is link to the RSS and, in the case of Channelfireball, when the name sent through contains Channel, simply use some other way to go and download all the video from the site. How I will do that I am not sure yet, especially with the new Flash player they use. And you may speak for we :)
Hi Ray, good post, just visited. I have one change, if you can guide me a bit: I have a page which lists images along with other information, and I just need to scrape all the images and store them on my local system.
Can you just guide?
You can use CFHTTP to get the HTML. Ensure you have resolveURL set to true. It will convert images like this: src="images/foo.jpg" to src="http://www.full.com/images/..." (WARNING TO READERS - my blog auto converts URLs to clickable stuff, so if that link goes to something naughty, don't blame me.)
Then you would use a regular expression to find all image URLs. You then use cfhttp to fetch each one and save it. The only issue you may run into is name conflicts. So an HTML page can easily have these two images:
/misc/foo.jpg
/people/foo.jpg
When saving these you need a way to handle that. You can do a few things.
a) Simply rename all images to UUIDs. Simple, but you lose any context about what the images were.
b) Increment. So the first foo.jpg is foo.jpg, the next is foo_1.jpg.
c) Mimic the directory structure. Given my example above, you could also create a misc and people folder.
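Putting those steps together, here is a minimal sketch that uses option (b), the incrementing filename. The pageUrl value and the "scraped" folder are placeholders, the regular expression is deliberately naive (it only catches double-quoted src attributes), and it assumes every image filename has an extension.

```cfml
<!--- pageUrl and the "scraped" folder are placeholders for this sketch --->
<cfset pageUrl = "http://www.example.com/gallery.html">
<cfset dir = expandPath("./scraped")>
<cfif not directoryExists(dir)>
    <cfdirectory action="create" directory="#dir#">
</cfif>

<!--- resolveURL rewrites relative src values into absolute URLs --->
<cfhttp method="get" url="#pageUrl#" resolveURL="yes" result="page">

<!--- Naive pattern: img tags with a double-quoted src attribute --->
<cfset tags = reMatchNoCase('<img[^>]+src="[^"]+"', page.fileContent)>

<cfloop array="#tags#" index="tag">
    <!--- Strip everything up to the opening quote, then cut at the closing quote --->
    <cfset imgUrl = listFirst(reReplaceNoCase(tag, '^<img[^>]+src="', ""), '"')>
    <!--- Drop any query string, then take the last path segment as the filename --->
    <cfset fname = listLast(listFirst(imgUrl, "?"), "/")>
    <cfset stem = reReplace(fname, "\.[^.]*$", "")>
    <cfset ext = listLast(fname, ".")>

    <!--- Option (b): foo.jpg, then foo_1.jpg, foo_2.jpg, and so on --->
    <cfset target = fname>
    <cfset n = 0>
    <cfloop condition="fileExists(dir & '/' & target)">
        <cfset n = n + 1>
        <cfset target = stem & "_" & n & "." & ext>
    </cfloop>

    <cfhttp method="get" getAsBinary="yes" url="#imgUrl#" path="#dir#" file="#target#">
    <cfoutput>#imgUrl# saved as #target#<br/></cfoutput>
</cfloop>
```

One consequence of this naming scheme is that running the template twice downloads everything again under new names, since it cannot tell a true conflict from a repeat. Option (c) avoids that at the cost of more directory handling.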
Coming late to this question trying like this
<cfloop query="Recordset1">
<cfset querySetCell(Recordset1, "linkhref", "#request.siteURL#bigImage.cfm?imageID=#Val(imageID)#", currentRow)>
</cfloop>
that will show me a link, yes it does, btw how i change it to how an image same case with video
I'm sorry, but I don't understand your English here.
Ok, My Bad
Written in hurry,
The above Code just show me a link like this
http://www.abc.com/acs.jpg
i see link, not exactly the Image, This is my Question
Same like image, i want to display a video too, the video format is basically FLV
I may not be understanding you still. Are you saying you don't know how to output HTML for an image tag? Given X is the URL like you describe, you do know it's just <img src="#x#">, right?
is my english that bad :(
well yes, i want to show an image in the feed,
the above one i gave just show it as an link
feed is showing like this
Gabriella-Demetriades-goes-bikini-for-FHM-123bolly-com-3.jpg
Zimbabwe
Copy of 4829c266bb9e6.jpg
Zimbabwe
The .jpg one is a link which clicked opens an image but i want the actual image to show instead of link in feed itself
Again, if you can extract the name of the image, like you have, you can make an image show up by just using the <IMG> tag. If your name is just "foo.jpg", then you need to prepend it with the hostname of the site.
Here is how i am generating the feed
http://pastebin.com/kbeQgYA1
Now I'm more confused than ever. You're making an RSS feed. This blog entry was on getting images from an RSS feed you're reading. Can you please back up and explain again what you are trying to do?
Okay
Let me now explain you what i am doing, you have seen my Code which is what i am building an RSS Feed, Now the feed is working fine as it is showing the image link and the description.
But there is one thing in my RSS feed, I see the link to the Image but i want the Image should appear in the RSS Feed
I hope now i am clear
An RSS feed is XML, not HTML. It can't render anything. If you are saying that you want RSS readers to render the image, then you need to include the HTML in the RSS content. So you don't just include the url, but the html. Ie, <img src=".."> Then the RSS reader would render it. But an RSS reader may choose to strip out html when displaying.
but it is failing it, i have seen ur RSS, that seems working good, shows images why not mine
Again - your RSS reader may strip out the HTML. Looking at your code, you are setting linkhref to the image. That is not right. linkhref is meant to be the url of the resource, ie, what the rss entry should link to. RSS Readers will link to it, nothing more. You want to use the "description" field to include your image there. This is a block of text that can include html.
Okay, I tried it like this
<cfset cmap.publisheddate = "addedon">
<cfset cmap.content = "origin">
<cfset cmap.title = "<img src=#request.siteURL#admin/Uploads/thumbnails/#Recordset1.origin#/#Recordset1.image#>">
i am getting an error, now image does appear but i receive an error too
There is a problem in the column mappings specified in the columnMap structure.
The query attribute input does not contain any column by the name of
Misty, you need to read the docs on CFFEED. cmap is used to associate a column in your query to known parts of an RSS feed. You want to:
a) Modify your original query to add a new column that will be your description for your rss items.
b) If you call the column description, then that _already_ maps to an RSS item, and when you generate the rss feed, it should work.
c) Not only is your cmap.title line wrong, even if it worked it would be more wrong. Sorry if that is blunt. But your title field is just that, a title, not a block of content. That is what your description is for.
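To make that concrete, here is a rough sketch of the idea with made-up data. The items query, the siteUrl value, the thumbnails path, and the itemhtml column name are all hypothetical. I map the new column through cmap.content, the same key used in the code above. Whether your ColdFusion version wants content or description for the item body is worth confirming in the cffeed docs.

```cfml
<!--- Hypothetical data standing in for the real image query --->
<cfset siteUrl = "http://www.example.com/">
<cfset items = queryNew("title,rsslink,publisheddate,image", "varchar,varchar,date,varchar")>
<cfset queryAddRow(items)>
<cfset querySetCell(items, "title", "Sunset")>
<cfset querySetCell(items, "rsslink", siteUrl & "photo.cfm?id=1")>
<cfset querySetCell(items, "publisheddate", now())>
<cfset querySetCell(items, "image", "sunset.jpg")>

<!--- (a) Build the HTML for each row, then add it to the query as a new column --->
<cfset html = arrayNew(1)>
<cfloop query="items">
    <cfset arrayAppend(html, '<img src="#siteUrl#thumbnails/#items.image#" alt="#htmlEditFormat(items.title)#">')>
</cfloop>
<cfset queryAddColumn(items, "itemhtml", "varchar", html)>

<!--- columnMap values are column NAMES, never markup (uppercase here as a precaution) --->
<cfset cmap = structNew()>
<cfset cmap.content = "ITEMHTML">

<!--- Feed-level metadata; all values are examples --->
<cfset meta = structNew()>
<cfset meta.version = "rss_2.0">
<cfset meta.title = "Example image feed">
<cfset meta.link = siteUrl>
<cfset meta.description = "Latest images">

<cffeed action="create" query="#items#" properties="#meta#" columnMap="#cmap#" xmlVar="rssXml">
```

The title column stays a plain title, and the image markup travels in the item body, which is the part a reader may (or may not) render as HTML.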