Using ColdFusion to find the RSS URL from a web site
Many web sites now include a simple way to autodiscover the RSS feed for the site. This is done via a simple LINK tag and is supported by all modern browsers. You should see, for example, an RSS icon in the address bar at this blog because I have the following HTML in my HEAD block:
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feedproxy.google.com/RaymondCamdensColdfusionBlog" />
I was talking to Todd Sharp today about how ColdFusion could look for this URL and I came up with the following snippet.
<cfset urls = ["http://www.raymondcamden.com", "http://www.coldfusionbloggers.org", "http://www.androidgator.com", "http://www.cfsilence.com/blog/client"]>
<cfloop index="u" array="#urls#">
    <cfoutput>Checking #u#<br/></cfoutput>
    <!--- Fetch the page and grab the raw HTML --->
    <cfhttp url="#u#">
    <cfset body = cfhttp.fileContent>
    <!--- Find every LINK tag that declares an RSS type --->
    <cfset linkTags = reMatch("<link[^>]+type=""application/rss\+xml"".*?>",body)>
    <cfif arrayLen(linkTags)>
        <cfset rssLinks = []>
        <cfloop index="ru" array="#linkTags#">
            <!--- Pull the href value out of each matching tag --->
            <cfif findNoCase("href=", ru)>
                <cfset arrayAppend(rssLinks, rereplaceNoCase(ru,".*href=""(.*?)"".*", "\1"))>
            </cfif>
        </cfloop>
        <cfdump var="#rssLinks#" label="RSS Links">
    <cfelse>
        None found.
    </cfif>
    <p/>
</cfloop>
The snippet begins with a few sample URLs I used for testing. We then loop over each and perform an HTTP GET. From this we can then use some regex to find link tags. You can have more than one, so I create an array for my results and append to it the URLs I find within them. Nice and simple, right? You could also turn this into a simple UDF:
<cfscript>
function getRSSUrl(u) {
    // Fetch the page, resolving relative URLs in the returned HTML
    var h = new com.adobe.coldfusion.http();
    h.setURL(arguments.u);
    h.setMethod("get");
    h.setResolveURL(true);
    var result = h.send().getPrefix().fileContent;
    var rssLinks = [];
    // Find every LINK tag that declares an RSS type
    var linkTags = reMatch("<link[^>]+type=""application/rss\+xml"".*?>",result);

    if(arrayLen(linkTags)) {
        for(var ru in linkTags) {
            // Pull the href value out of each matching tag
            if(findNoCase("href=", ru)) arrayAppend(rssLinks, rereplaceNoCase(ru,".*href=""(.*?)"".*", "\1"));
        }
    }
    // An empty array means no feed was advertised
    return rssLinks;
}
</cfscript>
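If you want to try the UDF, something like the following should do it. Treat it as a rough sketch: it assumes getRSSUrl is already defined on the same page, and the feeds and feed variable names are just ones I picked for the example.

```cfml
<cfscript>
// Assumes getRSSUrl() from above is already defined on this page.
// "feeds" and "feed" are illustrative names, not part of the UDF.
feeds = getRSSUrl("http://www.raymondcamden.com");

if (arrayLen(feeds)) {
    // A page can advertise more than one feed, so print each one
    for (feed in feeds) {
        writeOutput(feed & "<br/>");
    }
} else {
    writeOutput("None found.");
}
</cfscript>
```

Since the UDF always returns an array, the caller only needs to check arrayLen to tell "no feeds" apart from "one or more feeds".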
Not sure how useful this is - but enjoy!