Ask a Jedi: Dealing with messy RSS

Steve asks:

A question on CFFEED: How do you control/resize the image in the description column of linked media? Or strip it out if it is malformed?

This is a great question because it touches on two important issues. One: if you willy-nilly display the content of an RSS feed, you may get undesired results in your page layout. In Steve’s case, he was trying to display the content in a side column and the images were too big to fit. The other issue is that the HTML you get from the RSS feed could not only be bad for your site layout, but could also be completely broken! Let’s look at some examples of both cases.

First, let’s start by just getting the feed and displaying it.

<cffeed source = "http://wow.joystiq.com/rss.xml" properties = "myProps" query = "myQuery">

<cfoutput>
<h2>#myProps.title#</h2>
</cfoutput>

<cfoutput query = "myQuery">
    <cfif myProps.version IS "atom_1.0">
        <h3><a href = "#linkhref#">#title#</a></h3>
    <cfelse>
        <h3><a href = "#rsslink#">#title#</a></h3>
    </cfif>
    <div class="content">
    #content#
    </div>
</cfoutput>

This results in a hodgepodge of graphics and other items:

As you can see, some of the images are pretty wide. We could, if we wanted to, make use of regular expressions to find each image, remove any existing height/width attributes, and add our own width with a set value of, for example, 200. Setting just the width should keep the proportions right. While that would work, it would probably be easier to just use CSS:

<cffeed source = "http://wow.joystiq.com/rss.xml" properties = "myProps" query = "myQuery">

<style>
div.content img {
    max-width:250px;
}
</style>

<cfoutput>
<h2>#myProps.title#</h2>
</cfoutput>

<cfoutput query = "myQuery">
    <cfif myProps.version IS "atom_1.0">
        <h3><a href = "#linkhref#">#title#</a></h3>
    <cfelse>
        <h3><a href = "#rsslink#">#title#</a></h3>
    </cfif>
    <div class="content">
    #content#
    </div>
</cfoutput>
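For comparison, the regular expression route mentioned earlier might look something like the sketch below. The function name constrainImages and its maxWidth argument are hypothetical (they are not part of CFFEED). It finds each img tag, drops any existing width/height attributes, and adds one fixed width. It assumes no attribute value contains a ">" character and it ignores inline style attributes, so treat it as a starting point rather than a complete solution.

```cfml
<cfscript>
// Hypothetical helper (not part of CFFEED): force a fixed width on every <img> tag.
function constrainImages(required string html, numeric maxWidth = 200) {
    var result = "";
    var pos = 1;
    var m = "";
    var tag = "";

    while (pos <= len(arguments.html)) {
        // Find the next <img ...> tag, searching from the current position
        m = reFindNoCase("<img\b[^>]*>", arguments.html, pos, true);
        if (m.pos[1] == 0) {
            break;
        }

        // Copy everything before the tag through unchanged
        if (m.pos[1] > pos) {
            result &= mid(arguments.html, pos, m.pos[1] - pos);
        }

        // Drop any existing width/height attributes from this tag only
        tag = mid(arguments.html, m.pos[1], m.len[1]);
        tag = reReplaceNoCase(tag, '\s+(width|height)\s*=\s*("[^"]*"|''[^'']*''|[^\s>]+)', "", "all");

        // Re-add a single width; the browser scales the height to match
        result &= '<img width="#arguments.maxWidth#"' & mid(tag, 5, len(tag) - 4);

        pos = m.pos[1] + m.len[1];
    }

    // Append whatever follows the last image
    if (pos <= len(arguments.html)) {
        result &= mid(arguments.html, pos, len(arguments.html) - pos + 1);
    }
    return result;
}
</cfscript>
```

In the output loop, #constrainImages(content)# would then take the place of #content#. The CSS rule is still less code and leaves the feed markup alone, which is why it is the easier choice here.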

The CSS fix works... OK. See the result:

But as you can see, there is a huge old blue block below the graphic. If you view source, you discover all kinds of CSS embedded with the content. Even better, you see this gem:

<a href="http://".$GLOBALS["HTTP_HOST"]."/photos/pre-cataclysm-twilight-cultist-event/">Pre-Cataclysm Twilight Cultist Event</a>

I’m not sure what language that is. Probably a dead one. Either way, their feed generation code obviously has a bug in it. Steve also shared another RSS feed where the code for a YouTube embed literally just stopped halfway through the HTML. Most likely that was someone using a left() operation and not considering HTML. The point is - unless you know your feed source very well - you should probably expect - and be prepared to deal with - crap.

So what did I recommend? Get rid of the HTML altogether. Here is an example:

#reReplace(content, "<.*?>", "", "all")#

A bit draconian, but most likely the safest option.
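One wrinkle with stripping tags alone is that the text inside style or script elements, such as the embedded CSS mentioned above, would survive as visible text. A tag cut off mid-way by a truncated feed item would not match the pattern either. A slightly fuller sketch might handle both. The function name stripFeedHtml is hypothetical, and the sketch relies on the period matching newlines, which is my understanding of ColdFusion's regular expression engine.

```cfml
<cfscript>
// Hypothetical helper: reduce feed HTML to plain text.
function stripFeedHtml(required string html) {
    var text = arguments.html;

    // Remove <style> and <script> blocks first, contents included,
    // so embedded CSS or JavaScript does not survive as visible text
    text = reReplaceNoCase(text, "<style\b[^>]*>.*?</style>", "", "all");
    text = reReplaceNoCase(text, "<script\b[^>]*>.*?</script>", "", "all");

    // Then remove every remaining complete tag, as in the one-liner above
    text = reReplace(text, "<.*?>", "", "all");

    // Finally drop a trailing, unclosed tag left by a truncated feed item
    // (assumption: a literal "<" in the text is encoded as &lt;)
    text = reReplace(text, "<[^>]*$", "", "one");

    return trim(text);
}
</cfscript>
```

In the output loop, #stripFeedHtml(content)# would replace #content#. HTML entities such as &amp; are left untouched, which is fine as long as the result is still written into an HTML page.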


About Raymond Camden

Raymond is a developer advocate. He focuses on JavaScript, serverless and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com
