So I've blogged before about how xmlFormat() is a bit buggy. While it will handle most characters, including "high ascii" characters in the range of 128-255, it will gleefully ignore characters above that range, for example, character 8220, which is the funky Microsoft Word left double quote. Unfortunately it looks like the same code used for xmlFormat is used to escape text when you create feeds with CFFEED. Consider the following example:
<cfset getEntries = queryNew("publisheddate,content,title")>
<cfset queryAddRow(getEntries)>
<cfset querySetCell(getEntries,"title", "LAST ENTRY")>
<cfset querySetCell(getEntries,"content", "<b>Test</b>")>
<cfset querySetCell(getEntries,"publisheddate", now())>
<cfset queryAddRow(getEntries)>
<cfset querySetCell(getEntries,"title", "LAST ENTRY2")>
<cfset querySetCell(getEntries,"content", "#chr(8220)#Test#chr(8220)#")>
<cfset querySetCell(getEntries,"publisheddate", now())>
<cfset props = {version="rss_2.0",title="Test Feed",link="http://127.0.0.1",description="Test"}>
<cffeed action="create" properties="#props#" query="#getEntries#" xmlVar="result">
<cfcontent type="text/xml" reset="true"><cfoutput>#result#</cfoutput>
The first entry will correctly show up in Firefox, but the second will not. If you view source, you'll see that the B tags are properly escaped, but the funky MS Word character is not. Now obviously I can make sure to "clean" my data before it gets used in the feed, but I wasn't aware this was even an issue until a friend reported that the feed at ColdFusionBloggers suddenly turned up empty. For now I've switched to the solution below - which is not a good solution, but I needed a quick fix.
<!--- clean up bad stuff --->
<cfloop query="items">
<cfset fixedcontent = replaceList(content, "#chr(25)#,#chr(212)#,#chr(248)#,#chr(937)#,#chr(8211)#", "")>
<cfset fixedcontent = replaceList(fixedcontent,chr(8216) & "," & chr(8217) & "," & chr(8220) & "," & chr(8221) & "," & chr(8212) & "," & chr(8213) & "," & chr(8230),"',',"","",--,--,...")>
<cfset querySetCell(items, "content", fixedcontent, currentRow)>
</cfloop>
<cffeed action="create" properties="#props#" columnMap="#cmap#" query="#items#" xmlVar="result">
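A more general option than maintaining a list of characters is to walk the string and turn anything above 127 into a numeric character reference. The sketch below makes a few assumptions: the function name encodeHighChars is made up for this example, it only handles characters in the Basic Multilingual Plane, and it has not been checked against CFFEED itself. Since CFFEED escapes markup, it will likely escape the ampersand in each reference again, which only works out if the feed reader treats the content as HTML.

```cfml
<!--- Hypothetical helper: not part of ColdFusion or the original fix --->
<cffunction name="encodeHighChars" returntype="string" output="false">
    <cfargument name="value" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfset var c = "">
    <cfloop from="1" to="#len(arguments.value)#" index="i">
        <cfset c = mid(arguments.value, i, 1)>
        <cfif asc(c) gt 127>
            <!--- the doubled pound sign is a literal one, so chr(8220) becomes an ampersand, pound sign, 8220 and a semicolon --->
            <cfset result = result & "&##" & asc(c) & ";">
        <cfelse>
            <cfset result = result & c>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>
```

Inside the clean-up loop above, this would replace the two replaceList() calls with a single call such as encodeHighChars(content).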
Archived Comments
I've never cared much for the XMLFormat() path and have instead chosen to use a less...engineered, maybe?...CDATA block. If that can be used, it's a bit of a cleaner solution, in my opinion.
That would work for xmlFormat - but not cffeed as I assume the < would be escaped automatically.
Good point. I assumed you had considered the possibility, but thought I'd throw it out there since you didn't mention it specifically. As I recall, it doesn't work for CFXML and the mechanics of that are probably fairly similar to those of CFFEED.
It's not just a few characters. There are around 25 that are not part of the ISO-8859-1 standard but are often found in documents and on web sites.
http://en.wikipedia.org/wik...
This was my fix for the problem.
<cffunction name="UnicodeWin1252" hint="Converts MS-Windows superset characters (Windows-1252) into their XML friendly unicode counterparts" returntype="string">
<cfargument name="value" type="string" required="yes">
<cfscript>
var string = value;
// replace() is case-sensitive; replaceNoCase() could also match the other-case twin of a letter (Š/š, Œ/œ, Ÿ/ÿ)
string = replace(string,chr(8218),'&##8218;','all'); // ‚
string = replace(string,chr(402),'&##402;','all'); // ƒ
string = replace(string,chr(8222),'&##8222;','all'); // „
string = replace(string,chr(8230),'&##8230;','all'); // …
string = replace(string,chr(8224),'&##8224;','all'); // †
string = replace(string,chr(8225),'&##8225;','all'); // ‡
string = replace(string,chr(710),'&##710;','all'); // ˆ
string = replace(string,chr(8240),'&##8240;','all'); // ‰
string = replace(string,chr(352),'&##352;','all'); // Š
string = replace(string,chr(8249),'&##8249;','all'); // ‹
string = replace(string,chr(338),'&##338;','all'); // Œ
string = replace(string,chr(8216),'&##8216;','all'); // ‘
string = replace(string,chr(8217),'&##8217;','all'); // ’
string = replace(string,chr(8220),'&##8220;','all'); // “
string = replace(string,chr(8221),'&##8221;','all'); // ”
string = replace(string,chr(8226),'&##8226;','all'); // •
string = replace(string,chr(8211),'&##8211;','all'); // –
string = replace(string,chr(8212),'&##8212;','all'); // —
string = replace(string,chr(732),'&##732;','all'); // ˜
string = replace(string,chr(8482),'&##8482;','all'); // ™
string = replace(string,chr(353),'&##353;','all'); // š
string = replace(string,chr(8250),'&##8250;','all'); // ›
string = replace(string,chr(339),'&##339;','all'); // œ
string = replace(string,chr(376),'&##376;','all'); // Ÿ
string = replace(string,chr(8364),'&##8364;','all'); // €
</cfscript>
<cfreturn string>
</cffunction>
Do you mind if I include this in toXML.cfc?
Sure go for it
Thanks Ben. Updated:
http://www.coldfusionjedi.c...
This will probably roll into Paragator as well.
The file download for toxml seems to be the one released on 30/Apr even though the page says 14th Aug?
Sorry about that - try now please.
Thanks, works great now
Doesn't the CFLIB tag XMLFormat2() handle high ascii?
It covers some - but not all.
Thanks guys. I've been banging my head on this problem for hours now!
This sounds great... I'm sort of new to CF -- where/how would I implement this so that it makes the corrections?
Joel - my blog entry ends with an example of how I change the data before I pass to cffeed.
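As a rough illustration, assuming the UnicodeWin1252() function from the comment above is defined on the page, and that items, props, and cmap are set up as in the blog entry, the clean-up loop could look something like this:

```cfml
<!--- run each row's content through the conversion function before building the feed --->
<cfloop query="items">
    <cfset querySetCell(items, "content", UnicodeWin1252(items.content), items.currentRow)>
</cfloop>
<cffeed action="create" properties="#props#" columnMap="#cmap#" query="#items#" xmlVar="result">
```

Whether CFFEED then escapes the ampersand in the generated references is worth checking in the resulting XML.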
Old post, but a top result in Google so adding to the conversation...
I solved this with CDATA blocks inserted after creating the feed. So after the cffeed I run these two REReplace functions.
<cfset FeedXML = REReplace(Variables.FeedXML, "(?m)^(\s*<title>)(.*?)(</title>\s*)$", "\1<![CDATA[\2]]>\3", "all")>
<cfset FeedXML = REReplace(Variables.FeedXML, "(?m)^(\s*<description>)(.*?)(</description>\s*)$", "\1<![CDATA[\2]]>\3", "all")>
If you want to write the feed to disk then you'll need to put it into a variable, run the above and then cffile it to disk.
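Put together, that sequence might look like the sketch below. It assumes the props structure and getEntries query from the blog entry, and feed.xml is only a placeholder file name.

```cfml
<!--- build the feed into a variable instead of writing it out directly --->
<cffeed action="create" properties="#props#" query="#getEntries#" xmlVar="FeedXML">
<!--- wrap the title and description contents in CDATA, as in the two lines above --->
<cfset FeedXML = REReplace(FeedXML, "(?m)^(\s*<title>)(.*?)(</title>\s*)$", "\1<![CDATA[\2]]>\3", "all")>
<cfset FeedXML = REReplace(FeedXML, "(?m)^(\s*<description>)(.*?)(</description>\s*)$", "\1<![CDATA[\2]]>\3", "all")>
<!--- feed.xml is a placeholder path for this example --->
<cffile action="write" file="#expandPath('./feed.xml')#" output="#FeedXML#" charset="utf-8">
```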