
The error was thrown by CFFEED itself. It wasn't a display error like the one I describe here. I carefully read over the content being used for the feed data, and then I saw it: the mysterious box of doom, as I like to call it. It was a character that had obviously been pasted from a Word document, or some other source, and got fubared when rendered.
It took me a while, but I was able to narrow it down to a single character, chr(19), and expose the error a bit. To see this in action, I modified the code from the blog entry mentioned above.
<cfset getEntries = queryNew("publisheddate,content,title")>
<cfset queryAddRow(getEntries)>
<cfset querySetCell(getEntries,"title", "LAST ENTRY")>
<cfset querySetCell(getEntries,"content", "<b>Test</b>")>
<cfset querySetCell(getEntries,"publisheddate", now())>
<cfset queryAddRow(getEntries)>
<cfset querySetCell(getEntries,"title", "LAST ENTRY2")>
<cfset querySetCell(getEntries,"content", "#chr(8220)#Test#chr(8220)# #chr(19)#")>
<cfset querySetCell(getEntries,"publisheddate", now())>
<cfset props = {version="rss_2.0",title="Test Feed",link="http://127.0.0.1",description="Test"}>
<cffeed action="create" properties="#props#" query="#getEntries#" xmlVar="result">
<cfcontent type="text/xml" reset="true"><cfoutput>#result#</cfoutput>
Note the data in the second query row. I added the chr(19) to the end. Now unlike the problem in the other blog entry (which resulted in Firefox not showing the complete feed), this one threw a real exception:
The input values might be invalid. The reason for exception is :
The data "Test X" is not legal for a JDOM character content: 0x13 is not a legal XML character.
Just as an FYI, the "X" above was the literal bad character. I replaced it so as not to cause any possible problems in my own RSS feed here.
Great. Not a legal XML character. Hey, I bet xmlFormat() will fix it, right? Of course not. As I said in the beginning. Ugh.
So to fix it, I modified the UDF mentioned in the earlier blog entry to just replace chr(19) with nothing.
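As a rough illustration of that kind of cleanup, here is a minimal sketch in Python rather than CFML (so it is not the UDF itself). It goes one step further than removing chr(19) alone and strips every character outside the XML 1.0 `Char` production. The function name `strip_illegal_xml_chars` is made up for this example.

```python
import re

# Characters XML 1.0 forbids: control characters other than tab (0x09),
# line feed (0x0A) and carriage return (0x0D), the surrogate block,
# and the two non-characters U+FFFE and U+FFFF.
ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)

def strip_illegal_xml_chars(text):
    """Remove every character XML 1.0 forbids; leave everything else alone."""
    return ILLEGAL_XML_CHARS.sub("", text)

# Same shape as the second query row above: curly quotes plus chr(19).
sample = "\u201cTest\u201c \x13"
cleaned = strip_illegal_xml_chars(sample)
print(repr(cleaned))
```

Note that the curly quotes survive: they are legal XML characters, so only the control character is dropped.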
You know - I get that different encodings can impact what's valid in XML. But would it be that hard to ask cffeed to sniff the current settings and just remove what isn't valid? Especially since it will be (most likely) crap characters like funky quotes or the like? Seriously - am I the only one having so much trouble with cffeed?
Archived Comments
I, too, have been unhappy with the escaping/fixing that XMLFormat does. I ended up writing my own wrapper (named MyXMLFormat()) that calls XMLFormat() and does some additional search/replace.
Common characters I encountered that caused problems were character codes 11, 8220, 8221, 8216, 8217, 8211, 8212, 8226, 8230, and 8482. Probably all from MS Word.
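A search/replace pass over those codes could look something like the following sketch. It is written in Python, not CFML, and is not the MyXMLFormat() wrapper described above. The name `replace_word_chars` and the ASCII stand-ins are illustrative choices only.

```python
# Character codes from the list above, mapped to plain-ASCII stand-ins.
# The replacement strings are assumptions for illustration, not a standard.
WORD_CHARS = {
    11: " ",       # vertical tab (the only one here that XML 1.0 forbids)
    8220: '"',     # left double quotation mark
    8221: '"',     # right double quotation mark
    8216: "'",     # left single quotation mark
    8217: "'",     # right single quotation mark
    8211: "-",     # en dash
    8212: "--",    # em dash
    8226: "*",     # bullet
    8230: "...",   # horizontal ellipsis
    8482: "(TM)",  # trade mark sign
}

def replace_word_chars(text):
    """Swap common Word-pasted characters for ASCII equivalents."""
    # str.translate accepts a dict of code point -> replacement string.
    return text.translate(WORD_CHARS)

print(replace_word_chars("\u201cHi\u201d \u2026 it\u2019s\u2122"))
```

Apart from code 11, these are all legal XML characters, so replacing them is a workaround for encoding mismatches rather than something the XML specification requires.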
Ray,
Do you guys have paid support with Adobe? It seems to me that if you do, this is something that they could/should be able to fix for you and issue a hotfix for. I know you have a work-around, but it's stuff like this that Adobe should be issuing more hotfixes for between point releases.
I see this and can't help but wonder what Groovy's MarkupBuilder would do with these duff Microsoft characters...
@Rob - Well, I reported it as a bug. I just can't imagine there is much that Adobe would do for this. I seem to be the only one complaining. ;)
@Sean - That sounds like a challenge to me. :) Going to try that.
Sean and I played with this a bit. First off - MarkupBuilders are amazing. I just skimmed it for now (I was still a bit farther back in the Groovy book) but I'm in awe at how elegantly it works.
But anyway, I took one of their demos, and modified it to include bad character 19 in the output. It runs just fine. On display, it shows up invisible on my screen, but the point is, it runs just fine. Here is the code (again, copyright goes to the Groovy in Action folks, I modified other bits of the code as well while playing):
char bad = 19
def builder = new groovy.xml.MarkupBuilder()
builder.numbers {
    for (i in 10..15) {
        number (value: i, square: i*i, double: i*2, label: 'Hard coded ' + bad + ' more text') {
            for (j in 2..<i) {
                if (i % j == 0) {
                    factor (value: j)
                }
            }
        }
    }
}
I hope to heck that renders ok here.
The Book: It is important to note that suddenly, and against all probability, a Sperm Whale had been called into existence, several miles above the surface of an alien planet and since this is not a naturally tenable position for a whale, this innocent creature had very little time to come to terms with its identity. This is what it thought, as it fell:
The Whale: Ahhh! Woooh! What's happening? Who am I? Why am I here? What's my purpose in life? What do I mean by who am I? Okay okay, calm down calm down get a grip now. Ooh, this is an interesting sensation. What is it? It's a sort of tingling in my... well I suppose I better start finding names for things. Let's call it a... tail! Yeah! Tail! And hey, what's this roaring sound, whooshing past what I'm suddenly gonna call my head? Wind! Is that a good name? It'll do. Yeah, this is really exciting. I'm dizzy with anticipation! Or is it the wind? There's an awful lot of that now isn't it? And what's this thing coming toward me very fast? So big and flat and round, it needs a big wide sounding name like 'Ow', 'Ownge', 'Round', 'Ground'! That's it! Ground! Ha! I wonder if it'll be friends with me? Hello Ground!
[dies]
http://www.imdb.com/title/t...
I've run into similar problems quite a bit. This regular expression replace works well with XML processing where there is potential for bad chars:
// remove anything outside an explicit hex range (x20-x7F = standard chars, x0A = line feed, x0D = carriage return)
// note: this also strips tabs and every non-ASCII character
rssXml = reReplace(rssXml,"[^\x20-\x7F\x0A\x0D]","","all");
@Ray
The XML specification [1] states that chr(19), like the other ASCII control characters apart from tab, line feed, and carriage return, is not allowed as a literal character in well-formed XML.
If Groovy's MarkupBuilder is actually outputting char(19) then Groovy is broken. It's possible it's stripping the characters for you though, you'd have to check. If it's not, then the XML document you just generated is invalid, and almost every other XML parsing library will choke on it.
Yeah it's a real pain that this stuff happens, but that's what you get from a strict validating language like XML. CF is just using Xerces [2] for XML processing, so the error you're seeing has really nothing to do with CF at all, but rather the fact that Xerces followed the XML specification.
[1] <http://www.w3.org/TR/xml11/...
[2] <http://xerces.apache.org/xe...
@ES: Groovy: I could test this by having Groovy save the output to a file - or to a variable, if possible - and then looping over the chars to check. I don't know Groovy well enough yet to do that, but I'll look into it.
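The check itself is simple to sketch. The version below is in Python, not Groovy, and assumes the generated markup has already been captured in a string. The function names are hypothetical, and the ranges follow the XML 1.0 `Char` production.

```python
def is_legal_xml_char(ch):
    """True if the character is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def find_illegal_xml_chars(text):
    """Return (index, code point) for each character XML 1.0 forbids."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_legal_xml_char(ch)]

# Stand-in for captured builder output; the markup here is made up.
output = "<number label='Hard coded \x13 more text'/>"
for index, code_point in find_illegal_xml_chars(output):
    print(f"illegal character 0x{code_point:02X} at index {index}")
```

An empty result would mean the builder stripped or escaped the character. A non-empty one would mean the raw control character made it into the output.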
So here is a crazy question. If chr(19) is never allowed in well formed XML, why doesn't xmlFormat remove it? Or handle it? I'd call that a bug.
@Ray
Well, I'm not sure I'd call it a bug since XMLFormat() is fairly specific about what it escapes in the docs. Certainly it'd be useful if XMLFormat() removed all the restricted chars, but as it stands it's really more like HTMLEditFormat() than some kind of XMLSanitize().
Seems like CF just uses org.apache.commons.lang.StringEscapeUtils.escapeXml(), which isn't designed to remove all the restricted chars.
Why not file a wish ticket about it? :)
http://livedocs.adobe.com/c...
Trust me - I filed a bug report for this quite some time ago. I think I need to file a new bug report though, since cffeed has no automatic way of running xmlFormat on data passed to it. I blogged about this before, but if your db data isn't all escaped, you have to use querySetCell/xmlFormat on every cell. cffeed should do it for you automatically.
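To make the per-cell chore concrete, here is a rough Python analogue (not CFML) of escaping every cell before handing the rows to a feed builder. The helper name `escape_rows` and the sample rows are invented for this sketch. `xml.sax.saxutils.escape` stands in for xmlFormat() and, like it as described in this thread, only escapes markup characters and does not remove illegal control characters.

```python
from xml.sax.saxutils import escape

def escape_rows(rows, columns):
    """Return copies of the rows with the named columns XML-escaped."""
    return [
        {key: escape(value) if key in columns else value
         for key, value in row.items()}
        for row in rows
    ]

# Hypothetical feed rows, loosely modeled on the query in the post.
entries = [
    {"title": "LAST ENTRY", "content": "<b>Test</b>"},
    {"title": "Q & A", "content": "1 < 2"},
]
escaped = escape_rows(entries, columns={"title", "content"})
for row in escaped:
    print(row["title"], "|", row["content"])
```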
Nice use of the fail whale. ;)
Oh, THAT'S what that whale image is!
Thank you.
I found a 10 minute interview with Twitter founder Biz Stone at
http://www.npr.org/template...
Phillip, I enjoyed your quote though. I'm going to have to go back and read the books again. :)
I ran into this same problem when using cffile to write out the RSS XML manually. The XML had validation issues once the file was created.
That's when I found this UDF on CFLib.
DeMoronize
http://cflib.org/index.cfm?...
I don't know if it'd work in this situation, but I'd say the function name is definitely appropriate.
That's what you get for using an inferior web platform.
Oh how amazingly useful there Charles. Thanks. Please - tell us the name of your miraculous platform that apparently has no bugs?