Forgive the overly dramatic blog title - just having a bit of fun. A coworker asked me yesterday how one could take a flat XML file and serve it up as JSON. I told him I'd whip up a "quick example" for him and blog it to share with others. Smirking a bit to myself, I imagined I'd be done in 5 minutes and it would be a bit of a fluff piece. My ego ran head first into a brick wall rather quickly - which - I will admit - was kind of fun. Here's what I discovered.
For the hell of it, I thought, why not simply read in the XML, parse it, and serialize it?
<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>
<cfset jsonData = serializeJSON(xmlData)>
<cfoutput>#jsonData#</cfoutput>
Looks simple enough, right? However, it appears that serializeJSON takes the 'toString' version of xmlData. Even though it's a ColdFusion XML variable, it's first converted to a string and then passed to serializeJSON. So basically you get one large JSON-encoded string. Or as I call it - a Chrome killer. (To be fair, Chrome was nice about shutting down a tab, and it was more the fact that I have a plugin that recognizes and auto-formats JSON strings.) So that's not good. What about converting the XML into a native ColdFusion structure?
Turns out there is a UDF for that - xmlToJson. It makes use of an XSLT transformation to create JSON. Perfect! And appropriately geeky too, right? I mean, how often do you get a chance to tell folks you performed an XSLT transformation today? Try it at the bar next time. It's a sure win. Here's the template I created to test this (note, the UDF is rather large, so I cut out the innards):
<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>
<cffunction name="xmlToJson" output="false" returntype="any" hint="convert xml to JSON">
    <!--- UDF body omitted for length --->
</cffunction>
<cfset json = xmlToJson(xmlData)>
<cfdump var="#deserializeJSON(json)#">
This ran fast - shockingly fast, actually - but deserializing the result threw an error. The XML contained Value=".0", and the transformation carried that straight through, producing something like "Value":.0 in the JSON string. That's not valid JSON, since a JSON number needs at least one digit before the decimal point. In theory I could have done some string parsing, but that felt dirty, so I went to approach 3: manually creating a structure. I designed this UDF:
function xmlToStruct(xml x) {
    var s = {};
    // Document node: recurse into the root element, keyed by its name
    if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
        s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
    }
    // Copy any attributes into an "attributes" substructure
    if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) {
        s.attributes = {};
        for(var item in x.xmlAttributes) {
            s.attributes[item] = x.xmlAttributes[item];
        }
    }
    if(structKeyExists(x, "xmlChildren")) {
        for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
            if(structKeyExists(s, x.xmlChildren[i].xmlName)) {
                // Second child with this name: turn the existing value into an array
                if(!isArray(s[x.xmlChildren[i].xmlName])) {
                    var temp = s[x.xmlChildren[i].xmlName];
                    s[x.xmlChildren[i].xmlName] = [temp];
                }
                arrayAppend(s[x.xmlChildren[i].xmlName], xmlToStruct(x.xmlChildren[i]));
            } else {
                s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);
            }
        }
    }
    return s;
}
It handles creating a substructure for xml attributes and handling a case where you have 2-N xml children of the same name. (That was the toughest part.) Here's the complete template:
<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>
<cfscript>
function xmlToStruct(xml x) {
    var s = {};
    if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
        s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
    }
    if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) {
        s.attributes = {};
        for(var item in x.xmlAttributes) {
            s.attributes[item] = x.xmlAttributes[item];
        }
    }
    if(structKeyExists(x, "xmlChildren")) {
        for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
            if(structKeyExists(s, x.xmlChildren[i].xmlName)) {
                if(!isArray(s[x.xmlChildren[i].xmlName])) {
                    var temp = s[x.xmlChildren[i].xmlName];
                    s[x.xmlChildren[i].xmlName] = [temp];
                }
                arrayAppend(s[x.xmlChildren[i].xmlName], xmlToStruct(x.xmlChildren[i]));
            } else {
                s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);
            }
        }
    }
    return s;
}
s = xmlToStruct(xmlData);
</cfscript>
<cfcontent reset="true" type="application/json"><cfoutput>#serializeJSON(s)#</cfoutput>
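To see the attribute substructure and the same-name handling in isolation, a tiny inline document is easier to read than the full Applications.xml. The sketch below assumes xmlToStruct from the template above is already defined on the page; the sample XML and the variable names sampleXml and sample are made up for illustration.

```cfml
<!--- Hypothetical sample: two <app> siblings, each with an id attribute --->
<cfsavecontent variable="sampleXml">
<apps>
    <app id="1"><name>Alpha</name></app>
    <app id="2"><name>Beta</name></app>
</apps>
</cfsavecontent>

<!--- trim() so the parser does not see whitespace before the root element --->
<cfset sample = xmlParse(trim(sampleXml))>

<!--- Dump the struct rather than the JSON so the nesting is easy to inspect --->
<cfdump var="#xmlToStruct(sample)#">
```

In the dump, the two app elements should show up as one array under a single app key, with each id inside an attributes substructure. Note that nothing in the function reads xmlText, so the Alpha and Beta values will not appear.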
The complete template ran reasonably fast (2-3 seconds for a 2 MB XML file), but we could make it even faster by cutting out the file read and parsing after we've done it one time:
<cfset cachedJSON = cacheGet("jsonstr")>
<cfif isNull(cachedJSON)>
    <cfset xmlFile = expandPath("./Applications.xml")>
    <cfset xmlData = xmlParse(xmlFile)>
    <cfscript>
    function xmlToStruct(xml x) {
        var s = {};
        if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
            s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
        }
        if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) {
            s.attributes = {};
            for(var item in x.xmlAttributes) {
                s.attributes[item] = x.xmlAttributes[item];
            }
        }
        if(structKeyExists(x, "xmlChildren")) {
            for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
                if(structKeyExists(s, x.xmlChildren[i].xmlName)) {
                    if(!isArray(s[x.xmlChildren[i].xmlName])) {
                        var temp = s[x.xmlChildren[i].xmlName];
                        s[x.xmlChildren[i].xmlName] = [temp];
                    }
                    arrayAppend(s[x.xmlChildren[i].xmlName], xmlToStruct(x.xmlChildren[i]));
                } else {
                    s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);
                }
            }
        }
        return s;
    }
    cachedJSON = serializeJSON(xmlToStruct(xmlData));
    </cfscript>
    <cfset cachePut("jsonstr", cachedJSON)>
</cfif>
<cfcontent reset="true" type="application/json"><cfoutput>#cachedJSON#</cfoutput>
This version makes use of ColdFusion 9's built-in caching to store the JSON string. On the first hit it takes about 3 seconds to render in the browser. After that it's almost immediate. The cache has no expiration, but it could be updated to time out after a certain amount of time, or you could note the date stamp on the file and use both that and the filename as the key to your cache.
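Both ideas, a timeout and a key built from the filename plus its date stamp, can be combined in a few lines. The following is a rough sketch rather than something tested against a particular server: it assumes xmlToStruct is defined as in the templates above, and the json_ key prefix and the one-hour timeout are arbitrary choices for illustration.

```cfml
<cfset xmlFile = expandPath("./Applications.xml")>

<!--- Build the key from the path and the last-modified stamp,
      so an edited file no longer matches the old cache entry --->
<cfset fileInfo = getFileInfo(xmlFile)>
<cfset cacheKey = "json_" & hash(xmlFile & fileInfo.lastModified)>

<cfset cachedJSON = cacheGet(cacheKey)>
<cfif isNull(cachedJSON)>
    <cfset cachedJSON = serializeJSON(xmlToStruct(xmlParse(xmlFile)))>
    <!--- Third argument is the time span until expiry: 0 days, 1 hour (assumed value) --->
    <cfset cachePut(cacheKey, cachedJSON, createTimeSpan(0, 1, 0, 0))>
</cfif>

<cfcontent reset="true" type="application/json"><cfoutput>#cachedJSON#</cfoutput>
```

Because each change to the file produces a new key, entries for older versions are never read again; the timeout is what eventually lets them drop out of the cache.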
Archived Comments
This is a really interesting post because I ran into some problems a while back while doing a POC where I get a SOAP response and push it into JSON for a PG app.
I ran into the same problems you were talking about with the XML. I did some hacky things to get around it because it was just a POC and a learning experience, but with what you have laid out here I think I might go back and use the UDF you created to make it better.
Also, I never thought about using the caching like you did. I was just caching the query :)
Seems I need to read the docs more :) Anyhow, thanks for the post. It should come in handy when I redo the app.
Did you consider WDDX'ing the XML and trying to serialize that? I wonder how that would do?
Ok, never mind. Apparently you can't use cfwddx on XML. It kinda makes sense, since WDDX is XML, but I'd have thought it would have done something with it.
I had to do the same recently, and used this jQuery plugin: http://www.fyneworks.com/jq...
--- Ben
@Forta: Nice! That worked well. I'll have to remember that next time I have to deal with XML in JavaScript.
Ray, love your work! But... this code doesn't really work. It never saves the XMLText to the struct at any point. You just receive the full structure but without any text or CDATA!
I'll take a look see. Can you share the XML data you used to test?
Hi Ray,
Works great up to a point. It appears that if the xmlChildren have children that the data and recursiveness stops.
That seems odd. Can you post a Gist with a full example of the bug I can try running?
I just sent a private message with the XML in it. Let me know if that does not suffice. The variables section seem to not be populated with values. I'm working on a different solution also.
Much thanks
Tim
I *think* I got it. Try this one:
https://gist.github.com/458...
Hi mate, first of all nice udf, i was looking for this function and u made it, thx but what do u have against the keyword "else" ?:) two else are missing that broke up all ur algorithm.
first of all: we are on the xmlroot or not, impossible to handle it only with a simple if but with a if root_node else not a root
Secondly, iam an array or not so if iam not an array... else iam an array.
Think about update ur tutorial and the pasted code with the new code from github, it will help people.
@SaezChristopher on twitter
Dude... I have no idea what you're saying. :) Can you say that again? Are you saying the github version is good?
i have to correct myself, we have to remove the else on the array but we still have to add else on root node test
Um... so maybe you could fork the gist with a correction and it would make more sense?
I finally found where was the last bug on this code: u have to add a finally condition on the algo: "if iam on a xmlText then return the text (the result of arrayAppend(..., xmlToStruct(s)) the test case was if u have to following same node inside a parent node: one got the right value :text, the second an empty struct instead of text as well.
I still can't grok your English. Any chance you could post your code as a new Gist? Or fork mine so we can see the mods?
sorry for my bad english, i quite late here, anyways, iam a noob on githut stuff so i let u a comment on ur code with my correction, it will be better than a frenglish explanation.
The version on Gist is fantastic. XML files that were 1.5mb are shrinking down to 500k with this simple function. Thanks for hooking it up as usual.
Hi
Here is an xml to json converter:
http://coursesweb.net/javas...
It is a javascript object that returns json string, or object, directly from file or string with xml content.
Hi there. Thanks for your blog and for this post in particular. I've taken the version you placed on Github and I "almost" have it working the way that I need it to. One exception, though, that maybe you can help me out with.
I have JSON formatted output as a result of my conversion of XML data. I found that I needed to specifically include the "element" notation in creating my XML in order to have the structure of the file properly created (must have to do with the way that I'm dynamically looping and creating my XML). However, the actual JSON formatted output structure that I need does not include the "element" notation. Somehow I need to drop the "element" notation from my JSON... sort of "collapse" it I guess. (Again, I'm trying to avoid rewriting the code that creates the XML and instead, be able to convert my XML to the JSON that I need.)
Here's what I'm getting:
_
JSON
  productCategoriesList
    element
      {}
        id=1
        name=product type A
        products
          element
            {}
              id=A1
              name=product A1
            {}
              id=A2
              name=product A2
      {}
        id=2
        name=product type B
        products
          element
            {}
              id=B1
              name=product B1
_
etc.
This is what I need:
_
JSON
  productCategoriesList
    {}
      id=1
      name=product type A
      products
        {}
          id=A1
          name=product A1
        {}
          id=A2
          name=product A2
    {}
      id=2
      name=product type B
      products
        {}
          id=B1
          name=product B1
_
etc.
Thoughts?
Thanks!
Can you share a Gist of your XML?
Sure thing. Here you go.
https://gist.github.com/red...
Thanks in advance for any help you can throw this way!
Ok, so this confuses me:
"However, the actual JSON formatted output structure that I need does not include the "element" notation. Somehow I need to drop the "element" notation from my JSON"
You say it doesn't include it, but then you want to remove it. Did you just misspeak there?
Honestly, looking at the structure of your XML, you need *something* to separate your items. For example, the products tag has N elements inside representing each product.
It sounds like you want to iterate over products as opposed to products.element, but I'd probably just live with it. ;)
Hi again. Thanks for your reply.
There's a really good chance that I misspoke somewhere, but let me try again.
- My XML HAS the "element" notation. Let's say that it has to.
- As such, my auto-converted JSON also HAS the "element" notation. Makes sense.
- Challenge is, the entity consuming my JSON can't stomach the "element" notation... it wants what we see in file #3 on that github. If I feed that entity the JSON in file #3, it's happy. If I feed it the auto-converted JSON in file #2 it pukes on it.
The confusion here could very well be because the solution is an easy one, but it's one that eludes me. Perhaps I just need to iterate through the auto-generated JSON and "strip out" the "element" notation, but I guess I was hoping to somehow save that step.
Thanks again.
Oh - you are having a problem with the thing consuming your JSON. Well, in your case you may need to write your own solution. That should actually be easier - my code was built to handle *anything* whereas you know what your XML is and can write code specific for it.
Has anyone tried the new XML converter? http://www.coolutils.com/To...
On the latest git version linked in the comments, if I run this against a BBC news feed, the root node is returned twice.
I presume this is because it is an if rather than an else if, and therefore the root node is always picked up both by the first check and by at least one of the subsequent checks.
If I remove the first check entirely:
if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
    s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
}
Then it seems to work as expected. Can anyone confirm?
Thanks Raymond, Greetings from Turkey
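Two issues come up repeatedly in the comments above: element text (xmlText) is never copied into the result, and the document-node check does not return early, which one commenter reports as the root being returned twice. The sketch below is one possible way to address both. It is not the Gist version linked earlier, and the function name xmlToStructWithText and the text key are names invented here for illustration.

```cfml
<cfscript>
// Hypothetical variant of the post's xmlToStruct; untested sketch.
function xmlToStructWithText(required any x) {
    // Document node: descend into the root element and return immediately,
    // so the remaining checks never run against the document itself
    if (isXmlDoc(x)) {
        var doc = {};
        doc[x.xmlRoot.xmlName] = xmlToStructWithText(x.xmlRoot);
        return doc;
    }

    var s = {};
    var hasAttributes = !structIsEmpty(x.xmlAttributes);
    var hasChildren = arrayLen(x.xmlChildren) > 0;

    // Leaf element without attributes: return its text as a plain string
    if (!hasAttributes && !hasChildren) {
        return x.xmlText;
    }

    if (hasAttributes) {
        s.attributes = {};
        for (var item in x.xmlAttributes) {
            s.attributes[item] = x.xmlAttributes[item];
        }
    }

    // Keep non-blank text that sits next to attributes or child elements
    if (len(trim(x.xmlText))) {
        s.text = x.xmlText; // "text" is an assumed key name
    }

    for (var i = 1; i <= arrayLen(x.xmlChildren); i++) {
        var child = x.xmlChildren[i];
        var name = child.xmlName;
        var value = xmlToStructWithText(child);
        if (structKeyExists(s, name)) {
            // Repeated name: wrap the first value in an array, then append
            if (!isArray(s[name])) {
                var temp = s[name];
                s[name] = [temp];
            }
            arrayAppend(s[name], value);
        } else {
            s[name] = value;
        }
    }
    return s;
}
</cfscript>
```

One trade-off to be aware of: a child element that happens to be named attributes or text would collide with the keys this sketch reserves, so real data may call for different key names.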