Converting XML to JSON - My exploration into madness...

This post is more than 2 years old.

Forgive the overly dramatic blog title - just having a bit of fun. A coworker asked me yesterday how one could take a flat XML file and serve it up as JSON. I told him I'd whip up a "quick example" for him and blog it to share with others. Smirking a bit to myself, I imagined I'd be done in 5 minutes and it would be a bit of a fluff piece. My ego ran head first into a brick wall rather quickly - which - I will admit - was kind of fun. Here's what I discovered.

For the hell of it, I thought, why not simply read in the XML, parse it, and serialize it?

<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>

<cfset jsonData = serializeJSON(xmlData)>
<cfoutput>#jsonData#</cfoutput>

Looks simple enough, right? However, it appears that serializeJSON takes the 'toString' version of xmlData. Even though it's a ColdFusion XML variable, it's first converted to a string and then passed to serializeJSON. So basically you get one large JSON-encoded string. Or as I call it - a Chrome killer. (To be fair, Chrome was nice about shutting down a tab, and it was more the fact that I have a plugin that recognizes and auto-formats JSON strings.) So that's not good. What about converting the XML into a native ColdFusion structure?

Turns out there is a UDF for that - xmlToJson. It makes use of an XSLT transformation to create JSON. Perfect! And appropriately geeky too, right? I mean, how often do you get a chance to tell folks you performed an XSLT transformation today? Try it at the bar next time. It's a sure win. Here's the template I created to test this (note: the UDF is rather large, so I cut out the innards):

<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>

<cffunction name="xmlToJson" output="false" returntype="any" hint="convert xml to JSON">
	<!--- UDF body omitted for length --->
</cffunction>

<cfset json = xmlToJson(xmlData)>
<cfdump var="#deserializeJSON(json)#">

This ran fast - shockingly fast, actually - but it throws an error when deserializing. The XML contained Value=".0", which the transformation turned into a JSON string something like this - "Value":.0. That's not valid JSON, since a JSON number needs a digit before the decimal point. In theory I could have done some string parsing, but that felt dirty, so I went to approach 3: manually creating a structure. I designed this UDF:

function xmlToStruct(xml x) {
	var s = {};

	// For the document node, recurse into the root element
	// (structKeyList(x) is used here as the root's name).
	if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
		s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
	}

	// Copy any XML attributes into an "attributes" substructure.
	if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) {
		s.attributes = {};
		for(var item in x.xmlAttributes) {
			s.attributes[item] = x.xmlAttributes[item];
		}
	}

	if(structKeyExists(x, "xmlChildren")) {
		for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
			if(structKeyExists(s, x.xmlChildren[i].xmlName)) {
				// Second or later child with this name: promote the entry to an array, then append.
				if(!isArray(s[x.xmlChildren[i].xmlName])) {
					var temp = s[x.xmlChildren[i].xmlName];
					s[x.xmlChildren[i].xmlName] = [temp];
				}
				arrayAppend(s[x.xmlChildren[i].xmlName], xmlToStruct(x.xmlChildren[i]));
			} else {
				// First child with this name: store it directly.
				s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);
			}
		}
	}

	return s;
}

It creates a substructure for XML attributes and handles the case where you have 2-N XML children of the same name. (That was the toughest part.) Here's the complete template:

<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>

<cfscript>
function xmlToStruct(xml x) {
	var s = {};

	if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
		s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
	}

	if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) {
		s.attributes = {};
		for(var item in x.xmlAttributes) {
			s.attributes[item] = x.xmlAttributes[item];
		}
	}

	if(structKeyExists(x, "xmlChildren")) {
		for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
			if(structKeyExists(s, x.xmlChildren[i].xmlName)) {
				if(!isArray(s[x.xmlChildren[i].xmlName])) {
					var temp = s[x.xmlChildren[i].xmlName];
					s[x.xmlChildren[i].xmlName] = [temp];
				}
				arrayAppend(s[x.xmlChildren[i].xmlName], xmlToStruct(x.xmlChildren[i]));
			} else {
				s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);
			}
		}
	}

	return s;
}

s = xmlToStruct(xmlData);
</cfscript>

<cfcontent reset="true" type="application/json"><cfoutput>#serializeJSON(s)#</cfoutput>
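To make the attribute and repeated-children handling concrete, here is a small self-contained sketch you can run against an inline XML string. It is a hypothetical variant, not the UDF from this post: the function name xmlNodeToValue, the "text" key, and the choice to collapse attribute-free leaf elements to their text are all my assumptions, added because the UDF as written never copies xmlText into the result (something commenters below also ran into).

```cfml
<cfscript>
// Hypothetical variant, not the post's UDF: leaf elements keep their text.
function xmlNodeToValue(required xml node) {
	var result = {};
	var i = 0;
	var child = "";
	var name = "";
	var value = "";
	var first = "";

	// Unwrap the document node to its root element and stop there.
	if (xmlGetNodeType(node) == "DOCUMENT_NODE") {
		result[node.xmlRoot.xmlName] = xmlNodeToValue(node.xmlRoot);
		return result;
	}

	// A leaf with no attributes collapses to its trimmed text.
	if (structIsEmpty(node.xmlAttributes) && arrayLen(node.xmlChildren) == 0) {
		return trim(node.xmlText);
	}

	// Attributes go into an "attributes" substructure, as in the post.
	if (!structIsEmpty(node.xmlAttributes)) {
		result["attributes"] = {};
		for (name in node.xmlAttributes) {
			result["attributes"][name] = node.xmlAttributes[name];
		}
	}

	// A childless element that has attributes keeps its text under "text".
	if (arrayLen(node.xmlChildren) == 0 && len(trim(node.xmlText))) {
		result["text"] = trim(node.xmlText);
	}

	for (i = 1; i <= arrayLen(node.xmlChildren); i++) {
		child = node.xmlChildren[i];
		name = child.xmlName;
		value = xmlNodeToValue(child);
		if (!structKeyExists(result, name)) {
			// First child with this name: store it directly.
			result[name] = value;
		} else {
			// Repeated name: promote the existing entry to an array, then append.
			if (!isArray(result[name])) {
				first = result[name];
				result[name] = [first];
			}
			arrayAppend(result[name], value);
		}
	}
	return result;
}

// Made-up sample data: two <app> children with the same name.
sample = xmlParse('<apps><app id="1"><name>Alpha</name></app><app id="2"><name>Beta</name></app></apps>');
writeDump(xmlNodeToValue(sample));
</cfscript>
```

With this sample, apps.app should come back as a two-item array, each item holding an attributes struct and a name string. One caveat of the sketch: a child element that is itself named "attributes" or "text" would collide with the keys used here.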

The complete template ran quickly (2-3 seconds for a 2 MB XML file), but we could make it even faster by skipping the file read and parsing after we've done them once:

<cfset cachedJSON = cacheGet("jsonstr")>
<cfif isNull(cachedJSON)>

<cfset xmlFile = expandPath("./Applications.xml")>
<cfset xmlData = xmlParse(xmlFile)>

<cfscript>
function xmlToStruct(xml x) {
	var s = {};
	
	if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
		s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);	
	}

	if(structKeyExists(x, "xmlAttributes") && !structIsEmpty(x.xmlAttributes)) { 
		s.attributes = {};
		for(var item in x.xmlAttributes) {
			s.attributes[item] = x.xmlAttributes[item];		
		}
	}
	
	if(structKeyExists(x, "xmlChildren")) {
		for(var i=1; i<=arrayLen(x.xmlChildren); i++) {
			if(structKeyExists(s, x.xmlchildren[i].xmlname)) { 
				if(!isArray(s[x.xmlChildren[i].xmlname])) {
					var temp = s[x.xmlchildren[i].xmlname];
					s[x.xmlchildren[i].xmlname] = [temp];
				}
				arrayAppend(s[x.xmlchildren[i].xmlname], xmlToStruct(x.xmlChildren[i]));				
			 } else {
				s[x.xmlChildren[i].xmlName] = xmlToStruct(x.xmlChildren[i]);		 	 
			 }
		}
	}
	
	return s;
}

cachedJSON = serializeJSON(xmlToStruct(xmlData));

</cfscript>

<cfset cachePut("jsonstr", cachedJSON)>

</cfif>

<cfcontent reset="true" type="application/json"><cfoutput>#cachedJSON#</cfoutput>

This version makes use of ColdFusion 9's built-in caching to store the JSON string. On the first hit it takes about 3 seconds to render in the browser. After that it's almost immediate. The cache has no expiration, but it could be updated to time out after a certain amount of time, or you could note the date stamp on the file and use both that and the filename as the key to your cache.
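Both of those ideas can be combined. The sketch below is one possible way to do it, not code from the original template: getJsonFor and buildJson are hypothetical names, buildJson is only a stand-in for the real serializeJSON(xmlToStruct(xmlParse(...))) step, and the one-hour timespan and temp-file sample are arbitrary choices so the example runs on its own.

```cfml
<cfscript>
// Stand-in for the expensive step. In the template above this would be
// serializeJSON(xmlToStruct(xmlParse(path))).
function buildJson(required string path) {
	var doc = xmlParse(path);
	var out = {};
	out["root"] = doc.xmlRoot.xmlName;
	return serializeJSON(out);
}

// Returns the JSON for a file, rebuilding it when the file's timestamp changes.
function getJsonFor(required string path) {
	var info = getFileInfo(path);
	// The key combines the path and the last-modified stamp, so an edited
	// file produces a new key and misses the cache.
	var cacheKey = "json_" & hash(path & "|" & info.lastModified);
	var json = cacheGet(cacheKey);
	if (isNull(json)) {
		json = buildJson(path);
		// Expire after one hour so entries for stale keys do not linger forever.
		cachePut(cacheKey, json, createTimeSpan(0, 1, 0, 0));
	}
	return json;
}

// Made-up sample file so the sketch is runnable by itself.
xmlFile = getTempDirectory() & "sample-apps.xml";
fileWrite(xmlFile, "<apps><app id='1'/></apps>");
writeOutput(getJsonFor(xmlFile));
</cfscript>
```

The trade-off is a getFileInfo call on every request, which is cheap next to reparsing a multi-megabyte XML file. The timespan matters because entries stored under old timestamps are never looked up again and would otherwise sit in the cache.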

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Mike posted on 1/4/2012 at 10:25 PM

This is a really interesting post because I ran into some problems a while back while doing a POC where I get a SOAP response and push it into JSON for a PG app.

I ran into the same problems you were talking about with the XML. I did some hacky things to get around it because it was just a POC and a learning experience, but with what you have laid out here I think I might go back and use the UDF you created to make it better.

Also I never thought about using the caching like you did, I was just caching the query :)

Seems I need to read the docs more :) anyhow thanks for the post it should come in handy when I redo the app.

Comment 2 by Todd Sharp posted on 1/4/2012 at 10:33 PM

Did you consider WDDX'ing the XML and trying to serialize that? I wonder how that would do?

Comment 3 by Todd Sharp posted on 1/4/2012 at 11:05 PM

Ok, nevermind. Apparently you can't use cfwddx on XML. It kinda makes sense, since WDDX is XML, but I'd have thought it would have done something with it.

Comment 4 by Ben Forta posted on 1/6/2012 at 11:26 PM

I had to do the same recently, and used this jQuery plugin: http://www.fyneworks.com/jq...

--- Ben

Comment 5 by Raymond Camden posted on 1/6/2012 at 11:58 PM

@Forta: Nice! That worked well. I'll have to remember that next time I have to deal with XML in JavaScript.

Comment 6 by Neil Pugh posted on 8/22/2012 at 4:00 PM

Ray, love your work! But... this code doesn't really work. It never saves the XMLText to the struct at any point. You just receive the full structure but without any CDATA!

Comment 7 by Raymond Camden posted on 8/22/2012 at 4:07 PM

I'll take a look see. Can you share the XML data you used to test?

Comment 8 by Tim Meade posted on 1/19/2013 at 9:55 PM

Hi Ray,

Works great up to a point. It appears that if the xmlChildren have children that the data and recursiveness stops.

Comment 9 by Raymond Camden posted on 1/20/2013 at 12:02 AM

That seems odd. Can you post a Gist with a full example of the bug I can try running?

Comment 10 by Tim Meade posted on 1/20/2013 at 12:07 AM

I just sent a private message with the XML in it. Let me know if that does not suffice. The variables section seem to not be populated with values. I'm working on a different solution also.

Much thanks

Tim

Comment 11 by Raymond Camden posted on 1/20/2013 at 11:18 PM

I *think* I got it. Try this one:

https://gist.github.com/458...

Comment 12 by chrishunterkiller posted on 1/23/2013 at 3:37 AM

Hi mate, first of all nice udf, i was looking for this function and u made it, thx but what do u have against the keyword "else" ?:) two else are missing that broke up all ur algorithm.
first of all: we are on the xmlroot or not, impossible to handle it only with a simple if but with a if root_node else not a root
Secondly, iam an array or not so if iam not an array... else iam an array.
Think about update ur tutorial and the pasted code with the new code from github, it will help people.

@SaezChristopher on twitter

Comment 13 by Raymond Camden posted on 1/23/2013 at 3:41 AM

Dude... I have no idea what you're saying. :) Can you say that again? Are you saying the github version is good?

Comment 14 by chrishunterkiller posted on 1/23/2013 at 3:52 AM

i have to correct myself, we have to remove the else on the array but we still have to add else on root node test

Comment 15 by Raymond Camden posted on 1/23/2013 at 3:55 AM

Um... so maybe you could fork the gist with a correction and it would make more sense?

Comment 16 by chrishunterkiller posted on 1/23/2013 at 4:22 AM

I finally found where was the last bug on this code: u have to add a finally condition on the algo: "if iam on a xmlText then return the text (the result of arrayAppend(..., xmlToStruct(s)) the test case was if u have to following same node inside a parent node: one got the right value :text, the second an empty struct instead of text as well.

Comment 17 by Raymond Camden posted on 1/23/2013 at 4:23 AM

I still can't grok your English. Any chance you could post your code as a new Gist? Or fork mine so we can see the mods?

Comment 18 by chrishunterkiller posted on 1/23/2013 at 4:33 AM

sorry for my bad english, i quite late here, anyways, iam a noob on githut stuff so i let u a comment on ur code with my correction, it will be better than a frenglish explanation.

Comment 19 by Matt posted on 4/27/2013 at 7:16 AM

The version on Gist is fantastic. XML files that were 1.5mb are shrinking down to 500k with this simple function. Thanks for hooking it up as usual.

Comment 20 by CoursesWeb posted on 9/30/2013 at 4:50 PM

Hi
Here is an xml to json converter:
http://coursesweb.net/javas...
It is a javascript object that returns json string, or object, directly from file or string with xml content.

Comment 21 by Darcy posted on 12/12/2013 at 4:42 AM

Hi there. Thanks for your blog and for this post in particular. I've taken the version you placed on Github and I "almost" have it working the way that I need it to. One exception, though, that maybe you can help me out with.

I have JSON formatted output as a result of my conversion of XML data. I found that I needed to specifically include the "element" notation in creating my XML in order to have the structure of the file properly created (must have to do with the way that I'm dynamically looping and creating my XML). However, the actual JSON formatted output structure that I need does not include the "element" notation. Somehow I need to drop the "element" notation from my JSON... sort of "collapse" it I guess. (Again, I'm trying to avoid rewriting the code that creates the XML and instead, be able to convert my XML to the JSON that I need.)

Here's what I'm getting:
_
JSON
  productCategoriesList
    element
      {}
        id=1
        name=product type A
        products
          element
            {}
              id=A1
              name=product A1
            {}
              id=A2
              name=product A2
      {}
        id=2
        name=product type B
        products
          element
            {}
              id=B1
              name=product B1
_
etc.

This is what I need:
_
JSON
  productCategoriesList
    {}
      id=1
      name=product type A
      products
        {}
          id=A1
          name=product A1
        {}
          id=A2
          name=product A2
    {}
      id=2
      name=product type B
      products
        {}
          id=B1
          name=product B1
_
etc.

Thoughts?

Thanks!

Comment 22 by Raymond Camden posted on 12/12/2013 at 8:52 AM

Can you share a Gist of your XML?

Comment 23 by Darcy posted on 12/12/2013 at 8:13 PM

Sure thing. Here you go.

https://gist.github.com/red...

Thanks in advance for any help you can throw this way!

Comment 24 by Raymond Camden posted on 12/12/2013 at 9:49 PM

Ok, so this confuses me:

"However, the actual JSON formatted output structure that I need does not include the "element" notation. Somehow I need to drop the "element" notation from my JSON"

You say it doesn't include it, but then you want to remove it. Did you just misspeak there?

Honestly, looking at the structure of your XML, you need *something* to separate your items. For example, the products tag has N elements inside representing each product.

It sounds like you want to iterate over products as opposed to products.element, but I'd probably just live with it. ;)

Comment 25 by Darcy posted on 12/12/2013 at 10:02 PM

Hi again. Thanks for your reply.

There's a really good chance that I misspoke somewhere, but let me try again.

- My XML HAS the "element" notation. Let's say that it has to.

- As such, my auto-converted JSON also HAS the "element" notation. Makes sense.

- Challenge is, the entity consuming my JSON can't stomach the "element" notation... it wants what we see in file #3 on that github. If I feed that entity the JSON in file #3, it's happy. If I feed it the auto-converted JSON in file #2 it pukes on it.

The confusion here could very well be because the solution is an easy one, but it's one that eludes me. Perhaps I just need to iterate through the auto-generated JSON and "strip out" the "element" notation, but I guess I was hoping to somehow save that step.

Thanks again.

Comment 26 by Raymond Camden posted on 12/13/2013 at 5:58 PM

Oh - you are having a problem with the thing consuming your JSON. Well, in your case you may need to write your own solution. That should actually be easier - my code was built to handle *anything* whereas you know what your XML is and can write code specific for it.

Comment 27 by Felinotherapist posted on 7/16/2014 at 3:05 AM

Has anyone tried the new XML converter? http://www.coolutils.com/To...

Comment 28 by Tim posted on 2/17/2015 at 5:41 PM

On the latest git version linked in the comments, if I run this against a BBC news feed, the root node is returned twice.

I presume this is cause it is an if rather than an else if, and therefore the root node is always picked up both by the first check and at least one of the subsequent checks

if i remove the first check entirely:

if(xmlGetNodeType(x) == "DOCUMENT_NODE") {
	s[structKeyList(x)] = xmlToStruct(x[structKeyList(x)]);
}

Then it seems to work as expected. can anyone confirm?

Comment 29 by Semih Akartuna posted on 7/15/2020 at 9:27 AM

Thanks Raymond, Greetings from Turkey