One of the cooler new features in the next version of ColdFusion is getSafeHTML. I had seen this mentioned a few times already but it never really clicked in my brain what it was doing. getSafeHTML makes use of the AntiSamy project. It takes user-generated content and strips out unsafe HTML. What is safe and what isn't? It is totally up to you. The functionality is driven by an XML file (a very complex XML file) that lets you get as granular as you want. Want to support the bold tag but not italics? Fine. Want to support colors for CSS but only some? You can do that. Let's look at a simple example - and one that happens to point out a little issue.
<cfsavecontent variable="test">
This is some <b>html</b>. Even <i>more</i> html!<br/>
<iframe src="http://www.cnn.com"></iframe>
</cfsavecontent>
<cfoutput>
<pre>
#getSafeHTML(test)#
</pre>
</cfoutput>
In my sample input, I've got a B tag, an I tag, and an iframe. getSafeHTML will strip out just the iframe, leaving the bold and italics there. This is rather cool I think. But in my test I discovered a little bug. The actual result of the above code is this:
This is some <b>html</b>
. Even <i>more</i>
html!<br />
See the line break after the closing B tag? That moves the period to a new line, which renders as a space in the browser. I did some research and discovered that there is a particular setting in AntiSamy that modifies the result with a bit of formatting. In this case, the formatting breaks my HTML. So how to fix?
As I mentioned, AntiSamy is configured by an XML file. There is a default one for the server. You can override the XML file at the Application.cfc level or in your call to getSafeHTML itself. I did some Googling, found a sample file, and then did the modification to the setting in question:
<directive name="formatOutput" value="false"/>
This goes within the directives block. I'm going to file an ER (enhancement request) to add this to the default XML for ColdFusion 11.
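To make the placement concrete, here is a trimmed sketch of where the directive sits in an AntiSamy policy file. It is a fragment, not a complete policy: a real file also carries the regular expression, tag, and CSS rule sections, which I've left out here.

```xml
<anti-samy-rules>
    <directives>
        <!-- Stop AntiSamy from pretty-printing the cleaned HTML,
             which is what pushed the period onto its own line -->
        <directive name="formatOutput" value="false"/>
    </directives>
    <!-- the rest of the policy (common-regexps, tag-rules, css-rules, and so on) stays as it was -->
</anti-samy-rules>
```

And a minimal sketch of the two ways to point ColdFusion at the modified file, assuming the setting name and optional argument are as I read them in the ColdFusion 11 docs. The application name and the file name antisamy-custom.xml are made up for the example.

```cfml
// Application.cfc: use the custom policy for the whole application
component {
    this.name = "safeHtmlDemo"; // hypothetical application name
    // hypothetical file name, resolved relative to this Application.cfc
    this.security.antisamypolicy = getDirectoryFromPath(getCurrentTemplatePath()) & "antisamy-custom.xml";
}
```

```cfml
<!--- Or pass the policy file as the second argument of the call itself --->
<cfoutput>#getSafeHTML(test, expandPath("./antisamy-custom.xml"))#</cfoutput>
```

The per-call form is handy when only one part of a site needs a looser or stricter policy than the rest.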
Archived Comments
I hope getSafeHTML() doesn't use xmlSearch() to parse the "very complex XML file" on each call, otherwise it may not be thread safe: http://cfmlblog.adamcameron...
Does the CF instance need a restart to pick up the changes made to the XML?
Thanks Ray
I modified the XML file and the change was picked up immediately.
Thanks for writing about this.
Would this function also check user-generated content for JavaScript, including potentially harmful JavaScript functions?
I haven't tested it with JS, but I'd have to imagine the default is to strip ALL js. I can't imagine a scenario where you would want to allow JS. As an example, this blog lets you fake bold/italics for comments, which I think makes sense for user generated content. But I'd never allow JS here.
Of course - you *could* test this yourself and report back. ;)
Thanks for responding. I guess I'll have to set up a box and test it :)
In a blog etc. it would not make much sense to allow JS. But sites like "ebay" etc. allow users to upload very complex HTML descriptions for products that include JS, e.g. for nice image switching. For that, a good filter that allows some JS but not all would be great.
Based on what I saw - it looks like the XML format for AntiSamy is VERY powerful, but I'd be shocked if it let you support JS and only certain functions. As it stands, in JS you can generate a function via a string, so if you blocked alert in the input string, I could still execute an alert by generating a function by hand.
Right. That's what I am thinking.
Looking for something that could help me with the task of securing JS in user-generated content... It would have been too easy to have a function like getSafeHTML solve my problems *g