For the most part, those of us who run sites with user-generated content, like this blog right here, will do everything possible to strip out and "sanitize" the content sent in by the public. This is mainly done as a security measure. While not enough by itself, htmlEditFormat can be used to escape any HTML a visitor submits, which blocks attempts to spam or misdirect users with injected links and markup. Even when that isn't a worry, innocent users who can inject HTML into your site could easily make a simple mistake, such as an unclosed tag, that renders your site completely broken. (Visibly, anyway.) While a few sites allow basic HTML, in general you are stuck with plain text. Because of this, many people use a few common typographical conventions to convey meaning. For example, I may use *foo* to symbolize boldness or strong feeling. I may also use underscores, as in _foo_, to represent italics. Let's look at a simple example of how ColdFusion can render some of these into real HTML tags.
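As a quick illustration of what htmlEditFormat does to submitted markup, here is a minimal sketch. The comment string is hypothetical, standing in for a spammer trying to slip a link into a page.

```cfml
<cfscript>
// Hypothetical comment a spammer might submit, hiding a link in the markup
comment = '<a href="http://example.com">Click me</a>';

// htmlEditFormat escapes <, >, & and " so the browser displays the tag as text
safe = htmlEditFormat(comment);

writeOutput(safe);
</cfscript>
```

The escaped result is `&lt;a href=&quot;http://example.com&quot;&gt;Click me&lt;/a&gt;`, so the visitor sees the literal markup instead of a clickable link.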

First, I'll create a simple form that renders a textarea.


<cfparam name="form.comments" default="">

<form method="post">
<cfoutput><textarea name="comments" style="width:300px;height:200px">#htmlEditFormat(form.comments)#</textarea></cfoutput><br/>
<input type="submit" value="Test">
</form>

<cfif len(trim(form.comments))>
	<cfoutput>#markupRender(form.comments)#</cfoutput>
	<p>
	<cfoutput>#htmlEditFormat(markupRender(form.comments))#</cfoutput>
</cfif>

The UDF, markupRender, doesn't exist yet, but notice how for this test I render both the result and the htmlEditFormat of the result. I wouldn't use the second output in production, but for testing, it lets me see the raw HTML the UDF generates. Now let's look at the UDF:


<cfscript>
function markupRender(required string text) {
	text = htmlEditFormat(text);
	text = reReplace(text, "\*(.*?)\*", "<b>\1</b>","all");
	text = reReplace(text, "_(.*?)_", "<i>\1</i>","all");
	return text;
}
</cfscript>

Not much to it, is there? I first run htmlEditFormat on the text, and then follow up with two simple regular expressions. The order matters: escaping first neutralizes anything the user typed, and only then do the regexes add the few tags we trust. Notice that I have to escape the * since it is a special character in regular expressions. Also note the use of .*? (specifically the ?). The . matches any character and the * repeats it 0 or more times, but adding ? makes the match non-greedy, so it stops at the first closing delimiter it can find. Why is that important? Consider this string: I *love* ColdFusion and I *love* Star Wars. Without the non-greedy modifier, the match would run from the asterisk before the first love all the way to the asterisk after the last one, wrapping "love* ColdFusion and I *love" in a single pair of bold tags.
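To make the greedy versus non-greedy difference concrete, here is a small sketch that runs both patterns against the string I *love* ColdFusion and I *love* Star Wars. The variable names are just for illustration.

```cfml
<cfscript>
s = "I *love* ColdFusion and I *love* Star Wars.";

// Greedy: .* grabs as much as it can, so the match runs from the first * to the last *
greedy = reReplace(s, "\*(.*)\*", "<b>\1</b>", "all");

// Non-greedy: .*? stops at the first closing *, so each word is matched separately
lazy = reReplace(s, "\*(.*?)\*", "<b>\1</b>", "all");

// Escape both so the generated tags are visible in the browser
writeOutput(htmlEditFormat(greedy) & "<br />" & htmlEditFormat(lazy));
</cfscript>
```

The greedy version produces `I <b>love* ColdFusion and I *love</b> Star Wars.`, while the non-greedy version produces `I <b>love</b> ColdFusion and I <b>love</b> Star Wars.`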

(The online demo for this post, http://www.coldfusionjedi.com/demos/may102011/test0.cfm, is no longer available.)

How about we take it one step further? Imagine if the user enters something like this:

This *is* a *test*. I feel _strongly_ about this, I really do.

And I live at:

900 Elm Street
Lafayette, LA

As you can see, I've got three paragraphs, with the third paragraph being an address. The built-in function paragraphFormat would handle the paragraphs in general, since it turns double line breaks into paragraph tags, but it replaces single line breaks with spaces, so the two-line address would collapse onto one line. Luckily there is a UDF at CFLib just for that: paragraphFormat2.

Now in general, I think it's a good idea to keep your methods as simple and direct as possible. They should do one thing only. However, I think in this case putting all of my format logic together in one UDF makes sense as well. I could spend all day worrying about what is the best architecture for a simple UDF or I can just build it and be done with it. So I did. I took the guts of paragraphFormat2 and simply added it in:


<cfscript>
function markupRender(required string text) {
	text = htmlEditFormat(text);
	text = reReplace(text, "\*(.*?)\*", "<b>\1</b>","all");
	text = reReplace(text, "_(.*?)_", "<i>\1</i>","all");

	//Credit: Ben Forta and paragraphformat2: www.cflib.org/udf/paragraphformat2
	//first make Windows style (CRLF) into Unix style (LF)
	text = replace(text,chr(13)&chr(10),chr(10),"ALL");
	//now make Macintosh style (CR) into Unix style
	text = replace(text,chr(13),chr(10),"ALL");
	//now fix tabs; plain spaces would collapse in HTML, so use non-breaking spaces
	text = replace(text,chr(9),"&nbsp;&nbsp;&nbsp;","ALL");
	//now return the text formatted in HTML
	return replace(text,chr(10),"<br />","ALL");
}
</cfscript>

That's barely over 10 lines, and it now handles paragraphs (as double line breaks), single line breaks, and tabs too.
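Why does the UDF collapse CRLF before it touches a lone CR? Here is a minimal sketch, using a hypothetical input that mixes Windows (CRLF), classic Mac (CR), and Unix (LF) line endings, comparing a naive conversion with the normalize-then-convert order used in markupRender.

```cfml
<cfscript>
// Hypothetical input mixing CRLF, CR and LF line endings
sample = "900 Elm Street" & chr(13) & chr(10) & "Lafayette, LA" & chr(13) & "Line three" & chr(10) & "Line four";

// Naive: turn every CR and every LF into a break, so a CRLF pair yields two <br /> tags
naive = replace(sample, chr(13), "<br />", "all");
naive = replace(naive, chr(10), "<br />", "all");

// Normalize first: CRLF becomes LF, then any remaining CR becomes LF
normalized = replace(sample, chr(13) & chr(10), chr(10), "all");
normalized = replace(normalized, chr(13), chr(10), "all");
// Every line ending is now a single LF, so each one becomes exactly one break
html = replace(normalized, chr(10), "<br />", "all");

writeOutput(htmlEditFormat(naive) & "<br />" & htmlEditFormat(html));
</cfscript>
```

The naive version inserts a spurious blank line after 900 Elm Street, while the normalized version gives one `<br />` per line.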

Here is a complete copy of the template.


<cfparam name="form.comments" default="">

<form method="post">
<cfoutput><textarea name="comments" style="width:300px;height:200px">#htmlEditFormat(form.comments)#</textarea></cfoutput><br/>
<input type="submit" value="Test">
</form>

<cfif len(trim(form.comments))>
	<cfoutput>#markupRender(form.comments)#</cfoutput>
	<p>
	<cfoutput>#htmlEditFormat(markupRender(form.comments))#</cfoutput>
</cfif>

<cfscript>
function markupRender(required string text) {
	text = htmlEditFormat(text);
	text = reReplace(text, "\*(.*?)\*", "<b>\1</b>","all");
	text = reReplace(text, "_(.*?)_", "<i>\1</i>","all");

	//Credit: Ben Forta and paragraphformat2: www.cflib.org/udf/paragraphformat2
	//first make Windows style (CRLF) into Unix style (LF)
	text = replace(text,chr(13)&chr(10),chr(10),"ALL");
	//now make Macintosh style (CR) into Unix style
	text = replace(text,chr(13),chr(10),"ALL");
	//now fix tabs; plain spaces would collapse in HTML, so use non-breaking spaces
	text = replace(text,chr(9),"&nbsp;&nbsp;&nbsp;","ALL");
	//now return the text formatted in HTML
	return replace(text,chr(10),"<br />","ALL");
}
</cfscript>