Detecting invalid HTML with JavaScript

As a blogger, I write quite a few blog posts. I hate RTEs (Rich Text Editors), so I typically write whatever HTML I need by hand. Normally this isn’t a big deal. My blogware can handle paragraphs and code formatting, so I typically just worry about bold and italics. However, because I’m entering HTML manually, there’s always a chance I could screw up. I’ve got a Preview feature on my blog, but I rarely use it.

For a while now I’ve wondered if there was some way to possibly detect bad HTML via JavaScript. I decided today to take a crack at it using some simple regex. I figured if we could detect all tags, maybe we could use a simple counter to keep track of opening and closing tags. Obviously that’s not terribly precise, but for the types of mistakes I make, it would actually work out ok most of the time. I worked on it a bit and came up with the following little demo:

<html>
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script>
$(document).ready(function() {

$("#testBtn").click(function(e) {
	var code = $.trim($("#code").val());
	if(code == '') return;
	
	var regex = /<.*?>/g;
	var matches = code.match(regex);
	//match() returns null (not an empty array) when nothing matches
	if(!matches) return;
	
	var tags = {};
	
	$.each(matches, function(idx,itm) {
		console.log("Raw tag: "+itm);

		//if the tag is <..../>, it's self closing
		if (itm.substr(itm.length - 2, itm.length) != "/>") {
		
			//strip out any attributes
			var tag = itm.replace(/[<>]/g, "").split(" ")[0];
			console.log("Tag : " + tag);
			//start or end tag?
			if (tag.charAt(0) != "/") {
				if (tags.hasOwnProperty(tag)) 
					tags[tag]++;
				else 
					tags[tag] = 1;
			}
			else {
				var realTag = tag.substr(1, tag.length);
				console.log("Real tag is -" + realTag);
				if (tags.hasOwnProperty(realTag)) 
					tags[realTag]--;
				else 
					tags[realTag] = -1;
			}
		}
	});

	console.dir(tags);
	
	var possibles = [];
	for (var tag in tags) {
		if(tags[tag] != 0) possibles.push(tag);
	}
	if (possibles.length) {
		$("#status").text("There appear to be some hanging tags in your textarea: "+possibles.join(","));
	}
	else {
		//clear any warning left over from a previous test
		$("#status").text("");
	}
});

});
</script>
</head>

<body>

<div id="status"></div>

<form>
<textarea name="code" id="code" cols="70" rows="30"></textarea><br/>
<input type="button" id="testBtn" value="Test">
</form>
</body>
</html>

Basically, I used a simple regex to find any HTML tag:

var regex = /<.*?>/g;
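
As a quick sanity check, here is roughly what that pattern pulls out of a string. The sample text and the variable names are made up for illustration. The non-greedy `.*?` is what stops each match at the first closing bracket instead of swallowing everything up to the last one.

```javascript
// Sample input, made up for illustration.
var sample = 'This is <b>bold</b> and <i class="x">italic</i><br/>';

// Same pattern as above: "<", then as few characters as possible, then ">".
var regex = /<.*?>/g;

// With the g flag, match() returns an array of every match...
var found = sample.match(regex);
console.log(found); // <b>, </b>, <i class="x">, </i>, <br/>

// ...or null (not an empty array) when there is nothing to find.
var none = "no tags here".match(regex);
console.log(none); // null
```

Note that attributes come along for the ride in each match, so they still have to be stripped before counting.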

From the matches, I loop over each one and figure out a) the real tag name (ignoring attributes, for example) and b) whether it is an opening or closing tag. I keep a simple numeric counter per tag name, incrementing it for each opening tag and decrementing it for each closing tag. I also try to support self-closing tags like <p/> by skipping them.
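
If you want to play with the counting logic outside the page, here is a minimal sketch of the same idea as a plain function with no jQuery or DOM involved. The name `findHangingTags` is hypothetical and not part of the demo; it just wraps the steps described above.

```javascript
// findHangingTags is a hypothetical helper name, not part of the demo above.
function findHangingTags(code) {
	// match() gives null when nothing matches, so fall back to an empty array.
	var matches = code.match(/<.*?>/g) || [];
	// A prototype-less object, so tag names never collide with built-in properties.
	var tags = Object.create(null);

	matches.forEach(function(itm) {
		// Self-closing tags such as <br/> need no partner, so skip them.
		if (itm.slice(-2) === "/>") return;

		// Drop the brackets, then keep only the name (everything before the first space).
		var tag = itm.replace(/[<>]/g, "").split(" ")[0];
		var isClosing = tag.charAt(0) === "/";
		var name = isClosing ? tag.slice(1) : tag;

		// +1 for an opening tag, -1 for a closing tag.
		tags[name] = (tags[name] || 0) + (isClosing ? -1 : 1);
	});

	// Any tag whose count is not back at zero is a suspect.
	return Object.keys(tags).filter(function(name) {
		return tags[name] !== 0;
	});
}

var balanced = findHangingTags("This is <b>bold</b> and <i>italic</i>.");
var missingClose = findHangingTags("This is <b>bold and <i>italic</i>.");
var strayClose = findHangingTags("Stray close</i> and a break<br/>");

console.log(balanced);     // empty array
console.log(missingClose); // contains "b"
console.log(strayClose);   // contains "i"
```

Because this is only a counter, it happily accepts mis-nested input like `<b><i>x</b></i>`, and it flags void tags written without the slash, such as `<br>`. For the bold and italic slips I tend to make, that trade-off is fine.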

It’s not the most scientific method, but it seems to work well in my testing. Check it out at the demo below.


About Raymond Camden

Raymond is a developer advocate looking for his next gig. He focuses on JavaScript, serverless and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support.

Lafayette, LA https://www.raymondcamden.com
