Detecting invalid HTML with JavaScript

As a blogger, I write quite a few blog posts. I hate RTEs (Rich Text Editors), so I typically write most of my HTML by hand. Normally this isn't a big deal. My blogware handles paragraphs and code formatting, so I typically only have to worry about bold and italics. However, because I'm entering HTML manually, there's always a chance I could screw up. I've got a Preview feature on my blog, but I rarely use it.

For a while now I've wondered if there was some way to possibly detect bad HTML via JavaScript. I decided today to take a crack at it using some simple regex. I figured if we could detect all tags, maybe we could use a simple counter to keep track of opening and closing tags. Obviously that's not terribly precise, but for the types of mistakes I make, it would actually work out OK most of the time. I worked on it a bit and came up with the following little demo:

<html>
<head>
<script type="text/javascript" src=""></script>
<script>
//requires jQuery, loaded by the script tag above
$(document).ready(function() {

    $("#testBtn").click(function(e) {
        //clear any message left over from a previous test
        $("#status").text("");

        var code = $.trim($("#code").val());
        if(code == '') return;

        var regex = /<.*?>/g;
        var matches = code.match(regex);
        //match() returns null (not an empty array) when nothing matches
        if(!matches) return;

        var tags = {};

        $.each(matches, function(idx,itm) {
            console.log("Raw tag: "+itm);

            //if the tag is, <..../>, it's self closing
            if (itm.substr(itm.length - 2, itm.length) != "/>") {

                //strip out any attributes
                var tag = itm.replace(/[<>]/g, "").split(" ")[0];
                console.log("Tag : " + tag);
                //start or end tag?
                if (tag.charAt(0) != "/") {
                    //start tag: add one to its counter
                    if (tags.hasOwnProperty(tag))
                        tags[tag]++;
                    else
                        tags[tag] = 1;
                }
                else {
                    //end tag: drop the leading slash and subtract one
                    var realTag = tag.substr(1, tag.length);
                    console.log("Real tag is -" + realTag);
                    if (tags.hasOwnProperty(realTag))
                        tags[realTag]--;
                    else
                        tags[realTag] = -1;
                }
            }
        });

        //any tag whose counter isn't back at zero is suspect
        var possibles = [];
        for (var tag in tags) {
            if(tags[tag] != 0) possibles.push(tag);
        }
        if (possibles.length) {
            $("#status").text("There appear to be some hanging tags in your textarea: "+possibles.join(","));
        }
    });

});
</script>
</head>

<body>

<div id="status"></div>

<form>
<textarea name="code" id="code" cols="70" rows="30"></textarea><br/>
<input type="button" id="testBtn" value="Test">
</form>

</body>
</html>

Basically, I used a simple regex to find any HTML tag:

var regex = /<.*?>/g;
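
To see what that pattern actually picks up, here is a small sketch you can paste into a browser console or run in Node. The sample strings are made up for illustration; only the regex comes from the demo. The lazy `.*?` is what keeps each match to a single tag instead of swallowing everything between the first `<` and the last `>`.

```javascript
// Same pattern as the demo: a "<", as few characters as possible, then a ">".
var regex = /<.*?>/g;

// Hypothetical sample input, not taken from the demo.
var sample = 'Some <b>bold</b> text, a <a href="x.html">link</a> and a break<br/>';

// With the g flag, match() returns every match as an array of strings...
var found = sample.match(regex);
console.log(found); // <b>, </b>, <a href="x.html">, </a>, <br/>

// ...or null (not an empty array) when there are no matches at all.
var none = "no markup here".match(regex);
console.log(none); // null
```

Because `match()` can return null, it's worth checking the result itself before reading its `length`. Also keep in mind that `.` doesn't match line breaks, so a tag that wraps onto a second line won't be picked up by this pattern.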

And from that, I loop over the matches and figure out a) the real tag (so I ignore attributes, for example) and b) whether it is a closing tag or not. I keep a simple numeric counter per tag, adding one for an opening tag and subtracting one for a closing tag. I also skip self-closing tags like <p/> so they don't throw off the count.
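
If you'd rather play with the counting logic outside of a page, here is a rough jQuery-free sketch of the same idea. The function name findHangingTags and the sample strings are my own inventions for illustration, and it is a simplified take on the approach rather than a drop-in replacement for the demo.

```javascript
// Hypothetical helper: returns the names of tags whose opening and closing
// counts don't match. Assumes the same simple regex approach as the demo.
function findHangingTags(html) {
    // match() returns null when nothing matches, so fall back to an empty array
    var matches = html.match(/<.*?>/g) || [];

    // Object.create(null) avoids collisions with built-in property names
    var tags = Object.create(null);

    matches.forEach(function (itm) {
        // Skip self-closing tags such as <br/>
        if (itm.slice(-2) === "/>") return;

        // Drop the angle brackets, then keep only the name before any attributes
        var tag = itm.replace(/[<>]/g, "").split(" ")[0];
        var closing = tag.charAt(0) === "/";
        var name = closing ? tag.slice(1) : tag;

        // +1 for an opening tag, -1 for a closing tag
        tags[name] = (tags[name] || 0) + (closing ? -1 : 1);
    });

    // Anything that didn't end up back at zero is a possible hanging tag
    return Object.keys(tags).filter(function (name) {
        return tags[name] !== 0;
    });
}

console.log(findHangingTags("This is <b>bold</b> and <i>italic</i>.")); // → []
console.log(findHangingTags("This is <b>bold and <i>italic</i>."));     // → ["b"]
console.log(findHangingTags("Line one<br/>Line two</p>"));              // → ["p"]
```

Since this only counts, it has blind spots. Badly nested tags like `<b><i>text</b></i>` come back clean because the totals still balance, and a void tag written without the slash, like `<br>`, gets reported as hanging.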

It's not the most scientific method, but it seems to work well in my testing.
