August 2, 2012

Using jQuery to load HTML and filter it by N selectors

javascript jquery

Forgive the somewhat awkward title. Hopefully an explanation will make things a bit clearer. I was working on an application yesterday that needed to load an HTML file via AJAX and display it on screen. The HTML happened to be documentation, so I was simply going to display it as is. Since I wasn't doing any processing, my code was very simple:
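
A minimal sketch of such a call, assuming a hypothetical container element #content and a placeholder file name (neither is taken from the post; the cfdocs/ folder is borrowed from the test code in the comments further down):

```javascript
// Sketch only: "#content" and "cfdocs/page.html" are placeholder names.
function showDoc($, url) {
  // .load() requests the URL via AJAX and inserts the returned HTML
  // into every element matched by the selector.
  return $("#content").load(url);
}

// Run only where jQuery is actually present (for example, in a browser page).
if (typeof jQuery !== "undefined") {
  jQuery(function ($) {
    showDoc($, "cfdocs/page.html");
  });
}
```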

Easy, right? Well, the first thing I discovered was that the HTML I was loading included things I didn't want: headers, footers, etc. Again, though, this is easy enough to handle. You can tell jQuery's load() function to filter the response down to a single DOM element by adding a selector after the URL. (As a reminder: if you are concerned about performance, don't forget that you are still asking jQuery to load N bytes of HTML even though you display fewer than N bytes.)
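
As a sketch of that filtering, assuming the wanted block has the ID docs (the ID mentioned in the comments below), with a hypothetical container #content and a placeholder file name:

```javascript
// Sketch only: "#content" and the file name are placeholders.
function showDocFragment($, url, selector) {
  // load() treats the text after the first space as a selector and inserts
  // only the matching elements from the fetched document.
  return $("#content").load(url + " " + selector);
}

// Run only where jQuery is actually present.
if (typeof jQuery !== "undefined") {
  showDocFragment(jQuery, "cfdocs/page.html", "#docs");
}
```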

Woot. Almost there. This worked great, but the "block" of HTML this rendered was missing a nice header on top. I went back to the original source HTML and discovered that there was another div, header, that contained the title and would be perfect.

But here was a problem. How do I tell the load() function to select two DOM items? Turns out this was easy as well - just provide a comma-separated list of selectors:
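
A sketch of the list form, using the two IDs from the discussion (#header and #docs). buildLoadTarget is a hypothetical helper, and #content and the file name are placeholders:

```javascript
// Hypothetical helper: joins a URL and a list of selectors into the
// single string that load() expects ("url selector1, selector2").
function buildLoadTarget(url, selectors) {
  if (url.indexOf(" ") !== -1) {
    // load() splits on the first space, so the URL itself must not contain one.
    throw new Error("URL must not contain spaces: " + url);
  }
  return selectors.length ? url + " " + selectors.join(", ") : url;
}

var target = buildLoadTarget("cfdocs/page.html", ["#header", "#docs"]);
console.log(target); // cfdocs/page.html #header, #docs

// Run only where jQuery is actually present.
if (typeof jQuery !== "undefined") {
  jQuery("#content").load(target);
}
```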

This worked fine. But this leads to my question. Is this a good idea? Is there a better way? (Assuming you can't get "pure" data and must work with the HTML files.)

Archived Comments

Comment 1 by Dan G. Switzer, II posted on 8/2/2012 at 6:09 PM

@Raymond:

While this code is super concise, I don't find it intuitive at all. I prefer to use a complete callback and spell out the code.

Also, keep in mind that if the source document DOM changes, this code could break. At bare minimum, I'd carefully comment what the code is supposed to do.

Comment 2 by Raymond Camden posted on 8/2/2012 at 6:13 PM

I'm going to ignore your second comment because, as I said, this was the source HTML and it would obviously be better if I had pure data. That just isn't an option for now. ;)

As for your first one: OK, suppose you decide to switch to $.get or $.ajax, and you have X, which is the result HTML. How do you get N nodes? I had found this SO question:

http://stackoverflow.com/qu...

But it didn't work well for me. Using $(data).find('a') works, but $(data).find('#id') does not.
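
One possible explanation, offered as an assumption since the source HTML is not shown here: .find() searches only the descendants of the nodes in the set, while elements that end up at the top level of the parsed set (as direct children of body typically do when a whole page is parsed) are matched only by .filter(). A sketch that checks both follows; pickNodes is a hypothetical helper name, and it assumes the string starts with markup so that jQuery treats it as HTML:

```javascript
// Hypothetical helper: match a selector against parsed HTML at any depth.
function pickNodes($, html, selector) {
  var $parsed = $(html);
  // .filter() tests the top-level nodes themselves;
  // .find() looks only inside them.
  return $parsed.filter(selector).add($parsed.find(selector));
}

// Run only where jQuery is actually present.
if (typeof jQuery !== "undefined") {
  jQuery.get("cfdocs/page.html", function (data) {
    console.log(pickNodes(jQuery, data, "#header, #docs").length);
  });
}
```

Note that building a jQuery object straight from the response string still hands the whole document to the browser's parser, which is the side effect discussed in the next comments.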

Comment 3 by m13z posted on 8/2/2012 at 9:27 PM

In jQuery you can use a context instead of a filter:

$('#header, #docs', data)

Maybe that works better?

Comment 4 by Raymond Camden posted on 8/2/2012 at 10:32 PM

The issue I had with that is that it seems to 'execute' data, and if there are syntax errors in the DOM, like trying to use a script it can't reach, then I get errors in the console.

Comment 5 by Tim Leach posted on 8/2/2012 at 10:45 PM

@M13z

FYI,
Doing $('#header, #docs', data) is just a shortcut for
$(data).find('#header, #docs')

Both will execute the same under the hood.

Comment 6 by Elijah Manor posted on 8/2/2012 at 11:21 PM

Raymond,

What errors are you seeing in the console? I'm going to take a guess, but are you seeing a Permission Denied error? Behind the scenes the .load() method uses $.ajax() to do its work. Internally it uses a find, but before it does that it removes any scripts that can cause problems.

What site were you trying to get markup from? If you point me to that I can run a test in the console of that site to see what you are running into. Is the site public?

Comment 7 by Raymond Camden posted on 8/2/2012 at 11:22 PM

It is a "bit" private as in I'm using it for a demo at a keynote on Monday. It isn't really important though in terms of being top secret. Give me a bit to get a 'demo' of what I saw live, or at least get more of the error.

Comment 8 by Raymond Camden posted on 8/2/2012 at 11:27 PM

Ok, some info. First off, you can see a sample of the HTML source here:

http://www.raymondcamden.co...

I modded my code to this, just for testing:

$.get("cfdocs/"+url, {}, function(res,code) {
    console.log('ready');
    var header = $("#header", res);
    console.dir(header);
    var header2 = $("#header", $(res));
    console.dir(header2);
});

As you can see, I wasn't sure if I needed to jQuery-wrap res or not. But running the above, I get no matches, even though #header is clearly part of the DOM.

Comment 9 by m13z posted on 8/2/2012 at 11:42 PM

replace the "res" context with:

$('<div />').append(res.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, ''))

That's basically what .load() does internally.

Comment 10 by Raymond Camden posted on 8/2/2012 at 11:45 PM

Did you mean:

var header3 = $("#header", $('<div />').append(res.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, ''));

Throwing a syntax error.

Comment 11 by Raymond Camden posted on 8/2/2012 at 11:46 PM

Sorry missing ). Testing.

Comment 12 by Raymond Camden posted on 8/2/2012 at 11:48 PM

That worked. So if I read it right, this is what you did:

Take the result HTML.
Remove any script block.
Append it to a virgin DIV block made on the fly.
Then run my selector against it.

Would you say that is an accurate description?

Comment 13 by m13z posted on 8/2/2012 at 11:52 PM

Yepp. Accurate.

Lines 7180 to 7187 of jQuery 1.7.2 for source.
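
The four steps confirmed above can be sketched as follows. stripScripts and selectFrom are hypothetical helper names, #content and the file name are placeholders, and the regular expression is the one quoted earlier in this thread. Treat it as a way to avoid script side effects while selecting, not as a security sanitizer:

```javascript
// Regular expression quoted earlier in the thread: matches whole
// <script>...</script> blocks, case-insensitively.
var SCRIPT_BLOCKS = /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi;

// Steps 1-2: take the response HTML and remove any script blocks.
function stripScripts(html) {
  return html.replace(SCRIPT_BLOCKS, "");
}

// Steps 3-4: append the cleaned HTML to a detached <div>,
// then run the selector against that wrapper.
function selectFrom($, html, selector) {
  return $("<div />").append(stripScripts(html)).find(selector);
}

var sample = '<div id="header">Title</div><script>alert(1)</script><div id="docs">Body</div>';
console.log(stripScripts(sample));
// <div id="header">Title</div><div id="docs">Body</div>

// Run only where jQuery is actually present.
if (typeof jQuery !== "undefined") {
  jQuery.get("cfdocs/page.html", function (res) {
    jQuery("#content").empty().append(selectFrom(jQuery, res, "#header, #docs"));
  });
}
```

Because the cleaned markup sits inside the wrapper div, elements that were at the top level of the response become descendants, so .find() can reach them.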

Comment 14 by Raymond Camden posted on 8/2/2012 at 11:54 PM

So another takeaway from this is: you can always work with arbitrary HTML, but you need to remove script blocks first.

Thanks M13z!

Comment 15 by m13z posted on 8/3/2012 at 12:24 AM

¡De nada!

As for what triggered this discussion, I would normally agree with Dan about having more control in the callback, but we have ended up doing exactly what load() does internally, so the original code in the article is the better solution. (It's exactly what jQuery was created for in the first place: "write less, do more".)

Comment 16 by Kevin Boudloche posted on 8/3/2012 at 6:22 PM

jQuery 1.8 has a new method that may change the way this kind of processing is done. Look into the `$.parseHTML()` method. It takes a string of HTML and returns it as an array of DOM nodes, with or without scripts. See line 485 of http://code.jquery.com/jque...
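
A sketch of how that might look, assuming jQuery 1.8 or later. selectWithParseHTML is a hypothetical helper name, and #content and the file name are placeholders:

```javascript
// Hypothetical helper: parse a response string without its scripts,
// then select from it.
function selectWithParseHTML($, html, selector) {
  // keepScripts is left at its default (false), so <script> elements
  // are not included in the parsed nodes.
  var nodes = $.parseHTML(html);
  // Wrapping in a detached <div> lets .find() reach nodes at the top level.
  return $("<div />").append(nodes).find(selector);
}

// Run only where jQuery is actually present.
if (typeof jQuery !== "undefined") {
  jQuery.get("cfdocs/page.html", function (res) {
    jQuery("#content").empty().append(selectWithParseHTML(jQuery, res, "#header, #docs"));
  });
}
```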

Comment 17 by Raymond Camden posted on 8/3/2012 at 6:26 PM

Oh man that's pretty cool. I have not been following the development of 1.8 much. I'll have to pay more attention.

Comment 18 by Dan G. Switzer, II posted on 8/6/2012 at 6:43 PM

@Raymond:

My comment about the source code changing was more about the fact that, if the source document changes, the comments are important because you may not remember what "#header, #docs" are supposed to do. Commenting the code to say:

// get article header (#header) and the body of the article (#docs)

should make such a break easier to fix. This is really more a note for someone trying to do this kind of thing in production.
