Using DevTools to Scrape Web Content

This post is more than 2 years old.

So yesterday I blogged a demo that was - by my own admission - somewhat silly and not really worth your time to read. However, I was thinking later that one particular aspect of how I built that demo may actually be useful.

While I was creating the demo, I needed to get a list of all the songs the Cure recorded. I found this quickly enough on Wikipedia:

Screen shot of Wikipedia page

So there are only 67 songs there - in theory I could have typed them in about 5 minutes. But why do something by hand when you can use code?!?!?

I began by right-clicking on the first link and selecting "Inspect Element." (As a quick FYI, I'm using Firefox for this, but everything I'm showing should work in every modern browser. And shoot - I just tested and one step, the clipboard copy at the very end, is not supported in Edge. Tsk tsk.)

Screen shot of devtools focused on the link tag

It may be a bit hard to see in the screen shot, but I noticed two things here. First, the link used a title attribute with the name of the song. Second, I noticed there was a div with the class mw-category that appeared to "wrap" all the links. I figured this out by mousing over the div in the Inspector panel and noticing the highlight above.

Screen shot of devtools showing the div highlighted

Cool. So now I switched to the Console. For my first command, I wanted to grab all the links within that div:

links = document.querySelectorAll('.mw-category a');

When it was done, I tested to see if it seemed right by checking the length:

Confirming I got the right data

Notice how I got 67 items, which matches what the Wikipedia page says as well. Cool! So, now I've got a NodeList of data that I can iterate over much like an array. (It isn't an array, so methods like map aren't available, but it does support forEach.) So first I made a new array:

titles = [];

And then I populated it:

links.forEach((a) => titles.push(a.title));
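As a side note, the empty array plus forEach can be collapsed into a single expression with Array.from, which accepts any array-like or iterable value (a NodeList included) along with an optional mapping function. This is only a sketch, not the commands from the session above; collectTitles and its parameters are names made up for illustration.

```javascript
// Hypothetical helper: gather the title attribute of every element matching
// a selector. `root` is anything with querySelectorAll, e.g. `document`.
function collectTitles(root, selector) {
  // Array.from takes an array-like or iterable (such as a NodeList) plus an
  // optional mapping function, so no temporary array is needed.
  return Array.from(root.querySelectorAll(selector), (el) => el.title);
}

// In the browser console on the category page:
// titles = collectTitles(document, '.mw-category a');
```

The result is a real array, so map, filter, and sort are all available on it afterwards.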

And when done, I took a quick look to ensure it seemed ok:

Testing the titles value

Cool! And for the final operation, I simply copied it to my clipboard using:

copy(titles);

This is the only part that is not supported by Edge. Hopefully they add that soon. The end result is a string version of the array that I was able to drop right into my editor and go to town with.
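If your browser's console lacks a clipboard helper, one workaround is to build the string yourself and hand it to the standard async Clipboard API. Treat this as a hedged sketch: toClipboardText and copyToClipboard are made-up names, the two-space indentation is a choice of this sketch rather than what any particular console produces, and navigator.clipboard.writeText only works in secure contexts and may be rejected if the page does not have focus.

```javascript
// Hypothetical helper: turn a value into the text to put on the clipboard.
// Strings pass through unchanged; everything else becomes indented JSON.
function toClipboardText(value) {
  return typeof value === 'string' ? value : JSON.stringify(value, null, 2);
}

// Hypothetical helper: write the text using the async Clipboard API when it
// exists; otherwise just return the text so it can be copied by hand.
async function copyToClipboard(value) {
  const text = toClipboardText(value);
  if (typeof navigator !== 'undefined' && navigator.clipboard) {
    await navigator.clipboard.writeText(text);
  }
  return text;
}

// In the console: copyToClipboard(titles).then(() => console.log('copied'));
```

If the clipboard write is refused, logging toClipboardText(titles) and selecting the output by hand still beats typing 67 titles.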

If any of the above didn't make sense, I've created a quick video showing the process I went through.


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA

Archived Comments

Comment 1 by Robert Zehnder posted on 1/17/2018 at 9:33 PM

Honestly that is pretty cool, I would have totally over complicated it.

Comment 2 (In reply to #1) by Raymond Camden posted on 1/17/2018 at 9:55 PM

Thanks. :)

Comment 3 by Phillip Senn posted on 1/19/2018 at 8:34 PM

You are a JavaScript Ninja. Master said "When you can scrape the data from my website, it will be time for you to leave".

Comment 4 (In reply to #3) by Raymond Camden posted on 1/20/2018 at 2:03 PM

Hah :)