Working with the Disqus API - Deeper Stats

Yesterday I blogged about my first attempts at writing a client-side Disqus API client to provide better stats than the Disqus site itself. While yesterday’s demo was more of a proof of concept, today I’m attempting something a bit deeper - the beginning of a real power tool.

This first iteration is somewhat ugly, but it will serve as the basis for the prettier, client-friendly version I’ll work on next. The idea behind the tool is to create a complete(ish) copy of your Disqus data locally on the client. For that I decided to use IndexedDB. Eventually my code will fetch only new comments, but for now it sucks down everything. Even with the 1,000-requests-per-hour limit Disqus imposes, I can suck down every comment on this blog (over 60K, which at 100 comments per request is roughly 600 requests). It takes a while, but it’s a one-time hit, and going forward (in the next version, at least) it will be a heck of a lot quicker, since it only needs to get the latest comments. Let me start by showing the front end, and then we’ll dive into the code. To be clear, the ‘front end’ is really just a few buttons and a heck of a lot of console logging.

On startup, you need to give it the name of your forum.

The setup button is responsible for creating the IndexedDB database for your forum. I could have used one db for all my testing, but… I don’t know. It just felt right to create one bucket of data per forum. Obviously I may revisit that.

After you enter a value and click “Setup”, the next two buttons are enabled. Get Data begins the data-fetching process. I hit Disqus for 100 posts per request and just paginate like crazy. For my blog, it makes about one request per second and needs 610 or so requests to finish, so roughly ten minutes. That’s super slow, but again, it’s a one-time import. In the next version I’ll provide good feedback. I may even use the techniques from my last post to get a comment count via thread listings first so I can create a status bar. For me that’s going to add about 60 seconds to the process, though, and it may not be worthwhile. Again, the UX here is squishy - I’ll need feedback.
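
If I do go the status-bar route, the math is simple. Here’s a minimal sketch, assuming you already have a total comment count (for example, summed from thread listings). estimateImport and progressPercent are hypothetical helper names, and 61,000 is just an illustrative figure in line with the “over 60K” comments and “610 or so requests” mentioned above.

```javascript
// Hypothetical helper: how many requests a full import needs at a given
// page size, and roughly how long that takes at an observed pace.
function estimateImport(totalComments, pageSize, secondsPerRequest) {
  var requests = Math.ceil(totalComments / pageSize);
  return { requests: requests, seconds: requests * secondsPerRequest };
}

// Hypothetical helper: percent complete for a status bar, capped at 100.
function progressPercent(fetched, totalComments) {
  if (totalComments <= 0) return 100;
  return Math.min(100, Math.round((fetched / totalComments) * 100));
}

var est = estimateImport(61000, 100, 1);
console.log(est.requests + ' requests, about ' + Math.round(est.seconds / 60) + ' minutes');
// prints "610 requests, about 10 minutes"
```

At 610 requests a full import also stays under the 1,000-requests-per-hour limit, so a single run shouldn’t hit the cap.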

Display Data is where I start actually running reports. Right now all these reports are dumped to the console. My reports currently consist of:

  • Comment Count (to be clear, this is the same stat as yesterday, just fetched a different way)
  • The number of unique commenters
  • The dates of the first and last comments
  • The number of comments per year from 2003 to 2016. Yes, I hard-coded the range, but obviously it should be based on the first and last comment dates.
  • The threads with the most comments. (I actually sort them all, but I just print the top ten.)
  • The top ten authors by the number of comments. I’m thinking of adding a setting that lets you enter your own name so it can be ignored.

And here is how this looks. First, the ‘by year’ stats, the comment date range, and the number of unique commenters.

And here are the top commenters:

And the threads with the most comments:

Ok, I know you’re overly impressed by the UI, but let’s take a look at the code.


function setupData() {
	forum = $forum.val();
	if($.trim(forum) === '') return;
	console.log('work with '+forum);
	$setupData.attr('disabled','disabled');

	initDb(function() {
		$results.html('<p><i>Db setup.</i></p>');
		$startData.removeAttr('disabled');
		$displayData.removeAttr('disabled');
	},forum);

}

function initDb(cb,forum) {
	/*
	Begin by creating an IDB name based on forum. This lets us have one db per forum
	*/
	var dbName = 'disqus_'+forum;
	var req = window.indexedDB.open(dbName, 1);

	req.onupgradeneeded = function(event) {
		console.log('initial db setup');
		var theDb = event.target.result;

		//create a store for posts
		var postOS = theDb.createObjectStore("posts", { keyPath:"id" });
		postOS.createIndex("created", "created", { unique: false});
		postOS.createIndex("authorName", "author.name", { unique: false});
		postOS.createIndex("thread", "thread.id", { unique: false});
		
	}

	req.onsuccess = function(event) {
		db = event.target.result;
		console.log('We made the db.');
		cb();
	}

	req.onerror = function(e) {
		console.log('Error setting up IDB db');
		console.dir(e);
	}

	
}

First is a utility handler for the setup button. It just does a bit of DOM crap and then calls the code to set up my database. As I said, I’m using one db per forum, and I may revisit that. This is boilerplate IDB crap. The only things of interest are the indexes. I want to be able to sort/filter by date, author, and thread, so I have to create an index for each. Note that the author and thread indexes use dotted key paths (author.name and thread.id) to reach into the nested objects Disqus returns.
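
To make the dotted key paths a bit less magical, here’s a rough model of how IndexedDB resolves a path like author.name against each stored post. This is a sketch, not the browser’s code: it skips the spec’s special cases (string/array length, Blob and File properties) and key validation. evaluateKeyPath is a hypothetical name. The behavior worth remembering is that when an index’s path can’t be resolved for a record, IndexedDB just leaves that record out of the index rather than failing the write.

```javascript
// Rough model of index key path evaluation: split on dots and walk the
// record one own property at a time.
function evaluateKeyPath(record, keyPath) {
  var value = record;
  var parts = keyPath.split('.');
  for (var i = 0; i < parts.length; i++) {
    var isObject = value !== null && typeof value === 'object';
    if (!isObject || !Object.prototype.hasOwnProperty.call(value, parts[i])) {
      // No value: IndexedDB would skip this record for the index.
      return undefined;
    }
    value = value[parts[i]];
  }
  return value;
}

console.log(evaluateKeyPath({ id: '1', author: { name: 'Jane' } }, 'author.name'));
// prints "Jane"
```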

Now let’s look at the seeding portion. Later this will do things like remember where it left off and handle hitting the API limits, but for now it just sucks down data. First, get the data:


function doPosts(cb, forum, cursor, posts) {
	var url = 'https://disqus.com/api/3.0/posts/list.json?forum='+encodeURIComponent(forum)+'&api_key='+key+'&limit=100&order=asc&related=thread';
	if(cursor) url += '&cursor='+encodeURIComponent(cursor);
	if(!posts) posts = [];
	console.log('Fetching posts.');
	//$.getJSON parses the response as JSON for us
	$.getJSON(url).then(function(res) {
		res.response.forEach(function(t) {
			posts.push(t);
		});

		//keep paginating until Disqus says there is no next page
		if(res.cursor && res.cursor.hasNext) {
			doPosts(cb, forum, res.cursor.next, posts);
		} else {
			cb(posts);
		}
	}, function(xhr) {
		console.log('Error fetching posts', xhr.status);
	});
}
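
The recursive pagination above can also be written as a flat loop, which also makes it easy to add a pause between requests for when I get around to respecting the API limits. A minimal sketch, assuming a modern browser with async/await: fetchPage is a hypothetical stand-in for the $.getJSON call (it takes the current cursor, or null for the first page, and returns a promise for the Disqus response), and delayMs is a hypothetical option. 3,600 ms between requests works out to at most 1,000 requests per hour.

```javascript
// Sketch: cursor pagination as a loop. fetchPage(cursor) is hypothetical and
// must resolve to { response: [...posts], cursor: { hasNext, next } }.
function wait(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

async function fetchAllPosts(fetchPage, delayMs) {
  var posts = [];
  var cursor = null;
  for (;;) {
    var res = await fetchPage(cursor);
    posts = posts.concat(res.response);
    // Stop when Disqus reports there is no next page.
    if (!res.cursor || !res.cursor.hasNext) return posts;
    cursor = res.cursor.next;
    // Optional throttle between requests.
    if (delayMs) await wait(delayMs);
  }
}
```

In the demo, fetchPage would build the same URL as doPosts and return $.getJSON(url); jQuery’s jqXHR is a thenable, so await can consume it directly.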

Next, insert it into the db. Note that I’m using put, which handles both inserting and replacing, though really I only insert once.


function seedData(cb) {

	doPosts(function(posts) {
		console.log('I get '+posts.length+' posts.');

		//open up the trans
		var trans = db.transaction(['posts'], 'readwrite');
		var store = trans.objectStore('posts');

		posts.forEach(function(p) {
			p.created = (new Date(p.createdAt)).getTime();
			var req = store.put(p);
			req.onerror = function(e) {
				console.log('add error', e);
			};
		});

		trans.oncomplete = function(e) {
			console.log('objects inserted');
			cb();
		}

		trans.onerror = function(e) {
			console.log('Error in transaction', e);
		}

	},forum);

}

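The only really interesting thing there is that I add a new value, created, holding the date as epoch time (milliseconds). If you don’t, createdAt gets stored as a plain string rather than a number, which doesn’t work well for the numeric range queries and date math the reports need.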
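
One wrinkle worth checking here: JavaScript parses an ISO date-time string with no UTC offset as local time. The sketch below assumes Disqus createdAt values look like "2016-07-11T14:20:51" and represent UTC; if that assumption holds, new Date(p.createdAt) shifts every comment by your local offset. toEpoch is a hypothetical helper that appends "Z" only when a date-time has no offset.

```javascript
// Assumption: createdAt is a UTC date-time without an offset, e.g.
// "2016-07-11T14:20:51". Strings that already carry Z or +hh:mm are left alone.
function toEpoch(createdAt) {
  var hasOffset = /(Z|[+-]\d{2}:?\d{2})$/i.test(createdAt);
  var isDateTime = createdAt.indexOf('T') !== -1;
  return new Date(isDateTime && !hasOffset ? createdAt + 'Z' : createdAt).getTime();
}

// In seedData this would replace (new Date(p.createdAt)).getTime():
// p.created = toEpoch(p.createdAt);
```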

Ok, now let’s look at the stats. This is all in three or so really ugly functions, so I’m going to share a snippet at a time. Keep in mind I wrote this quickly, and it’s kinda crappy, but it’s the first iteration.

First - the number of comments:


/*
number of posts
*/
posts.count().onsuccess = function(e) {
	var count = e.target.result;
	console.log(count + ' total posts');
}

Yeah, not too complex - just a count call on the object store. Let’s kick it up a notch!


/*
unique authors
*/
var authors = [];
posts.index('authorName').openCursor(null,'nextunique').onsuccess = function(e) {
	var cursor = e.target.result;
	if(cursor) {
		//console.log('item', cursor.value.id, cursor.value.author.name);
		authors.push(cursor.value.author);
		cursor.continue();
	} else {
		console.log(authors.length + ' total authors');
		doAuthorStats(authors);
	}
}

This gives me both a count on authors and passes off an array of author objects I can then perform analysis on:


function doAuthorStats(authors) {
	console.log('doAuthorStats');

	//lame setup to know when we're done counting, since it's async and we don't have promises
	var totalAuthor = authors.length;
	var authorInfo = [];

	//one readonly transaction is enough for all of the count requests
	var trans = db.transaction(['posts'], 'readonly');
	var posts = trans.objectStore('posts');

	var doComplete = function() {
		//sort by comment count, highest first
		authorInfo.sort(function(a,b) {
			return b.count - a.count;
		});
		//print the top ten, or fewer if there aren't ten authors
		for(var i=0;i<Math.min(10, authorInfo.length);i++) {
			console.log(authorInfo[i].author.name + ' with '+authorInfo[i].count + ' comments.');
		}
	};

	authors.forEach(function(author) {
		var range = IDBKeyRange.only(author.name);
		posts.index('authorName').count(range).onsuccess = function(e) {
			authorInfo.push({author:author, count:e.target.result});
			if(authorInfo.length === totalAuthor) doComplete();
		};
	});

}
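
For comparison, the same report can be computed in a single pass over an array of posts (for example, everything read back from the posts store) instead of one count request per author. This is a sketch, not what the demo does; topAuthors is a hypothetical name, and its ignoreName parameter is one way the “ignore my own name” setting mentioned earlier might work.

```javascript
// Sketch: count comments per author.name in one pass, optionally skipping
// one name, then return the top n as { name, count } objects.
function topAuthors(posts, n, ignoreName) {
  var counts = new Map();
  posts.forEach(function (p) {
    var name = p.author && p.author.name;
    if (!name || name === ignoreName) return;
    counts.set(name, (counts.get(name) || 0) + 1);
  });
  return Array.from(counts, function (entry) {
    return { name: entry[0], count: entry[1] };
  })
    .sort(function (a, b) { return b.count - a.count; })
    // slice never reads past the end, so fewer than n authors is fine
    .slice(0, n);
}
```

The trade-off is memory: this needs every post in memory at once, while the index-based counts let IndexedDB do the work.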

For dates, it was a bit weird. I knew I could sort by date, so I opened a cursor on the created index and fetched one object. I then opened a second cursor on the same index in reverse (‘prev’) order and did the same. This feels wrong.


var first, last;
//the created index is sorted oldest to newest, so the first record is the oldest comment
posts.index('created').openCursor(null).onsuccess = function(e) {
	var cursor = e.target.result;
	if(!cursor) return;
	first = new Date(cursor.value.created);

	//walk the same index backwards ('prev') to get the newest comment
	posts.index('created').openCursor(null,'prev').onsuccess = function(e) {
		var cursor = e.target.result;
		if(!cursor) return;
		last = new Date(cursor.value.created);
		console.log('comments from '+first+' to '+last);
	};
};

And here is my “per year” code (again, I hard-coded the years):


var years = [];
for(var i = 2003; i<=2016; i++) {
	years.push(i);
}

years.forEach(function(year) {
	//JavaScript months are zero-based, so month 0 is January
	var yearBegin = new Date(year,0,1).getTime();
	var nextYearBegin = new Date(year+1,0,1).getTime();
	//include the start of the year, exclude the start of the next one
	var range = IDBKeyRange.bound(yearBegin, nextYearBegin, false, true);
	posts.index('created').count(range).onsuccess = function(e) {
		console.log('Year '+year +' had '+e.target.result+ ' comments');
	};
});

This displays well, but I kinda worry that, due to the async nature, the result for one year could come back before the result for an earlier year. I’ll probably have to use an object to store the results, sort the keys, and then work with the data.
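
Here’s a sketch of that “object plus sorted keys” idea, which also derives the list of years from the first and last comment dates instead of hard-coding 2003 to 2016. Worth knowing: IndexedDB executes the requests within a single transaction in the order they were placed, so the counts may already arrive in year order; collecting them first just means the output doesn’t depend on that. The function names are hypothetical, and yearBounds returns a half-open range in local time meant for IDBKeyRange.bound(lower, upper, false, true).

```javascript
// Every calendar year from the first comment's year to the last comment's year.
function yearsBetween(first, last) {
  var years = [];
  for (var y = first.getFullYear(); y <= last.getFullYear(); y++) years.push(y);
  return years;
}

// [start of year, start of next year) as epoch milliseconds, local time.
function yearBounds(year) {
  return {
    lower: new Date(year, 0, 1).getTime(),
    upper: new Date(year + 1, 0, 1).getTime() // exclusive upper bound
  };
}

// counts looks like { "2016": 1234, "2003": 12 }, filled in by the
// onsuccess handlers in whatever order they fire.
function formatYearCounts(counts) {
  return Object.keys(counts)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (year) {
      return 'Year ' + year + ' had ' + counts[year] + ' comments';
    });
}
```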

Finally, here’s the top threads report.


/*
unique threads, but only w/ posts
*/
var threads = [];
posts.index('thread').openCursor(null,'nextunique').onsuccess = function(e) {
	var cursor = e.target.result;
	if(cursor) {
		//console.log('item', cursor.value.id, cursor.value.author.name);
		threads.push(cursor.value.thread);
		cursor.continue();
	} else {
		console.log(threads.length + ' total threads');
		doThreadStats(threads);
	}
}

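How does this work? Walking the thread index with a ‘nextunique’ cursor gives me one post per unique thread. Because I passed related=thread when fetching, each post carries its full thread object (title, link, and posts count), so I store those thread objects and then sort them by their posts count.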


function doThreadStats(threads) {
	console.log('doThreadStats');

	//sort by the thread's comment count, highest first
	threads.sort(function(a,b) {
		return b.posts - a.posts;
	});
	//print the top ten, or fewer if there aren't ten threads
	for(var i=0;i<Math.min(10, threads.length);i++) {
		console.log(threads[i].title + ' ('+threads[i].link+') with '+threads[i].posts + ' comments.');
	}

}
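
As a plain-JavaScript model of the thread report, here’s a sketch that keeps one embedded thread object per thread.id (roughly what the ‘nextunique’ cursor does, though IndexedDB picks the record with the lowest primary key rather than the first one seen) and then sorts by posts count. topThreads is a hypothetical name; the fields it reads (thread.id, thread.posts) are the ones the demo code already relies on.

```javascript
// Sketch: dedupe embedded thread objects by id, then return the n threads
// with the highest posts count.
function topThreads(posts, n) {
  var byId = new Map();
  posts.forEach(function (p) {
    if (p.thread && !byId.has(p.thread.id)) byId.set(p.thread.id, p.thread);
  });
  return Array.from(byId.values())
    .sort(function (a, b) { return b.posts - a.posts; })
    .slice(0, n);
}
```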

You can run the demo here, but remember, it uses my API key, which I abused the hell out of, so assume it won’t work. And open your dev tools. Your dev tools are open, right?

https://cfjedimaster.github.io/disqus-analytics/deep1/

The full source code for this version may be found here: https://github.com/cfjedimaster/disqus-analytics/tree/master/deep1

Ok… so next is to package this baby up into something a bit prettier. Oh, and 3D animated charts with cats of course.

About Raymond Camden

Raymond is a developer advocate looking for his next gig. He focuses on JavaScript, serverless and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support.

Lafayette, LA https://www.raymondcamden.com
