I enjoy building API demos so I generally keep an eye out for interesting APIs to play with. A few weeks ago it occurred to me that I had not seen anyone talking about or sharing information about Spam APIs. I may be showing my age a bit, but it feels like spam was a much larger issue back in the early days. It was something you always heard about and worried about but not so much anymore. Much like nuclear war.

Duck and cover

I did a bit of digging and it turns out Chris Coyier had similar thoughts 4 years ago: "Spam Detection APIs". I thought I'd check out a few myself and share the results. Here, in no particular order, are the APIs I tried.

Test Data #

Before I looked into any APIs, I gathered a bit of test data. I found five examples of 'good' text and five of 'bad'. I copied them from emails in my inbox, my spam folder, and my wife's email as well. On the 'bad' side I avoided the over-the-top sexual ones for obvious reasons, but I did copy one that was a bit risque. I put these test strings in an external JS file:

export let tests = {
  good:[],
  bad:[]
}

After the initialization, I then just did a bunch of copying and pasting. Below is one example from the good side, and one from the bad.

tests.good.push(`Hello All!

Thank you all for joining us today, we hope you enjoyed the workshop! For anyone that wasn't able to make it of would like to refer back this link will take you to a recording of the event:
https://drive.google.com/file/d/1h1IH7ns-3ywxi00Y6cDl_TpYG7v6Y2gF/view?usp=sharing

Best Regards,
OTU GDSC`);

tests.bad.push(`💪 If you're looking for a lady to be in a relationship with, I could be your lady . I'm Maya, and I'm ready to match with a local dude who understands how to romantic with a girl like me. ❣ 👉Connect with me here👈 .`);

Here's the entire data set if you're curious:

OOPSpam #

The first API I tried was from OOPSpam. They get credit for offering a free trial without requiring a credit card, but the trial is (imo) incredibly small: 40 API calls. Obviously, a free trial should have limits, but I was really surprised to see how small it was. I made sure to save my results, since at most I'd be able to run 3 complete tests of the ten items. (I say 3 because I knew I'd be doing one or two individual tests before testing the entire data set.) If you do pay for their service, their pricing page shows a good set of ranges, and to their credit, if the lowest level is too much, they ask you to reach out for something customized to your needs. I've got some experience with APIs that have bad "cliffs" (free goes up to X, the first pay tier is some number WAY over X, and folks in the middle are kinda screwed) so that was good to see.

I went to their docs and was happy with how easy it looked to be. Their API for spam checking only requires the content, but can additionally use the IP address and email of the person who created the content. It's a simple API, but oddly they use XHR (XMLHttpRequest) in their JavaScript demos, which hasn't been the recommended way of making network calls for quite some time.

That being said, it wasn't hard to rewrite it in fetch:

async function checkSpam(s) {
	let body = {
		content: s
	}

	let resp = await fetch("https://api.oopspam.com/v1/spamdetection", {
		method:'POST',
		headers: {
			'X-Api-Key':KEY,
			'content-type':'application/json'
		},
		body:JSON.stringify(body)
	});
	
	return await resp.json();

}

I built up a simple script that loaded in my test data, ran each test, and saved the result to the file system. All of my API tests used this format so I'll only share this once:

import { tests } from './inputdata.js';
import fs from 'fs';

const KEY = 'my key is more secret than your key...';

console.log(`There are ${tests.good.length} good tests and ${tests.bad.length} bad tests.`);

let totalResults = [];

async function checkSpam(s) {
	let body = {
		content: s
	}

	let resp = await fetch("https://api.oopspam.com/v1/spamdetection", {
		method:'POST',
		headers: {
			'X-Api-Key':KEY,
			'content-type':'application/json'
		},
		body:JSON.stringify(body)
	});
	
	return await resp.json();

}

for(let good of tests.good) {

	let result = await checkSpam(good);
	totalResults.push({
		type:'good', 
		input:good,
		results:result
	});
}

for(let bad of tests.bad) {

	let result = await checkSpam(bad);
	totalResults.push({
		type:'bad', 
		input:bad,
		results:result
	});
}

fs.writeFileSync('./oopsspam_results.json', JSON.stringify(totalResults, null, '\t'), 'utf8');
console.log('Done with tests.');

Good results look like so:

{
	"Score": 2,
	"Details": {
		"isContentSpam": "nospam",
		"numberOfSpamWords": 0,
		"spamWords": []
	}
}

And here's a bad result:

{
	"Score": 3,
	"Details": {
		"isContentSpam": "spam",
		"numberOfSpamWords": 7,
		"spamWords": [
			"buy",
			"now",
			"buy",
			"collect",
			"for you",
			"buy",
			"unsubscribe"
		]
	}
}

Seems very clear. Their endpoint docs show additional options that let you block disposable emails and even entire languages and countries.
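
To make those optional inputs a bit more concrete, here's a rough sketch of a helper that builds the request body. `buildOopSpamBody` is a name I made up, and the field names other than `content` (`senderIP`, `email`, `blockTempEmail`) are my reading of their docs, so treat them as assumptions and check the current endpoint reference before relying on them.

```javascript
// Hypothetical helper: builds the JSON body for the OOPSpam check.
// Only `content` is required. The other field names are assumptions
// to verify against the OOPSpam endpoint docs.
function buildOopSpamBody(content, { senderIP, email, blockTempEmail } = {}) {
	const body = { content };
	if (senderIP) body.senderIP = senderIP;
	if (email) body.email = email;
	if (typeof blockTempEmail === 'boolean') body.blockTempEmail = blockTempEmail;
	return body;
}

// Example: include the sender's email, leave everything else at defaults.
console.log(JSON.stringify(buildOopSpamBody('Hello there', { email: 'someone@example.com' })));
```

The result could be passed to JSON.stringify in place of the hard-coded body in checkSpam.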

So how did it do?

Of the five 'good' samples, two were flagged as spam. Of the five 'bad' samples, three were correctly flagged. I wouldn't call that great, but maybe with tweaking using the optional arguments it could be better. Of course, with the tiny free trial it may be hard to test and see if it's going to work well for you though. (Since they ask folks to reach out who want something below the lowest priced tier, it may be worth reaching out for additional free trial credits too.)
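
If you'd rather not count by hand, something like the following could tally a saved results file. This is only a sketch: `tally` and `oopSpamVerdict` are hypothetical names, and reading the verdict from `Details.isContentSpam` is my assumption about which field to trust. The sample data is made up, just shaped like the results shown above.

```javascript
// Counts how many samples of each type were flagged as spam.
// `isSpam` is a caller-supplied function that reads one saved API result.
function tally(totalResults, isSpam) {
	const counts = { good: { spam: 0, notSpam: 0 }, bad: { spam: 0, notSpam: 0 } };
	for (const { type, results } of totalResults) {
		counts[type][isSpam(results) ? 'spam' : 'notSpam']++;
	}
	return counts;
}

// Assumption: for OOPSpam, read the verdict from Details.isContentSpam.
const oopSpamVerdict = r => r.Details.isContentSpam === 'spam';

// Made-up sample in the same shape as the saved results file.
const sample = [
	{ type: 'good', results: { Score: 2, Details: { isContentSpam: 'nospam' } } },
	{ type: 'bad', results: { Score: 3, Details: { isContentSpam: 'spam' } } }
];
console.log(tally(sample, oopSpamVerdict));
```

Passing the verdict function in separately means the same tally could be reused for the other APIs, each with its own way of reading a result.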

APILayer Spam Check API #

Next up is the Spam Check API from APILayer. They've got a real generous free tier (three thousand calls a month) and then pretty cheap plans above that. Their API is rather simple - you pass the body and can optionally specify a threshold value that determines how strict the checking is.

Here is the wrapper function I wrote for them. I'm not specifying a threshold so it defaults to 5, in the middle of the 1 to 10 range with 1 basically considering everything spam.

async function checkSpam(s) {

	let resp = await fetch('https://api.apilayer.com/spamchecker', {
		method:'post',
		headers:{
			'apikey':KEY
		},
		body:s
	});

	return await resp.json();
}

At the default threshold of 5, every single item was marked as not spam. At 2.3 (I picked that as their sample used it), everything was spam. Threshold 4 also marked everything as not spam. Ditto for 3.5.

I then realized the result from the API was returning the score for the input, which is good as you can see how 'close' it is to your threshold, but every single input had the exact same result. I know my inputs aren't terribly long, but that just seems wrong. I'd probably avoid this one. Here's a sample result:

{
	"result": "The received message is considered spam with a score of 2.5",
	"score": 2.5,
	"is_spam": true,
	"text": "\nHarbor Freight\nCongratulations !\n\n\nBRAND NEW MILWAUKEE DRILL SET\n\nthis email is our official letter for your Confirmation.\n\n\nCongratulations! You've been chosen to Get an exclusive reward! Your Name came up for a BRAND NEW MILWAUKEE DRILL SET\n\nFrom Harbor Freight !\nCONTINUE FOR FREE»\n "
}
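
The threshold runs make sense if, as it appears, every input got a score of 2.5. Here's a small sketch of that arithmetic. It assumes a message is flagged when its score meets or exceeds the threshold, which is my inference from these runs and not something I confirmed in their docs. Passing the threshold as a `threshold` query parameter is likewise an assumption, and both function names are mine.

```javascript
// Assumption: a message is flagged when its score meets or exceeds the threshold.
function flaggedAt(score, threshold) {
	return score >= threshold;
}

// Assumption: the threshold is sent as a `threshold` query parameter.
function spamCheckerUrl(threshold) {
	const url = new URL('https://api.apilayer.com/spamchecker');
	if (threshold !== undefined) url.searchParams.set('threshold', threshold);
	return url.toString();
}

const score = 2.5; // the score from the sample result above
for (const threshold of [2.3, 3.5, 4, 5]) {
	console.log(spamCheckerUrl(threshold), flaggedAt(score, threshold));
}
```

Under that assumption only the 2.3 run flags the message, which lines up with what I saw.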

I'll also note that this API was the slowest, by a wide margin, of the three I tested. It took about 4 seconds to run each test.
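
If you want to measure this yourself, a wrapper along these lines would do it. It's a minimal sketch: `timed` is a hypothetical helper, and `fakeCheck` is a stand-in that just waits a bit so the example runs without an API key.

```javascript
// Wraps an async call and reports how long it took in milliseconds.
async function timed(fn) {
	const start = performance.now();
	const result = await fn();
	return { result, ms: performance.now() - start };
}

// Stand-in for a real API call; resolves after a short delay.
const fakeCheck = () => new Promise(resolve => setTimeout(() => resolve({ is_spam: false }), 50));

timed(fakeCheck).then(({ result, ms }) => {
	console.log(result, `${Math.round(ms)}ms`);
});
```

In the test script you could swap `fakeCheck` for `() => checkSpam(good)` and store `ms` alongside each result.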

Akismet #

Last but not least is the Akismet API. While focused on WordPress, it can be used for general purposes as well. Pricing seems fair and while there isn't a real "free trial" or "tier", you can select the Personal "name your price" tier and select 0. It will prompt you to confirm you will only use it on a non-commercial site though.

Their API can be a bit complex. You need to specify the blog you are using. I initially thought they required a WordPress blog, but I used my own and it worked fine. It also requires the IP address of the person creating the content. For my tests I just used my IP and I think that may have unfairly hurt the results, so keep that in mind when looking at the stats below. You can specify a wide range of optional arguments as well as the content type. While the docs talk about blog comments and that's probably the main use, it's absolutely not the only use.

Here's my implementation. Note that the IP is hard-coded, which is not something you would do in production:

async function checkSpam(s) {

	let params = new URLSearchParams();
	params.append('api_key', KEY);
	params.append('blog', 'https://www.raymondcamden.com');
	params.append('user_ip', '76.72.11.67');
	params.append('comment_type', 'comment');
	params.append('comment_content', s);

	let resp = await fetch('https://rest.akismet.com/1.1/comment-check', {
		method:'post',
		headers:{
		},
		body:params
	});

	return await resp.json();
}

How did it do? Of the five good inputs, all were marked correctly. Of the five bad inputs, only one was marked correctly.

Now, that sounds bad, but I honestly feel like additional arguments provided by the API would have greatly helped. Out of all the APIs, this was the quickest. Honestly, my gut tells me this is probably the best of the options I tested and I'd start here.
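
For what it's worth, here's a sketch of how some of those additional arguments could be passed. `buildAkismetParams` is a hypothetical helper, the IP and other values in the usage are placeholders, and I haven't measured whether these extra fields actually change the results.

```javascript
// Hypothetical helper: builds the form body for Akismet's comment-check call,
// adding a few optional fields only when they are provided.
function buildAkismetParams(key, blog, userIp, content, extras = {}) {
	const params = new URLSearchParams();
	params.append('api_key', key);
	params.append('blog', blog);
	params.append('user_ip', userIp);
	params.append('comment_type', 'comment');
	params.append('comment_content', content);
	for (const field of ['user_agent', 'referrer', 'comment_author', 'comment_author_email']) {
		if (extras[field]) params.append(field, extras[field]);
	}
	return params;
}

// Placeholder values only; in a real app these come from the incoming request.
const params = buildAkismetParams('my-key', 'https://www.raymondcamden.com', '203.0.113.5', 'Nice post!', {
	user_agent: 'Mozilla/5.0',
	comment_author_email: 'someone@example.com'
});
console.log(params.toString());
```

The returned object can be used as the `body` of the fetch call, exactly like the `params` variable in my implementation above.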

Honorable Mention - Postmark #

I did quickly test Postmark which is free and also easy to use, but is not meant for 'general purpose' spam checking as it expects the full text of an email, including headers and such. If you are only looking to test email, this may be a great option.
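
For the curious, here's roughly what a call could look like. This is a sketch under assumptions: the endpoint URL and the `email` and `options` fields are my recollection of Postmark's spam check docs, so verify them before use. `buildPostmarkRequest` and `checkEmail` are names I made up.

```javascript
// Assumption: Postmark's spam check accepts a POST to this URL with a JSON
// body holding the raw email and an `options` value of 'short' or 'long'.
const POSTMARK_URL = 'https://spamcheck.postmarkapp.com/filter';

// Builds the fetch options; kept separate so it can be inspected without a network call.
function buildPostmarkRequest(rawEmail, options = 'short') {
	return {
		method: 'POST',
		headers: { 'Accept': 'application/json', 'Content-Type': 'application/json' },
		body: JSON.stringify({ email: rawEmail, options })
	};
}

// rawEmail should be the full message source, headers included.
async function checkEmail(rawEmail) {
	const resp = await fetch(POSTMARK_URL, buildPostmarkRequest(rawEmail));
	return await resp.json();
}
```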

Photo by Hannes Johnson on Unsplash

"Duck and cover" photo by James Vaughan