About two weeks ago a reader sent me a question about one of my blog posts. The question wasn't unusual, but the URL was. Apparently, a site on Blogger is automatically copying content from my site (along with content from Andy Trice and Christophe Coenraets). Currently they have 402 copies (I'll explain how I know that in a minute) and of course, they will have a copy of this post too - in about 30 minutes.
The site in question is mr-cordova.blogspot.co.za. I'm not using a 'real' link for that of course as I don't want to give them any more Google power than they already have. (At least one person has already gone to his site thinking it was mine.) At the bottom of each post you can see an attribution: "by via Raymond Camden" but no direct link is provided. Even if there were one, I certainly don't approve of them copying my content wholesale onto their site.
When I first discovered this, I assumed it would be pretty simple to correct. I've been publishing web sites for over twenty years and have had problems with this since nearly day one. (I used to run a pretty popular site, deathclock.com, that was copied all the time.)
At the top of every site running on Blogger is a link that lets you report issues:

This leads to a "Choose Your Own Adventure" type interface for trying to report a problem. I got stuck in an infinite loop at first but finally landed on their DMCA removal tool. Their form lets you explain what content was stolen and then asks for the offending URLs.
Here is where the shit hit the fan. (Pardon the language.)
I explained, very clearly, that the site was stealing content from my blog (and two others). I submitted the request and I assumed it would be fixed rather quickly. Three days later I got a response:
Hello,
Thanks for reaching out to us.
With regard to the following URLs:
mr-cordova.blogspot.co.za
In order for us to investigate the appropriate content and take further action, please provide us with the specific URLs of the posts where the infringing content is located. You can obtain the post URL by clicking on the title of the post or the timestamp found at the bottom of the allegedly infringing post(s).
Regards, The Google Team
I responded immediately with an explanation of what the site was doing, and also explained that even if I sent every URL, they would just steal new content.
I got no response.
So I submitted again, one specific URL this time, but with explanatory text about how the site was stealing my content. The good news is that they removed the URL. The one damn URL. And completely ignored everything I said about the rest of the content.
So today I decided - what the hell - let me scrape the site. The site has a sitemap.xml which looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://mr-cordova.blogspot.com/sitemap.xml?page=1</loc>
  </sitemap>
  <sitemap>
    <loc>http://mr-cordova.blogspot.com/sitemap.xml?page=2</loc>
  </sitemap>
  <sitemap>
    <loc>http://mr-cordova.blogspot.com/sitemap.xml?page=3</loc>
  </sitemap>
</sitemapindex>
Which basically leads to 3 "pages" of a site map. Each page is a ginormous XML file of URLs. They look like this:
<url>
  <loc>http://mr-cordova.blogspot.com/2015/04/coldfusion-updates-released-today.html</loc>
  <lastmod>2015-04-14T22:22:03Z</lastmod>
</url>
I knew I could use XPath to parse that data, so I Googled for a random online XPath tool and found this one: Template / XPath 3.0 / XQuery 3.0 / CSS 3 Selector / JSONiq Online Tester
I used an XPath of
//url/loc
to parse each page of the sitemap.
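The same `//url/loc` extraction can also be done locally instead of in an online tester. What follows is only a minimal sketch in Python, not the tool used above. The function name `extract_post_urls` is made up for illustration, and the `<urlset>` wrapper with the sitemaps.org namespace is an assumption based on the standard sitemap format, since the excerpt above only shows a single `<url>` entry. The `{*}` wildcard (Python 3.8+) matches the element whether or not a namespace is declared.

```python
import xml.etree.ElementTree as ET

# One sitemap "page", trimmed to a single entry. The <urlset> root and its
# namespace are assumed from the sitemaps.org format, not copied from the site.
SAMPLE_PAGE = """<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mr-cordova.blogspot.com/2015/04/coldfusion-updates-released-today.html</loc>
    <lastmod>2015-04-14T22:22:03Z</lastmod>
  </url>
</urlset>"""


def extract_post_urls(sitemap_xml: str) -> list:
    """Return the text of every <loc> that sits inside a <url> element."""
    # Parse from bytes so the encoding declaration in the XML is honored.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    # "{*}" means "any namespace or none", the local equivalent of //url/loc.
    return [loc.text.strip() for loc in root.iterfind(".//{*}url/{*}loc") if loc.text]


for url in extract_post_urls(SAMPLE_PAGE):
    print(url)
```

Running it on each of the three pages and concatenating the results would give the full list of copied posts.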
I did this three times and ended up with 402 URLs. I then filed a new DMCA request, which again won't be enough to stop this asshat, and ran into a new problem. Blogger only allows me to file 100 URLs per day. So great. I've got 4 days now of filing requests. And I get to repeat this every month or so assuming Blogger never shuts this guy down.
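With a cap of 100 URLs per day, the list has to be split into daily batches. Here is a minimal sketch in Python, assuming the URLs are already in a list. The `batches` helper and the placeholder URLs are made up for illustration and stand in for the real scraped list.

```python
def batches(urls, size=100):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder URLs standing in for the 402 scraped ones.
scraped = [f"http://example.com/post-{n}.html" for n in range(402)]

daily = list(batches(scraped))
print([len(batch) for batch in daily])  # [100, 100, 100, 100, 2]
```

For 402 URLs that works out to four full batches plus a leftover pair.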
Anyway - wish me luck. I'll update this post with a comment if I have any luck.
Archived Comments
As a random aside - the site in question is also using AdSense, which means s/he is making money off my content. I wonder what legal recourse I'd have to get that money - even if it just pennies?
Not just the blog owner, but Google, too. This is a practice known as Freebooting. Usually I've seen it discussed around video content, but it's the same for written stuff: Someone stole your content and is running ads on it. The longer it remains online, the more money the freebooter and the ad company (e.g. Facebook, Blogger (Google)) reap from it. It incentivizes dragging their feet.
http://www.itsokaytobesmart...
At least they've also stolen this post, meaning the main post on there is now about them stealing.
I wouldn't even know who to contact. Do I contact AdSense, or Blogger? And with Blogger, there wasn't any way to reach a human directly, so how would I even begin.
I love all my Google services, but the number one issue with them is that as soon as one thing goes wrong, you end up against a giant wall of nothingness. I know I pay nothing for my Google services, but it's times like this when I wish there was a simple way to pay to get to a human. I'd gladly pay money to help protect my content, even if I shouldn't have to pay. (Or perhaps they could use a system where you pay for support, and once they determine it is their issue, they refund the money.)
Yep - about 30 minutes.
A shot in the dark: https://twitter.com/raymond...
The odds of getting that money are basically nil, but good luck!
Technically it's up to Blogger to pull the content. I think you're already pursuing the right avenues. It's a shame that they're not doing a better job for indie creators such as... you know... yourself and every one of their users.
The problem is that you're not much of a lawsuit threat. If you were DisneyCorp, they would be much more worried about getting sued out the wazoo and half of Blogger would be offline for a week.
See also the recent issues Surge.sh had from an NRA-submitted DMCA takedown request. (Which I know you're already familiar with, just not sure if you connected those dots.)
Gah! Hate that! People should write their own content instead of stealing! Talk about grrr. And boo on Blogger for being, I can't say because children might be reading. Dang man!
I wonder if you could write it into a script, so that every time you post, a request is sent to Blogger to remove the content from Mr. Asshat's blog. It seems like the URLs are kind of standard.
I'm going to write a script to automate converting a sitemap into just a list of URLs. That's something.
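One rough shape such a script could take, sketched in Python. This is an illustration under stated assumptions, not the script that was actually written: the names `fetch`, `locs`, and `sitemap_to_urls` are hypothetical, and it assumes the index and its pages follow the sitemap layout shown in the post.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download a URL and return the raw response bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def locs(xml_bytes, parent):
    """Return the text of every <loc> found inside a <parent> element."""
    root = ET.fromstring(xml_bytes)
    # "{*}" (Python 3.8+) matches the tag in any namespace or none.
    path = f".//{{*}}{parent}/{{*}}loc"
    return [el.text.strip() for el in root.iterfind(path) if el.text]


def sitemap_to_urls(index_url, fetcher=fetch):
    """Follow a sitemap index to its pages and collect every post URL."""
    urls = []
    for page_url in locs(fetcher(index_url), "sitemap"):
        urls.extend(locs(fetcher(page_url), "url"))
    return urls
```

Calling `sitemap_to_urls("http://mr-cordova.blogspot.com/sitemap.xml")` and printing each result would produce the plain list of URLs. The `fetcher` parameter is only there so the download step can be swapped out, for example to read saved copies from disk.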
No response as of yet. I just filed another DMCA notice for the next 100 URLs. I also - again - explained how the site is automatically stealing my content in case they actually decide to freaking READ.
Today I deployed the nuclear attack: I wrote a blog post and included an inline script that checks the domain and if it matches the evil site, it redirects back here. You can go to the latest blog post (about my book being on sale) and view source to see it. Heck, I'll include it here - I think Disqus auto escapes:
<script>
if (document.location.hostname.indexOf('mr-cordova.blogspot.com') >= 0) {
    alert('This site is stealing my content - sending you to the proper blog now...');
    document.location.href = 'http://www.raymondcamden.com';
}
</script>
I have yet to hear back from Blogger.
Filing 100 more URLs today to blogger. Here is the note I included:
As I've said in my other notices, the site mr-cordova.blogspot.com is stealing my content in an automated manner. I'm including more URLs below (they have 400+ stolen articles) but can you please actually READ this text and understand that even if you remove the old URLs, s/he will just keep stealing my content? Can you PLEASE read this?
I just reported the site to AdSense. I also asked about getting the money they stole from me. I see little chance of that happening, but I have to ask.
Next batch of 100 URLs sent. No response from Blogger yet. (Or Adsense.)
I usually get to your content via the ColdFusion Bloggers site. So long as Mr ArseHat isn't added to that site...
Update on July 19: Nothing to say. I feel dumb posting an update like this, but I want a record of how long Google/Blogger is just freaking ignoring me.
Update on July 20: Progress! 200 of the URLs removed. No word from Blogger yet about how the site is *automating* stealing my content, but it's something.
Update: All 400 URLs are nuked. He has more than that (I think maybe 6 or so more) but I figure I'll file again at the end of the month. No response from Blogger about how the person is automatically stealing. No response from AdSense about his violations and the money s/he stole from me.
Scary thing is that you happened to find this one site that is scraping your content. How many more are out there doing the same thing that you don't know about?
For your next project, create a script that automatically detects when a stolen post is live and submits it to Blogger for you.
This made me smile. It reminded me of a similar "reply" to someone "borrowing" things:
Oh that is *awesome*.