July 15, 2016 (This post is more than 2 years old.)

Fighting against a content stealer on Blogger

misc

About two weeks ago a reader sent me a question concerning one of my blog posts. The question wasn't unusual, but the URL was. Apparently, a site on Blogger is automatically copying content from my site (along with content from Andy Trice and Christophe Coenraets). Currently they have 402 copies (I'll explain how I know that in a minute) and of course, they will have a copy of this post too - in about 30 minutes.

The site in question is mr-cordova.blogspot.co.za. I'm not using a 'real' link for that of course, as I don't want to give them any more Google power than they already have. (At least one person went to their site thinking it was mine.) At the bottom of each post you can see an attribution: "by via Raymond Camden" but no direct link is provided. Even if there were one, I certainly don't approve of them copying my content completely on their site.

When I first discovered this, I assumed it would be pretty simple to correct. I've been publishing web sites for over twenty years and have had problems with this since nearly day one. (I used to run a pretty popular site, deathclock.com, that was copied all the time.)

At the top of every site running on Blogger is a link that lets you report issues.

This leads to a "Choose Your Own Adventure" type interface for trying to report a problem. I ended up in an infinite loop at first but finally landed on their DMCA removal tool. Their form lets you explain what content was stolen and then asks for the offending URLs.

Here is where the shit hit the fan. (Pardon the language.)

I explained, very clearly, that the site was stealing content from my blog (and two others). I submitted the request and I assumed it would be fixed rather quickly. Three days later I got a response:

Hello,
Thanks for reaching out to us.

With regard to the following URLs:

mr-cordova.blogspot.co.za

In order for us to investigate the appropriate content and take further action, please provide us with the specific URLs of the posts where the infringing content is located. You can obtain the post URL by clicking on the title of the post or the timestamp found at the bottom of the allegedly infringing post(s).

Regards, The Google Team

I responded immediately with an explanation about what the site was doing, and also explaining that even if I got every URL, they would just steal new content.

I got no response.

So I submitted again, one specific URL this time, but with explanatory text about how the site was stealing my content. The good news is that they removed the URL. The one damn URL. And completely ignored everything I said about the rest of the content.

So today I decided - what the hell - let me scrape the site. The site has a sitemap.xml which looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://mr-cordova.blogspot.com/sitemap.xml?page=1</loc>
</sitemap>
<sitemap>
<loc>http://mr-cordova.blogspot.com/sitemap.xml?page=2</loc>
</sitemap>
<sitemap>
<loc>http://mr-cordova.blogspot.com/sitemap.xml?page=3</loc>
</sitemap>
</sitemapindex>
Which basically leads to 3 "pages" of a site map. Each page is a ginormous XML file of URLs. They look like this:
<url>
<loc>http://mr-cordova.blogspot.com/2015/04/coldfusion-updates-released-today.html</loc>
<lastmod>2015-04-14T22:22:03Z</lastmod>
</url>
I knew I could use XPath to parse that data, so I Googled for a random online XPath tool and found this one: Template / XPath 3.0 / XQuery 3.0 / CSS 3 Selector / JSONiq Online Tester

I used an XPath of //url/loc to parse each page of the sitemap.
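For anyone who would rather not paste XML into an online tester, the same extraction can be sketched in a few lines of Python. This is only a rough sketch, not what I actually ran: the function name extract_urls is made up for illustration, and it assumes each sitemap page is a standard urlset document in the sitemaps.org namespace. Note that Python's ElementTree needs that namespace spelled out, so the bare //url/loc expression becomes a prefixed path.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the sitemap root element (taken from the sitemap index above).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> nested in a <url> element (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    # Namespaced equivalent of the XPath //url/loc
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# One entry from the sitemap, wrapped in an assumed <urlset> root element.
sample_page = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://mr-cordova.blogspot.com/2015/04/coldfusion-updates-released-today.html</loc>
<lastmod>2015-04-14T22:22:03Z</lastmod>
</url>
</urlset>"""

for url in extract_urls(sample_page):
    print(url)
```

Running it over each of the three sitemap pages and joining the results would give the full list of copied post URLs.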

I did this three times and ended up with 402 URLs. I then filed a new DMCA request, which again won't be enough to stop this asshat, and ran into a new problem. Blogger only allows me to file 100 URLs per day. So great. I've got 4 days now of filing requests. And I get to repeat this every month or so assuming Blogger never shuts this guy down.
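Splitting the list to fit the 100-per-day limit is easy to script as well. Here is a minimal sketch, assuming the URLs are already in a Python list; the batches helper and the placeholder URLs are hypothetical and only there to show the arithmetic. With 402 URLs it works out to four full batches plus a leftover batch of two.

```python
def batches(urls, size=100):
    """Yield consecutive slices of at most `size` URLs (hypothetical helper)."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder URLs standing in for the 402 real ones.
all_urls = [f"http://example.invalid/post-{n}.html" for n in range(402)]

daily_batches = list(batches(all_urls))
print([len(batch) for batch in daily_batches])  # [100, 100, 100, 100, 2]
```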

Anyway - wish me luck. I'll update this post with a comment if I have any luck.

Support this Content!

If you like this content, please consider supporting me. You can become a Patron, visit my Amazon wishlist, or buy me a coffee! Any support helps!


Archived Comments

Comment 1 by Raymond Camden posted on 7/15/2016 at 1:49 PM

As a random aside - the site in question is also using AdSense, which means s/he is making money off my content. I wonder what legal recourse I'd have to get that money - even if it's just pennies?

Comment 2 (In reply to #1) by Adam Tuttle posted on 7/15/2016 at 2:04 PM

Not just the blog owner, but Google, too. This is a practice known as Freebooting. Usually I've seen it discussed around video content, but it's the same for written stuff: Someone stole your content and is running ads on it. The longer it remains online, the more money the freebooter and the ad company (e.g. Facebook, Blogger (Google)) reap from it. It incentivizes dragging their feet.

http://www.itsokaytobesmart...

Comment 3 by Tom King posted on 7/15/2016 at 2:14 PM

At least they've also stolen this post, meaning the main post on there is now about them stealing.

Comment 4 (In reply to #2) by Raymond Camden posted on 7/15/2016 at 2:16 PM

I wouldn't even know who to contact. Do I contact AdSense, or Blogger? And with Blogger, there wasn't any way to reach a human directly, so how would I even begin?

I love all my Google services, but the number one issue with them is that as soon as one thing goes wrong, you end up against a giant wall of nothingness. I know I pay nothing for my Google services, but it's times like this when I wish there was a simple way to pay to get to a human. I'd gladly pay money to help protect my content, even if I shouldn't have to pay. (Or perhaps they could use a system where you pay for support, and once they determine it is their issue, they refund the money.)

Comment 5 (In reply to #3) by Raymond Camden posted on 7/15/2016 at 2:17 PM

Yep - about 30 minutes.

Comment 6 (In reply to #2) by Raymond Camden posted on 7/15/2016 at 2:18 PM

A shot in the dark: https://twitter.com/raymond...

Comment 7 (In reply to #6) by Adam Tuttle posted on 7/15/2016 at 2:23 PM

The odds of getting that money are basically nil, but good luck!

Comment 8 (In reply to #4) by Adam Tuttle posted on 7/15/2016 at 2:23 PM

Technically it's up to Blogger to pull the content. I think you're already pursuing the right avenues. It's a shame that they're not doing a better job for indie creators such as... you know... yourself and every one of their users.

Comment 9 (In reply to #8) by Adam Tuttle posted on 7/15/2016 at 2:26 PM

The problem is that you're not much of a lawsuit threat. If you were DisneyCorp, they would be much more worried about getting sued out the wazoo and half of Blogger would be offline for a week.

See also the recent issues Surge.sh had from an NRA-submitted DMCA takedown request. (Which I know you're already familiar with, just not sure if you connected those dots.)

Comment 10 by askearly posted on 7/15/2016 at 3:54 PM

Gah! Hate that! People should write their own content instead of stealing! Talk about grrr. And boo on Blogger for being, I can't say because children might be reading. Dang man!

Comment 11 by Craig Inman posted on 7/15/2016 at 7:33 PM

I wonder if you could write it into a script, so that every time you post, a request is sent to Blogger to remove the content from Mr. Asshat's blog. It seems like the URLs are kind of standard.

Comment 12 (In reply to #11) by Raymond Camden posted on 7/15/2016 at 7:39 PM

I'm going to write a script to automate converting a sitemap into just a list of URLs. That's something.

Comment 13 by Raymond Camden posted on 7/16/2016 at 1:00 PM

No response as of yet. I just filed another DMCA notice for the next 100 URLs. I also - again - explained how the site is automatically stealing my content in case they actually decide to freaking READ.

Comment 14 by Raymond Camden posted on 7/17/2016 at 2:23 PM

Today I deployed the nuclear attack: I wrote a blog post and included an inline script that checks the domain and if it matches the evil site, it redirects back here. You can go to the latest blog post (about my book being on sale) and view source to see it. Heck, I'll include it here - I think Disqus auto escapes:

I have yet to hear back from Blogger.

Comment 15 by Raymond Camden posted on 7/17/2016 at 2:26 PM

Filing 100 more URLs today to Blogger. Here is the note I included:

As I've said in my other notices, the site mr-cordova.blogspot.com is stealing my content in an automated manner. I'm including more URLs below (they have 400+ stolen articles) but can you please actually READ this text and understand that even if you remove the old URLs, s/he will just keep stealing my content? Can you PLEASE read this?

Comment 16 by Raymond Camden posted on 7/17/2016 at 6:25 PM

I just reported the site to AdSense. I also asked about getting the money they stole from me. I see little chance of that happening, but I have to ask.

Comment 17 by Raymond Camden posted on 7/18/2016 at 1:35 PM

Next batch of 100 URLs sent. No response from Blogger yet. (Or AdSense.)

Comment 18 by Darth Guybrush posted on 7/18/2016 at 10:17 PM

I usually get to your content via the ColdFusion Bloggers site. So long as Mr ArseHat isn't added to that site...

Comment 19 by Raymond Camden posted on 7/19/2016 at 2:05 PM

Update on July 19: Nothing to say. I feel dumb posting an update like this, but I want a record of how long Google/Blogger is just freaking ignoring me.

Comment 20 by Raymond Camden posted on 7/20/2016 at 12:54 PM

Update on July 20: Progress! 200 of the URLs removed. No word from Blogger yet about how the site is *automating* stealing my content, but it's something.

Comment 21 by Raymond Camden posted on 7/21/2016 at 12:37 PM

Update: All 400 URLs are nuked. He has more than that (I think maybe 6 or so more) but I figure I'll file again at the end of the month. No response from Blogger about how the person is automatically stealing. No response from AdSense about his violations and the money s/he stole from me.

Comment 22 by mgw4jc posted on 7/22/2016 at 6:09 PM

Scary thing is that you happened to find this one site that is scraping your content. How many more are out there doing the same thing that you don't know about?

For your next project, create a script that automatically detects when a stolen post is live and submits it to Blogger for you.

Comment 23 (In reply to #14) by mithlond posted on 8/29/2016 at 8:45 PM

This made me smile. It reminded me of a similar "reply" to someone "borrowing" things:

Comment 24 (In reply to #23) by Raymond Camden posted on 8/29/2016 at 8:46 PM

Oh that is *awesome*.
