Interesting TrackBack Spam(?)

This post is more than 2 years old.

So today I've gotten two trackback pings from one blog, mainbud.blogspot.com. (I'm not using real links for this guy.) What makes this spam interesting is that it seems like he is copying my content. Consider this URL: http://mainbud.blogspot.com/2005/11/trackback-support.html

I've already deleted the TB for it. He added another TrackBack this morning that just linked to his home page.

Has anyone heard of people using TB to steal content? I would simply add his URL to the spam list, but I want to see if he 'borrows' more of my content. The interesting thing is that it seems as if there is no direct link to the entry above.

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by JesterXL posted on 11/22/2005 at 9:10 PM

Yeah, they're called "splogs" and "link farms". Basically, they create a multitude of sites that screen-scrape legitimate content from RSS feeds, blogs, and other sites onto their own. They then inter-link amongst the hundreds of sites they have created. Trackbacks are another way for them to get a high-ranking site to link to them. If you are highly ranked in Google, linking to them greatly increases their site ranking; Trackback is the sneaky way of getting you to link to them.

Combined with cross-site linking, blog comment spam, etc., these all add to get them to the top of Google and other search engine results for legitimate search terms. They can utilize their high page ranking to sell their services, or sell their high page rank to increase the page rank of their customers.

...basically, shady Search Engine Optimization & Placement services.

Comment 2 by Steven Erat posted on 11/22/2005 at 9:30 PM

Now that I'm using BlogCFC 4 with Trackbacks, I've also graduated to my first Trackback SPAM.

On an entry about my Flickr photos for my wedding reception, a Trackback came in from a site that doesn't link to me at all; it's just some kind of wedding bookstore and book reviewer, at (modified URL with [SPAM]) http://weddings[SPAM].blog57[SPAM].com/posts/14626. After reviewing the content, I deleted the trackback.

On a related note, since adding a Contact Me page it seems I've opened up another Pandora's box. Some folks have been using it for 'legitimate' reasons, but others have submitted requests for me to blog content on their behalf, and others are sending Technical Support questions that aren't related to any blog entries. So, I've added a 'Use this form if...' / 'Don't use this if...' type caveat.

Comment 3 by Raymond Camden posted on 11/22/2005 at 9:35 PM

Hey Steve, want to trade? You should see my Ask a Jedi queue. ;)

Comment 4 by Steven Erat posted on 11/22/2005 at 9:37 PM

To continue, for years now I've been getting Link Spam where my site is periodically flooded with HTTP GET requests where the HTTP REFERER references a casino, a pharmaceutical item, or one of many adult related themes. There's a Wikipedia entry about this type of spamming: http://en.wikipedia.org/wik...

Since I display my referers on my blog on a publicly viewable page, I filter all traffic through a custom tag that checks for a couple hundred patterns, a list I'm continually adding to. If a banned referer comes through, I send a 403 Forbidden header and provide a link to the main blog site. Pete Freitag has more about this solution: http://www.petefreitag.com/...
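
For readers who want to see the shape of that filter, here is a minimal sketch in Python rather than the ColdFusion custom tag described above. The three patterns, the function name check_referer, and the response body are all assumptions made for illustration; the real list is said to hold a couple hundred patterns.

```python
import re

# Hypothetical sample patterns; the list described in the comment is far
# longer and keeps growing.
BANNED_REFERER_PATTERNS = [
    re.compile(r"casino", re.IGNORECASE),
    re.compile(r"poker", re.IGNORECASE),
    re.compile(r"pharma", re.IGNORECASE),
]


def check_referer(referer):
    """Return an (HTTP status, body) pair for a request with this Referer."""
    if referer and any(p.search(referer) for p in BANNED_REFERER_PATTERNS):
        # Banned referer: refuse the page but still point at the main blog.
        return 403, 'Forbidden. Visit the <a href="/">main blog page</a>.'
    # Blank or unlisted referers pass through untouched.
    return 200, ""
```

Because a refused request never reaches the page, the spammer's URL is never logged, so it can't show up on the public referers list.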

Comment 5 by Steven Erat posted on 11/23/2005 at 1:02 AM

Hey! Did you jinx me? ;)

I got about 100 requests today from the IP network 85.255.113.0/24 where they were requesting trackback.cfm directly with a blank HTTP Referer and the following query string type:

[UUID]&excerpt=[long list of links to urlencoded sites]&url=[link to domain pcadsl.com.tw]&title=[censored]

The IP resolves to Belarus, and the domain is not on WHOIS.

Since the method was GET not POST, the trackback didn't get added, but if it were a POST then it might have worked with just one more parameter.

I'm going to add a requirement to trackback.cfm that the method must be POST and the HTTP Referer must be either blank "" or come from my domain. This will break the trackback spec, since trackbacks could no longer be automated and a user would have to add each one manually, but it might save me the headache.
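
The rule proposed here could be sketched roughly as follows, in Python rather than the ColdFusion of trackback.cfm. The host name blog.example.com and the function name allow_trackback are placeholders invented for the example, not anything taken from BlogCFC.

```python
from urllib.parse import urlparse

# Hypothetical host name standing in for the blog's own domain.
OWN_HOST = "blog.example.com"


def allow_trackback(method, referer):
    """Accept a trackback only if it is a POST with a blank or same-site Referer."""
    if method.upper() != "POST":
        return False  # GET requests, like the flood described above, are refused
    if not referer:
        return True  # a blank Referer is allowed
    # Otherwise the Referer has to point at the blog's own host.
    return urlparse(referer).hostname == OWN_HOST
```

Note that a client controls both the method and the Referer header, so a check like this only filters out bots that don't bother to adapt.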

Comment 6 by Christopher Wigginton posted on 11/23/2005 at 2:09 AM

Ray,

You might want to consider a captcha, which will eliminate a large portion of most spam attacks. Unfortunately, captchas are not accessibility-friendly. I'm not sure Bayesian filtering would catch trackbacks when the links are several levels deep in the splog hierarchy and the top linkages seem relatively OK. Another option might be to connect to an RBL (Real-time Blackhole List).

I've also thought about post limits per IP, so during a flood you just cut off everything after the set limit.
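
A per-IP limit of this kind might look something like the sketch below. The class name PostLimiter, the default of 5 posts per hour, and the sliding-window approach are all assumptions chosen for illustration; the comment does not say how the limit would be counted.

```python
import time
from collections import defaultdict, deque


class PostLimiter:
    """Allow at most max_posts per IP within a sliding time window."""

    def __init__(self, max_posts=5, window_seconds=3600):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._posts = defaultdict(deque)  # ip -> timestamps of accepted posts

    def allow(self, ip, now=None):
        """Return True and record the post if this IP is under its limit."""
        now = time.time() if now is None else now
        recent = self._posts[ip]
        # Forget posts that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False  # over the limit: drop this post
        recent.append(now)
        return True
```

Usage would be a single call such as limiter.allow(client_ip) before saving a comment or trackback. State here lives in memory only, so a real deployment would need somewhere persistent to keep the counts.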

Comment 7 by tony of the weeg clan posted on 11/23/2005 at 2:19 AM

is this a feature easily disabled in blogcfc 4.0, i really do not want the headaches.

and secondly, what does a trackback really get me anyway?

tw

Comment 8 by JesterXL posted on 11/23/2005 at 2:20 AM

Captchas stop bots dead in their tracks. They are THE solution, although I can't comment on the accessibility ramifications.

Trackbacks are nice in that they allow you to immediately know when someone links to you and/or discusses your article. It helps add to the connectedness of blogs. Comments help increase further discussion, and trackbacks are just another facet of that.

Comment 9 by Raymond Camden posted on 11/23/2005 at 2:25 AM

Tony, TBs can be turned off in the ini file.

Comment 10 by tony of the weeg clan posted on 11/23/2005 at 2:33 AM

killer. might captcha be a 4.01 upgrade? ray ray.

Comment 11 by Raymond Camden posted on 11/23/2005 at 2:36 AM

I do not deny the power of captcha - but I despise it. That being said - if there is a free way of doing it, I would consider it. If I were doing it for a 'pay site', I'd use Alagad (love all his stuff), but since BlogCFC is free, I can't use it.

Comment 12 by Christopher Wigginton posted on 11/23/2005 at 2:47 AM

Ray,

Check out

http://www.emerle.net/progr...

As far as I know it is free and he doesn't charge for the download. What you could do is put in the framework to support cfx_captcha (check out those security enhancements I sent you awhile back) and then enable it through the ini. Leave it up to the blog installer to put in the cfx_captcha tag and then enable it.

Comment 13 by Christopher Wigginton posted on 11/23/2005 at 3:03 AM

Check out the cfx_captcha implemented for comments on my blog, which is version 3.9 of your blogCFC (haven't had the time to go to 4 yet). I just turned it on via the ini.

http://www.intersuite.com/c...