BLOCKING crawlers whose SEs send you zero traffic
I've been looking at one of my mainstream sites and there are several crawlers (such as cuil.com's) that like to munch on my pages, but the search engine referral traffic they send is virtually (or even absolutely) zero.
There are a lot of pages on this site, so a fair percentage of the access is from crawler bots. (GoogleBot hits it 100,000+ times a day, but that figure is way ahead of the others, and G actually sends back referrals...) I've been thinking about blocking the deadbeat crawlers via robots.txt, but then there's always the question looming: will the search engines they're attached to start sending traffic in the future? Am I going to shoot myself in the foot? Has anyone contemplated this scenario? |
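For anyone weighing the robots.txt route, here is a minimal sketch of what the rules could look like and a way to check them before deploying. It assumes cuil's crawler identifies itself as "Twiceler" (check your own access logs to confirm); the example.com URL and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt: shut out one crawler, leave everyone else alone.
# "Twiceler" is an ASSUMED user-agent token for cuil's bot; verify in your logs.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/profile/1"  # hypothetical URL
    for agent in ("Twiceler", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Since robots.txt is only a request, this covers well-behaved bots; reversing the decision later is just a matter of deleting the first rule.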
if you are concerned about bandwidth costs from search engine bots you have bigger issues than the ones you may be contemplating.
|
LOL. Even though the SE bots are actually the majority of the "traffic" loading the site, this issue has nothing to do with bandwidth. It's more about the multiple database accesses and the overall load on the server.
|
cloak the pages for the other bots, who knows, it might trick them into sending some nice traffic
|
if you are concerned about server stress from search engine bots you have bigger issues than the ones you may be contemplating.
|
It's a profile site that pulls together various interlinking bits and pieces, so it is reasonably database intensive. So I'm not really concerned about server load NOW, more in the future... and really, I just don't get why I should let (e.g.) cuil scrape my site when they return zero traffic... |
Why don't you just use some sort of caching to cut down on DB queries?
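A rough sketch of what "some sort of caching" could mean here: keep the assembled profile data in memory for a few minutes so repeat hits (bots included) skip the database. The `TTLCache` class, the `load_profile` loader and the 300-second lifetime are all hypothetical, not anything from the site in question.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)

    def load_profile() -> dict:
        # Stand-in for the expensive multi-query database lookup.
        return {"id": 42, "name": "example"}

    first = cache.get_or_load(("profile", 42), load_profile)
    second = cache.get_or_load(("profile", 42), load_profile)  # served from cache
    print(first is second)
```

On a multi-server setup a shared cache would be the more usual choice, but the idea is the same: crawler hits on an already-built page cost a lookup rather than a stack of queries.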
|
FWIW I am planning to set up multiple backend servers so that I can simply add more when the load gets too high... I'm really just curious whether anyone's said "fuck you" to a (bona fide but obscure) search engine bot that does nothing but scrape. |
just redirect them all to a flat landing page or doorway page.
then if anyone does actually come from their sites, they can choose to press enter .. |
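One way the redirect idea could be wired up, as a sketch only: match the crawler by user-agent substring and hand back the flat page's path. The token list, the `/welcome.html` path and the `redirect_target` helper are all hypothetical; "twiceler" is an assumed token for cuil's bot.

```python
from typing import Optional

# ASSUMED user-agent substrings for crawlers that send no referrals.
DEADBEAT_BOT_TOKENS = ("twiceler",)

# Hypothetical static page that needs no database access.
LANDING_PAGE = "/welcome.html"


def redirect_target(user_agent: str, path: str) -> Optional[str]:
    """Return the path to redirect to, or None to serve the request normally."""
    if path == LANDING_PAGE:
        return None  # already on the flat page; avoid a redirect loop
    ua = user_agent.lower()
    if any(token in ua for token in DEADBEAT_BOT_TOKENS):
        return LANDING_PAGE
    return None


if __name__ == "__main__":
    print(redirect_target("Mozilla/5.0 (Twiceler-0.9)", "/profile/1"))
    print(redirect_target("Googlebot/2.1", "/profile/1"))
```

Note this keys on the crawler's own requests. Catching people who arrive from that search engine would mean checking the Referer header instead, which is a separate rule.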