Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

Old 10-20-2009, 06:25 PM   #1
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
BLOCKING crawlers whose SEs send you zero traffic

I've been looking at one of my mainstream sites and there are several crawlers (such as cuil.com's) that like to munch on my pages, but the search engine referral traffic they send is virtually (or even absolutely) zero.

There are a lot of pages on this site, so a fair percentage of the access is from crawler bots. (GoogleBot hits it 100,000+ times a day, but that figure is way ahead of the others, and G actually sends back referrals...)

I've been thinking about blocking the deadbeat crawlers via robots.txt, but then there's always the question looming - will the search engines they're attached to start sending traffic in the future? Am I going to shoot myself in the foot?

Has anyone contemplated this scenario?
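
For reference, here is a minimal sketch of what the robots.txt approach could look like, checked with Python's standard urllib.robotparser. The crawler name "DeadbeatBot" and the URL path are placeholders, not real values from this site; swap in the user-agent token the bot actually sends. It also only helps with bots that honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# "DeadbeatBot" is a hypothetical crawler name used for illustration only.
ROBOTS_TXT = """\
User-agent: DeadbeatBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "/profile/12345" is a made-up path standing in for a profile page.
    for agent in ("DeadbeatBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/profile/12345"))
```

Since the block is a single stanza, it can be deleted later if that search engine ever starts sending traffic.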
Old 10-20-2009, 06:29 PM   #2
Agent 488
Registered User
 
Join Date: Feb 2006
Posts: 22,511
if you are concerned about bandwidth costs from search engine bots you have bigger issues than the ones you may be contemplating.
Old 10-20-2009, 06:39 PM   #3
beemk
CLICK HERE
 
Join Date: Jan 2002
Posts: 20,829
Quote:
Originally Posted by Agent 488 View Post
if you are concerned about bandwidth costs from search engine bots you have bigger issues than the ones you may be contemplating.
what he said
Old 10-20-2009, 06:57 PM   #4
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
LOL. Even though the SE bots are actually the majority of the "traffic" loading the site, this issue has nothing to do with bandwidth. It's more about the multiple database accesses and the overall load on the server.
Old 10-20-2009, 07:00 PM   #5
woj
<&(©¿©)&>
 
Join Date: Jul 2002
Location: Chicago
Posts: 47,882
cloak the pages for the other bots, who knows, it might trick them into sending some nice traffic
Old 10-20-2009, 07:02 PM   #6
Agent 488
Registered User
 
Join Date: Feb 2006
Posts: 22,511
if you are concerned about server stress from search engine bots you have bigger issues than the ones you may be contemplating.

Quote:
Originally Posted by rowan View Post
LOL. Even though the SE bots are actually the majority of the "traffic" loading the site this issue is nothing to do with bandwidth. It is more multiple database accesses and overall load of the server.
Old 10-20-2009, 07:12 PM   #7
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Quote:
Originally Posted by Agent 488 View Post
if you are concerned about server stress from search engine bots you have bigger issues than the ones you may be contemplating.
Once again, there are more bots than humans loading this site. Out of ~200k loads per day roughly 60% are identifying themselves as bots, and there's probably 20-25% more that are rogue bots or site scrapers.

It's a profile site that pulls together various interlinking bits and pieces so it is reasonably database intensive.

So I'm not really concerned about server load NOW, more in the future... and really, I just don't get why I should let (e.g.) cuil scrape my site when they return zero traffic...
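
To put the percentages above into absolute numbers, here is a rough sketch. It takes the ~200k loads/day, 60% and 20-25% figures at face value and treats whatever is left as human visitors, which is an assumption; the function name is made up for illustration.

```python
def split_loads(total: int, declared_bot_pct: int, rogue_bot_pct: int) -> dict:
    """Split daily page loads into declared bots, rogue bots and the remainder.

    Percentages are whole numbers; integer math keeps the results exact.
    The remainder is assumed to be human traffic.
    """
    declared = total * declared_bot_pct // 100
    rogue = total * rogue_bot_pct // 100
    return {
        "declared_bots": declared,
        "rogue_bots": rogue,
        "humans": total - declared - rogue,
    }


if __name__ == "__main__":
    # Low and high ends of the 20-25% rogue bot estimate.
    for rogue_pct in (20, 25):
        print(rogue_pct, split_loads(200_000, 60, rogue_pct))
```

On those figures the human share works out to somewhere around 30,000 to 40,000 loads a day.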
Old 10-20-2009, 07:17 PM   #8
uno
RIP Dodger. BEST.CAT.EVER
 
Join Date: Dec 2002
Location: NYC Area
Posts: 18,450
Why don't you just use some sort of caching to cut down on DB queries?
Old 10-20-2009, 07:25 PM   #9
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Quote:
Originally Posted by uno View Post
Why don't you just use some sort of caching to cut down on DB queries?
With hundreds of millions of profiles it's pretty much all long tail, so there's not much opportunity to cache anything. A page that a bot accesses may not be accessed by anyone else for hours or even days. DB queries don't fare much better: with some basic profiling I've worked out that caching results would only cut raw queries by 5-10%, and at that level the overhead of caching could become comparable to the benefit it provides...

FWIW I am planning to set up multiple backend servers so that I can simply add more when the load gets too high... I'm really just curious whether anyone's said "fuck you" to a (bona fide but obscure) search engine bot that does nothing but scrape.
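
The caching trade-off described above can be put into rough numbers. This is only a back-of-the-envelope sketch: the 4 ms query cost and 0.3 ms cache overhead are invented for illustration, and the model assumes every request pays the cache overhead while only hits skip the query.

```python
def net_saving_ms(hit_rate: float, query_cost_ms: float, cache_overhead_ms: float) -> float:
    """Average time saved per request by putting a cache in front of the DB.

    Every request pays cache_overhead_ms (lookup, plus the store on a miss,
    lumped into one figure); only cache hits avoid query_cost_ms.
    A negative result means the cache costs more than it saves.
    """
    return hit_rate * query_cost_ms - cache_overhead_ms


def break_even_overhead_ms(hit_rate: float, query_cost_ms: float) -> float:
    """Largest per-request cache overhead at which caching still breaks even."""
    return hit_rate * query_cost_ms


if __name__ == "__main__":
    # Hypothetical costs; only the 5-10% hit rates come from the post above.
    for hit_rate in (0.05, 0.10):
        print(hit_rate, net_saving_ms(hit_rate, query_cost_ms=4.0, cache_overhead_ms=0.3))
```

With hit rates that low, the break-even overhead is only a small fraction of a single query's cost, which is why the benefit can end up comparable to the overhead.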
Old 10-20-2009, 07:46 PM   #10
Thurbs
The Thrilla in Manila
 
Join Date: Sep 2004
Location: Thurbs' Lagoon, Christmas Island
Posts: 4,785
just redirect them all to a flat landing page or doorway page.

then if anyone does actually come from their sites, they can choose to press enter ..
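
One way to read the flat page idea is to answer the unwanted crawlers with a static page before the request ever reaches the database. Below is a minimal WSGI sketch of that reading, assuming the site can be wrapped in Python middleware; "deadbeatbot", the page contents and profile_app are all placeholders, not anything from the actual site.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical user-agent tokens (lowercase) for crawlers that send no traffic.
BLOCKED_AGENT_TOKENS = ("deadbeatbot",)

# Placeholder flat page with a link a human visitor could follow.
FLAT_PAGE = b'<html><body><a href="/">Enter</a></body></html>'


def is_blocked_agent(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocked tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def flat_page_for_bots(app):
    """Wrap a WSGI app so blocked crawlers get FLAT_PAGE and never hit the app."""
    def middleware(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT", "")):
            headers = [("Content-Type", "text/html; charset=utf-8"),
                       ("Content-Length", str(len(FLAT_PAGE)))]
            start_response("200 OK", headers)
            return [FLAT_PAGE]
        return app(environ, start_response)
    return middleware


def profile_app(environ, start_response):
    """Stand-in for the real database-backed profile pages."""
    body = b"dynamic profile page"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fetch(app, user_agent: str) -> bytes:
    """Call a WSGI app in-process with the given User-Agent and return the body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    return b"".join(app(environ, lambda status, headers: None))


if __name__ == "__main__":
    app = flat_page_for_bots(profile_app)
    print(fetch(app, "DeadbeatBot/1.0"))
    print(fetch(app, "Mozilla/5.0"))
```

Matching on the User-Agent header only catches bots that identify themselves, so rogue scrapers would still get through.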