New Webmasters ask "How-To" questions here. This is where other fucking Webmasters help.
#1
Confirmed User
Join Date: Aug 2011
Posts: 131
Limiting Google Bot?
Friends,
Is there a way to effectively limit how much and how often Googlebot crawls, other than screwing around with robots.txt and Google Webmaster Tools?
#3
Confirmed User
Join Date: Jan 2004
Posts: 574
You could work with two robots.txt files, one that allows crawling and one that doesn't, and set up a cron job that runs a script to swap them from time to time. Ask your host if you're on managed hosting; it should be easy to set up. If you have a dedicated server, you could also just block the IP for a while whenever you want.
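Something along these lines, for example (the script name, paths, and schedule are just placeholders):

Code:
#!/bin/sh
# swap-robots.sh - swap in the permissive or the restrictive robots.txt.
# DOCROOT is an example path; point it at your own document root.
DOCROOT=/var/www/html

case "$1" in
  allow) cp "$DOCROOT/robots-allow.txt" "$DOCROOT/robots.txt" ;;
  block) cp "$DOCROOT/robots-block.txt" "$DOCROOT/robots.txt" ;;
  *)     echo "usage: $0 allow|block" >&2; exit 1 ;;
esac

Code:
# crontab: let crawlers in overnight, shut them out during the day
0 1 * * * /usr/local/bin/swap-robots.sh allow
0 7 * * * /usr/local/bin/swap-robots.sh block

One thing to keep in mind: Googlebot caches robots.txt (usually up to a day), so swaps won't take effect as instantly as the cron schedule suggests.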
__________________
█ ► XenLayer - Paravirtualization Professionals since 2008 - [ICQ: 297820698] █ ► Reseller Hosting | OpenVZ VPS | XEN VPS | Dedicated Servers
#4
Confirmed User
Join Date: Nov 2012
Location: New Orleans
Posts: 213
You could definitely do what AndrewX says above, but can I ask why you want to avoid the two most logical ways to limit crawl frequency? Do you not have the necessary privileges to access robots.txt or Webmaster Tools?
__________________
Tent Pitcher - Adult Search Engine
#5
Registered User
Join Date: Aug 2007
Location: USA
Posts: 61
Sure. You can configure your server to only accept X requests per IP, which will effectively rate-limit Googlebot if they've been coming at you with 10+ clients at a time.

Or, if you feel like getting more complex, you can have your app server (PHP, Python, Rails) handle the rate limiting and send a 503 header (indicating a temporary overload) to Googlebot when it's making too many requests. The downside of either approach is that it can hurt your SEO if you do it badly and Google decides your site is 'slow' rather than rate-limited.
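A rough sketch of the app-level version in PHP (the APCu counter, the limit, and the window length are all arbitrary choices for illustration, not a tested setup):

Code:
<?php
// Crude per-IP rate limiter: answer 503 once a client exceeds the limit.
// The APCu backend and all the numbers here are illustrative choices.
$ip     = $_SERVER['REMOTE_ADDR'];
$key    = 'hits_' . $ip;
$limit  = 10;   // max requests ...
$window = 60;   // ... per 60-second window

$hits = apcu_fetch($key);
if ($hits === false) {
    apcu_store($key, 1, $window);   // first request: open a new window
} elseif ($hits >= $limit) {
    header('HTTP/1.1 503 Service Unavailable');
    header('Retry-After: 120');     // hint at when the bot should retry
    exit;
} else {
    apcu_inc($key);                 // count this request
}
// ... normal page handling continues here ...

Googlebot treats a 503 with Retry-After as "back off and come back later," which is exactly the behavior you want here.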
#6
Confirmed User
Join Date: Aug 2011
Posts: 131
Is this called bandwidth throttling? I can't find a link with info on setting this up. Any more info would be much appreciated.

I have a dedicated server. I use an access list to block the entire Google subnet. I never got much traffic from Google. I wouldn't mind a little more, but I don't want to be extorted into using their tools. I don't have a problem with letting Google in, but man, they have literally sent hundreds of bots hammering me all at one time, and my performance for potential paying customers ends up sucking.
#7
Registered User
Join Date: Aug 2007
Location: USA
Posts: 61
What it's called is 'rate limiting by IP'.

If you're using Nginx, it's provided by the HttpLimitReqModule. More info here: http://serverfault.com/questions/179...-prevent-abuse Basically, it lets you set a rule that says no more than X requests per second per IP. If you're using Apache, the module you'd use is called mod_evasive. If you're using Lighttpd, its module is also called mod_evasive.

------

Using any of those modules may help with your problem, and will also stop people from spidering and ripping your sites quickly. The issue with Google is that if they use many IP addresses, they may get around your rate limit.
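For the Nginx route, the config boils down to something like this (the zone name, rate, and burst are example values; it goes in nginx.conf):

Code:
http {
    # 10 MB shared zone keyed by client IP, allowing 1 request per second.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;

    server {
        location / {
            # Queue short bursts of up to 5 requests, reject anything beyond.
            limit_req zone=perip burst=5;
        }
    }
}

Requests over the limit get delayed or rejected with a 503 by default, which lines up with the app-level approach mentioned earlier in the thread.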
#8
Confirmed User
Join Date: Aug 2011
Posts: 131
Friend, thanks! I've got it set up. I'll let you know my progress.
#9
Confirmed User
Join Date: Aug 2011
Posts: 131
I tried three techniques:

1. mod_evasive
2. mod_limitipconn
3. iptables limits on connections per IP

None were effective. I never found robots.txt to be effective either. I ended up signing up for Google Webmaster Tools just to limit the crawl rate. I feel extorted. I will play around with another solution soon, but thanks for all the info. I learned a lot.
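For anyone curious, the iptables variant of technique 3 generally looks something like this (the thresholds are arbitrary examples, not the exact values I used):

Code:
# Reject any single IP holding more than 20 concurrent connections to port 80.
iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 20 -j REJECT

# Or: drop new connections from an IP making more than 10 in 60 seconds.
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 60 --hitcount 10 -j DROP

As was said above, Googlebot spreads its requests across many IPs, which is probably exactly why none of the per-IP limits made a dent.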