GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Webmaster Q & Fuckin' A (https://gfy.com/forumdisplay.php?f=27)
-   -   Limiting Google Bot? (https://gfy.com/showthread.php?t=1091951)

porkbrothersdotnet 12-06-2012 06:08 PM

Limiting Google Bot?
 
Friends,

Is there a way to effectively limit the amount and frequency of Googlebot's crawling other than screwing around with robots.txt and Google Webmaster Tools?

sarettah 12-06-2012 06:30 PM

Quote:

Originally Posted by porkbrothersdotnet (Post 19356026)
Friends,

Is there a way to effectively limit the amount and frequency of Googlebot's crawling other than screwing around with robots.txt and Google Webmaster Tools?

edited out because it wouldn't work after I thought about it for more than a second.



AndrewX 12-06-2012 08:24 PM

You could work with two robots.txt files, one that allows crawling and one that doesn't, and set up a cron job that runs a script to swap them from time to time. Ask your host if you're on managed hosting; it should be easy to set up. If you have a dedicated server, you could just block Google's IPs for a while whenever you want.
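
Something like this would do it, scheduled from cron. Just a sketch in Python; the file names and docroot are assumptions:

Code:

#!/usr/bin/env python
# Swap the live robots.txt between an "allow" and a "block" version.
# Assumed layout (hypothetical names): robots-allow.txt holds your normal
# rules, robots-block.txt disallows the bots you want to slow down.
import os
import shutil

DOCROOT = "/var/www/html"  # assumption: change to your document root

live = os.path.join(DOCROOT, "robots.txt")
allow = os.path.join(DOCROOT, "robots-allow.txt")
block = os.path.join(DOCROOT, "robots-block.txt")

# Check which version is currently live, then copy in the other one.
with open(live) as f:
    current = f.read()
with open(block) as f:
    blocking = f.read()

shutil.copyfile(allow if current == blocking else block, live)

Then a crontab entry like "0 */6 * * * python /path/to/swap_robots.py" flips it every six hours (that path is hypothetical too).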

Tent Pitcher 12-07-2012 11:49 PM

You could definitely do what AndrewX says above, but can I ask why you want to avoid the two most logical ways to limit crawl frequency? Do you not have the necessary privileges to access robots.txt or Webmaster Tools?

cruxop 12-08-2012 12:54 AM

Sure. You can configure your server to accept only X requests per second per IP, which effectively rate-limits Googlebot if they've been coming at you with 10+ clients at a time.

Or if you feel like getting more complex, you can have your app server (PHP, Python, Rails) handle the rate limiting and send a 503 header (Service Unavailable, indicating a temporary overload) to Googlebot when it's making too many requests.

The downside of each is that it might hurt your SEO if you do it badly and Google sees your site as 'slow' rather than rate-limited.
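
Here's a rough sketch of that app-server approach in plain Python/WSGI. The limit of 5 requests per second and the Retry-After value are example numbers, and the in-memory counter only covers a single process:

Code:

import time

class RateLimitMiddleware(object):
    """Answer 503 once a client IP exceeds max_per_second requests."""

    def __init__(self, app, max_per_second=5):
        self.app = app
        self.max_per_second = max_per_second
        self.counts = {}  # ip -> (current_second, request_count); never pruned in this sketch

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "unknown")
        now = int(time.time())
        second, count = self.counts.get(ip, (now, 0))
        if second != now:  # a new one-second window has started
            second, count = now, 0
        count += 1
        self.counts[ip] = (second, count)
        if count > self.max_per_second:
            # 503 signals a temporary overload, so Googlebot backs off and
            # retries later instead of treating the URL as broken.
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", "120")])
            return [b"Slow down, try again shortly.\n"]
        return self.app(environ, start_response)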

porkbrothersdotnet 12-09-2012 07:27 PM

Is this called bandwidth throttling? I can't find a link with info on setting this up. Any more info would be much appreciated.

I have a dedicated server. I use an access list to block the entire Google subnet. I never got much traffic from Google. I wouldn't mind a little more, but I don't want to be extorted into using their tools. I don't have a problem with letting Google in, but man, they have literally sent hundreds of bots hammering me all at one time, and performance for possible paying customers ends up sucking.

cruxop 12-09-2012 09:19 PM

What it's called is 'rate limiting by IP'.

If you're using Nginx, it's provided by the HttpLimitReqModule. More info here: http://serverfault.com/questions/179...-prevent-abuse

Basically it lets you make a setting that says no more than X requests per second per IP.
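
It's only a few lines of config. The zone name, rate, and burst below are example values:

Code:

# In the http block of nginx.conf -- example values, tune for your traffic.
http {
    # One shared zone keyed by client IP: 10 MB of state, 2 requests/sec each.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    server {
        listen 80;
        location / {
            # Permit short bursts of 5 extra requests before delaying them.
            limit_req zone=perip burst=5;
        }
    }
}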

If you're using Apache, the module you'd use is called "mod_evasive".

If you're using Lighttpd, the module is also called mod_evasive.

------

Using any of those modules may help with your problem, as well as stop people from spidering and ripping your sites quickly. The issue with Google is that if they use many IP addresses, they may get around your rate-limit.

porkbrothersdotnet 12-10-2012 08:45 AM

Friend, thanks! I've got it set up. I'll let you know my progress.

porkbrothersdotnet 12-17-2012 09:28 PM

I tried 3 techniques:

1. mod_evasive

2. mod_limitipconn

3. iptables limits on connections per IP (sketch below)
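
The iptables rule I tried looked roughly like this (the threshold of 20 is just an example value):

Code:

# Refuse new HTTP connections from any single IP that already holds
# more than 20 -- example threshold; the /32 mask means "per individual IP".
iptables -A INPUT -p tcp --syn --dport 80 -m connlimit \
    --connlimit-above 20 --connlimit-mask 32 -j DROP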

None were effective. I never found robots.txt to be effective either. I ended up signing up for Google Webmaster Tools just to limit the crawl rate. I feel extorted. I'll play around with another solution soon, but thanks for all the info. I learned a lot.

