i've set up scripts (real basic) that check for patterns (regular expressions) in the last, say, 5000 lines of a log file. if an IP reaches a certain threshold it gets blocked for a period, say an hour. harsh, but it catches 'em. the thing about it though is that it requires a lot of planning to set up right - you want to be sure not to block someone just for loading a page that has 50 images. for example, on one server i have it set to block anyone who's hit 25 different CGIs (excluding a traffic-counting one and the admin ones - you've gotta exclude some) in 5 minutes or so. this was necessary because the suckers were coming in and killing the server with their simultaneous connections - a denial of service attack, basically. but this might not be appropriate for some sites, in fact probably not for most sites.
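
here's a rough sketch in python of the kind of thing i mean - the log path, the regexes, the thresholds, and the "block" action are all made-up placeholders, not my actual script:

#!/usr/bin/env python3
# rough sketch of the log-scanning idea above. everything configurable here
# (log path, patterns, thresholds, the "block" action) is an assumption.
import re
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta, timezone

LOG_FILE = "/var/log/apache2/access_log"              # assumed path
TAIL_LINES = 5000                                     # only look at the last N lines
WINDOW = timedelta(minutes=5)                         # counting window
THRESHOLD = 25                                        # distinct CGIs per IP before blocking
CGI_RE = re.compile(r'"GET (/cgi-bin/\S+)')           # assumed CGI pattern
EXCLUDE_RE = re.compile(r"/cgi-bin/(counter|admin)")  # assumed scripts to ignore
# common log format: 1.2.3.4 - - [10/Oct/2024:13:55:36 -0700] "GET /x HTTP/1.1" ...
LINE_RE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")

def tail(path, n):
    """Last n lines of the file without reading the whole thing into memory."""
    out = subprocess.run(["tail", "-n", str(n), path],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def recent_cgi_hits():
    """Map of ip -> set of distinct CGIs hit inside the window."""
    cutoff = datetime.now(timezone.utc) - WINDOW
    hits = defaultdict(set)
    for line in tail(LOG_FILE, TAIL_LINES):
        m = LINE_RE.match(line)
        cgi = CGI_RE.search(line)
        if not m or not cgi or EXCLUDE_RE.search(cgi.group(1)):
            continue
        ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        if ts >= cutoff:
            hits[m.group(1)].add(cgi.group(1))
    return hits

if __name__ == "__main__":
    for ip, cgis in recent_cgi_hits().items():
        if len(cgis) >= THRESHOLD:
            # "blocking" here just means reporting the IP; in practice you'd
            # append it to a deny list that a firewall rule or .htaccess
            # include picks up, and expire it an hour later
            print(f"{ip} hit {len(cgis)} CGIs inside {WINDOW}; block for an hour")

run that from cron every few minutes and it catches the pattern i described - lots of distinct CGI hits from one address in a short window.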
i think the only way to do it really well is to have something sitting in front of apache proxying traffic that handles the throttling based on whatever it decides a "session" is: ip address, cookie, URL prepend, whatever. doing it in apache itself is just doomed, of course - you have a limited number of httpd children ('cause of memory), and if some spider comes along and eats up 100 connections (it has happened) and they're all "sleeping" while it throttles them, that's 100 connections that can't be used for anyone else. lame.
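
again just a sketch, not something i actually run: a tiny asyncio relay sitting in front of apache that keys the "session" on the client IP and caps simultaneous backend connections per IP, so the excess ones wait in cheap coroutines instead of tying up httpd children. the ports and the per-IP limit are assumptions.

# minimal per-IP connection throttle in front of apache. BACKEND, LISTEN and
# PER_IP_LIMIT are made-up values; the "session" here is just the client IP.
import asyncio
from collections import defaultdict

BACKEND = ("127.0.0.1", 8080)   # assumed: apache listening locally
LISTEN = ("0.0.0.0", 8000)      # assumed front-side port
PER_IP_LIMIT = 4                # simultaneous backend connections per IP
limits = defaultdict(lambda: asyncio.Semaphore(PER_IP_LIMIT))

async def pipe(reader, writer):
    """Copy bytes one way until the sender closes, then close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle(client_r, client_w):
    ip = client_w.get_extra_info("peername")[0]
    # extra connections from the same IP wait here, in a coroutine,
    # not in an apache child
    async with limits[ip]:
        backend_r, backend_w = await asyncio.open_connection(*BACKEND)
        await asyncio.gather(pipe(client_r, backend_w),
                             pipe(backend_r, client_w))

async def main():
    server = await asyncio.start_server(handle, *LISTEN)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())

the point is just that the waiting happens in the front-end process, which can hold thousands of idle connections cheaply, while apache's limited pool of children only ever sees the traffic you've decided to let through.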
