Proxy Search Engine Hell!
If you run a proxy site, you probably know all about high bandwidth usage and big server loads! Those just starting out may not realize that bandwidth is not the only thing to worry about: you should also be concerned with the server load caused by a large number of people browsing through your proxy site(s).
Recently I found that one of my proxy sites was causing an abnormally high server load, so I decided to check into the problem. It turns out GOOGLE was bombing my site! Basically, the Google bot was crawling so many pages proxified through my proxy that it was almost overloading my server! Google was hitting 70,000+ pages per day on my site and using 1-2 GB of bandwidth PER DAY! Unlike for most websites, this is not good for a proxy-based website: as I mentioned, Google can literally take down your server, and you DO NOT benefit nearly as much from someone landing on a proxified page from Google as you do when they go to your proxy’s homepage, with your ads and money earners on display.
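If you want to check your own numbers, here is a rough Python sketch that counts Googlebot hits and bytes per day from an access log. It assumes your server writes the common Apache/Nginx "combined" log format; the function name and the regular expression are just my illustration, so adjust them if your log lines look different.

```python
import re

# Assumption: "combined" log format, e.g.
# IP - - [day:time zone] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "[^"]*" \d{3} '
    r'(?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_usage(lines):
    """Return {day: (hits, bytes)} for requests whose user agent mentions Googlebot."""
    usage = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue  # skip unparseable lines and other visitors
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        hits, total = usage.get(m.group("day"), (0, 0))
        usage[m.group("day")] = (hits + 1, total + size)
    return usage
```

You can pass it an open file, for example `googlebot_usage(open("access.log"))`, where `access.log` stands in for wherever your host keeps the log.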
So how do you fix this problem and prevent Google from spidering your site? Simple!
You need a properly set up robots.txt file.
Here’s the one I use:
User-agent: Googlebot
Disallow: /*?
User-agent: Googlebot-Image
Disallow: /
This will block Google from crawling proxified pages but not from going to your homepage. It will also block the Google image bot from spidering anything on your site. I only block Google because none of the other search engines have ever given me any trouble; you could of course change the file to block ALL search engines. (Note: Google is the only one I know of that allows the use of * in robots.txt.)
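To see why `Disallow: /*?` catches proxified pages but leaves the homepage alone, here is a small Python sketch of the wildcard matching. It is only an approximation of the behavior Google documents (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is a prefix match), not Google's actual code, and `rule_matches` is a name I made up. It also assumes your proxified URLs carry a query string, which is what the `?` in the rule is targeting.

```python
import re

def rule_matches(pattern, path):
    """Approximate check of a robots.txt Disallow pattern against a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix-match behavior.
    return re.match(regex, path) is not None

# Hypothetical URLs for illustration:
print(rule_matches("/*?", "/index.php?q=http://example.com"))  # proxified page
print(rule_matches("/*?", "/"))                                # homepage
print(rule_matches("/", "/images/logo.png"))                   # Googlebot-Image rule
```

The first and third checks come back as matches (blocked), while the homepage does not match `/*?`, so Googlebot can still crawl it.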
