googlebot ip address ranges
Posted 06 May 2006 - 07:34 PM
18.104.22.168 resolves to
Top Level Domain: "googlebot.com"
Using nslookup gives a little more info than that. It was clear that I had some googlebot addresses, but a few other things became apparent too. A couple of the listings showed the googlebot agent string, which is:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Also, 3 or 4 of the listings showed the ports that were being used but they were so random that I would have to port scan all the servers. Someone else can do that. ;-) One other listing had an extra entry with googlebot(at)googlebot.com so I tried www.googlebot.com but it just dumped me into google.com.
With all that info it looked like googlebots covered a large range of addresses but it was limited to three different subnets. I finally had enough to run nslookup on all three ranges and the following results are what I found using this command:
root@bear:~# for site in $(seq 1 254); do nslookup 66.249.66.$site | grep googlebot; done
Googlebot's Address Range 22.214.171.124 to 126.96.36.199
Googlebot's Address Range 188.8.131.52 to 184.108.40.206
Googlebot's Address Range 220.127.116.11 to 18.104.22.168
Googlebot's Address Range 22.214.171.124 to 126.96.36.199
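The grep in the loop above matches any line containing "googlebot", so it would also accept a host such as googlebot.example.net. Here is a rough sketch of a stricter check, assuming the reverse lookups return names that end in .googlebot.com and that nslookup prints them on "name = host." lines (the output format varies between versions). The function names and the sample hostname in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the name sits under googlebot.com.
is_googlebot_host() {
    # Strip the trailing dot that nslookup puts on fully qualified names.
    host=${1%.}
    case "$host" in
        *.googlebot.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Sweep one /24 and print every address whose reverse name passes the check.
# Assumption: nslookup reports the PTR record as "... name = host."
sweep_subnet() {
    prefix=$1
    for site in $(seq 1 254); do
        name=$(nslookup "$prefix.$site" 2>/dev/null \
            | sed -n 's/.*name = //p' | head -n 1)
        if is_googlebot_host "$name"; then
            echo "$prefix.$site $name"
        fi
    done
}
```

Running `sweep_subnet 66.249.66` would repeat the scan from the command above, and `is_googlebot_host crawl-66-249-66-1.googlebot.com.` should succeed while `is_googlebot_host googlebot.example.net` should fail.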
This whole thing was kind of funny to me. If you go to Google's help page they will tell you to use the agent string to identify googlebot because they use way too many IP addresses to set up a filter. Hm, doesn't seem like it would be too hard to filter what I've listed.
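To give an idea of how short such a filter could be, here is a sketch that tests an address against a list of /24 prefixes. Only the 66.249.66 prefix is taken from the nslookup command in this post; the variable and function names are invented, and the list would have to be extended by hand to cover the other ranges.

```shell
#!/bin/sh
# Space-separated /24 prefixes to treat as googlebot.
# Assumption: only 66.249.66 is filled in here; add the others as needed.
GOOGLEBOT_PREFIXES="66.249.66"

# Hypothetical helper: succeed if the IPv4 address falls in a listed /24.
in_googlebot_range() {
    ip=$1
    for prefix in $GOOGLEBOT_PREFIXES; do
        case "$ip" in
            "$prefix".*)
                # Whatever follows the prefix must be a single octet.
                last=${ip#"$prefix".}
                case "$last" in
                    ''|*[!0-9]*) ;;   # empty or not a plain number
                    *)
                        if [ "$last" -le 255 ]; then
                            return 0
                        fi
                        ;;
                esac
                ;;
        esac
    done
    return 1
}
```

For example, `in_googlebot_range 66.249.66.12` succeeds and `in_googlebot_range 66.249.67.12` fails until that prefix is added to the list.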
Side Note: In my travels I also discovered that http://www.whatismyip.org/ spits back only your ip address with no other characters or encoding. If you wget that address it will save a file called index.html with only your IP address in it.
Posted 07 May 2006 - 09:31 AM
I previously tried using what I thought was googlebot's agent string in my browser but after finding the one above I definitely had it wrong. Still, I suspect that the content providers may also be verifying the ip address before allowing their content to be indexed. I'll try the new agent string I found but if I have to spoof my ip I don't know how I can make it look like one of googlebot's.
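For anyone who wants to try the agent string outside a browser, here is a minimal sketch using wget's --user-agent option. The function name is made up and the URL in the usage note is a placeholder. This only changes the User-Agent header; the request still comes from your own IP address, so a site that also checks the address will not be fooled.

```shell
#!/bin/sh
# The agent string quoted earlier in this thread.
GOOGLEBOT_UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Hypothetical helper: fetch a URL with the googlebot agent string
# and write the response body to standard output.
fetch_as_googlebot() {
    wget --quiet --output-document=- --user-agent="$GOOGLEBOT_UA" "$1"
}
```

Usage would look like `fetch_as_googlebot http://example.com/ > page.html`.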
Posted 07 May 2006 - 01:29 PM
Posted 08 May 2006 - 03:02 PM
Posted 08 May 2006 - 09:36 PM
And here is one of the premium content links I tried to access without any luck.
It seems they are using something more than robots.txt to protect their site from googlebot.