mstanley

googlebot ip address ranges


Someone was asking about scripting something like dyndns which got me searching google for "what is my ip." When the results showed up I noticed that a few of them had IP addresses in their summary and that they were in the same range of addresses. I got a little chuckle as I realized that it must be googlebot indexing sites like whatismyip.org and then caching the results. I copied a couple of the addresses and did a reverse dns lookup with the following results:

66.249.65.135 resolves to

"crawl-66-249-65-135.googlebot.com"

Top Level Domain: "googlebot.com"
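
If anybody wants to check an address themselves, a plain reverse lookup from the shell gets you the same thing (host and dig should both work, pick whichever you have):

host 66.249.65.135
dig -x 66.249.65.135 +short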

Using nslookup will give a little more info than that. It was clear that I had some googlebot addresses but a few other things became apparent too. A couple of the listings showed the googlebot agent string which is:

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Also, 3 or 4 of the listings showed the ports that were being used but they were so random that I would have to port scan all the servers. Someone else can do that. ;-) One other listing had an extra entry with googlebot(at)googlebot.com so I tried www.googlebot.com but it just dumped me into google.com.

With all that info it looked like googlebot covered a large range of addresses, but it was limited to three different subnets. I finally had enough to run nslookup across all three ranges, and the following results are what I found using this command:

root@bear:~# for net in 65 66 72; do for site in `seq 1 254`; do nslookup 66.249.$net.$site | grep googlebot; done; done

Googlebot's Address Range 66.249.65.1 to 66.249.65.244

Googlebot's Address Range 66.249.66.1 to 66.249.66.244

Googlebot's Address Range 66.249.72.21 to 66.249.72.29

Googlebot's Address Range 66.249.72.210 to 66.249.72.244

This whole thing was kind of funny to me. If you go to google's help page they will tell you to use the agent string to identify googlebot because they use way too many ip addresses to set up a filter. Hm, doesn't seem like it would be too hard to filter what I've listed.
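
If anyone actually wants to try it, here's a rough sketch with iptables that just logs hits from those ranges (untested, needs root, and the ranges will probably drift over time):

for range in 66.249.65.1-66.249.65.244 66.249.66.1-66.249.66.244 66.249.72.21-66.249.72.29 66.249.72.210-66.249.72.244; do
  # log anything coming from a googlebot range; swap LOG for DROP to filter it outright
  iptables -A INPUT -m iprange --src-range $range -j LOG --log-prefix "googlebot: "
done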

Enjoy :-)

Nullkraft

Side Note: In my travels I also discovered that http://www.whatismyip.org/ spits back only your ip address with no other characters or encoding. If you wget that address it will save a file called index.html with only your IP address in it.
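
Which means something like this should be all you need to grab your address in a dyndns-style script (quick sketch, only lightly tested):

# print the external address to stdout instead of saving index.html
MYIP=$(wget -q -O - http://www.whatismyip.org/)
echo "$MYIP"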


A lot of good info about google but I don't see the ranges for googlebot. The reason I focused on googlebot was another story that discussed using the googlebot agent string to gain access to sites that require registration such as the journal Nature. I refuse to register for those sites but if they are giving their content to googlebot without registration then I want to be googlebot.

I previously tried using what I thought was googlebot's agent string in my browser but after finding the one above I definitely had it wrong. Still, I suspect that the content providers may also be verifying the ip address before allowing their content to be indexed. I'll try the new agent string I found but if I have to spoof my ip I don't know how I can make it look like one of googlebot's.
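
For testing from the command line it should be enough to hand wget the agent string from the first post, something like this (the URL is just a placeholder, swap in whatever premium page you're after):

wget -q -O - --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  "http://example.com/some-premium-article.html"

Spoofing the source ip is another matter entirely, since the replies would never come back to you, so I suspect the agent string is as far as this trick goes.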


I found a story a while back about setting your UserAgent to mimic the GoogleBot, and it works on forums: they think you are googlebot, and the forum shows that "googlebot" is looking at it and such. Heh, it's kind of neat.


In Firefox, type about:config into the address bar. Then right click anywhere on the list of stuff, and make a new string. Name it general.useragent.override . Then for the value, use Googlebot/2.1 (+http://www.googlebot.com/bot.html) . This will override the default user agent of Firefox to that of the googlebot. Keep in mind, now that you are the googlebot, you have some sweet new powers. For instance, you can now walk up to any woman you want and ask her for her number. She'll check your user agent and see that you're the googlebot, and freely give you the information. Just don't let her crosscheck your ip address with the known ranges of the googlebot ip, or you're set up to burn.
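
If you'd rather not click through about:config every time, the same override can go in user.js in your Firefox profile folder (untested sketch, and the profile path varies per install):

user_pref("general.useragent.override", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");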


Actually, I just went and downloaded a user agent extension for Firefox and tried it out on one of the whatismyip-type sites, and it showed the googlebot user agent. It seems the journal Nature doesn't even allow googlebot into their premium content, or something else is happening. I pulled robots.txt and there ain't a whole lot to it:

User-agent: *
Disallow: /laban/
Disallow: /*.pdf$
Disallow: /*.PDF$
Disallow: /*/*/*/pf/

And here is one of the premium content links I tried to access without any luck.

http://www.nature.com/news/2006/060501/full/441008a.html

It seems they are using something more than robots.txt to protect their site from googlebot.
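
If anyone else wants to poke at it, comparing the status codes with and without the agent string is a quick way to see whether the string alone changes anything (assuming you have curl around; untested):

# plain request, then the same request claiming to be googlebot
curl -s -o /dev/null -w "%{http_code}\n" http://www.nature.com/news/2006/060501/full/441008a.html
curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.nature.com/news/2006/060501/full/441008a.html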

