


googlebot ip address ranges


5 replies to this topic

#1 nullkraft

nullkraft

    SUP3R 31337 P1MP

  • Binrev Financier
  • 284 posts
  • Location:Think Heisenberg uncertainty principle.

Posted 06 May 2006 - 07:34 PM

Someone was asking about scripting something like dyndns which got me searching google for "what is my ip." When the results showed up I noticed that a few of them had IP addresses in their summary and that they were in the same range of addresses. I got a little chuckle as I realized that it must be googlebot indexing sites like whatismyip.org and then caching the results. I copied a couple of the addresses and did a reverse dns lookup with the following results:

66.249.65.135 resolves to
"crawl-66-249-65-135.googlebot.com"
Top Level Domain: "googlebot.com"

Using nslookup will give a little more info than that. It was clear that I had some googlebot addresses but a few other things became apparent too. A couple of the listings showed the googlebot agent string which is:

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Also, 3 or 4 of the listings showed the ports that were being used but they were so random that I would have to port scan all the servers. Someone else can do that. ;-) One other listing had an extra entry with googlebot(at)googlebot.com so I tried www.googlebot.com but it just dumped me into google.com.

With all that info it looked like googlebot covered a large range of addresses, but it was limited to three different subnets. I finally had enough to run nslookup across all three ranges (66.249.65, 66.249.66 and 66.249.72). The results below are what I found using this command, shown here for the 66.249.66 subnet:

root@bear:~# for site in $(seq 1 254); do nslookup 66.249.66.$site | grep googlebot; done

Googlebot's Address Range 66.249.65.1 to 66.249.65.244
Googlebot's Address Range 66.249.66.1 to 66.249.66.244
Googlebot's Address Range 66.249.72.21 to 66.249.72.29
Googlebot's Address Range 66.249.72.210 to 66.249.72.244

This whole thing was kind of funny to me. If you go to Google's help page they will tell you to use the agent string to identify googlebot because they use way too many IP addresses to set up a filter. Hm, doesn't seem like it would be too hard to filter what I've listed.
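
For illustration, a check against the four ranges listed above fits in a few lines of shell. This is only a sketch: the function name in_listed_ranges and the RANGES table layout are invented for the example, and the ranges are just what the 2006 nslookup sweep turned up, not an official or current list.

```shell
#!/usr/bin/env bash
# Ranges as reported in the post: "first-three-octets first-host last-host".
# Assumption: these are the 2006 observations, not an authoritative list.
RANGES='66.249.65 1 244
66.249.66 1 244
66.249.72 21 29
66.249.72 210 244'

# Return 0 if the IPv4 address in $1 falls inside one of the listed ranges.
in_listed_ranges() {
    local ip=$1 prefix last p lo hi
    prefix=${ip%.*}    # first three octets
    last=${ip##*.}     # final octet
    # Reject anything whose final octet is empty or not purely numeric.
    case $last in ''|*[!0-9]*) return 1 ;; esac
    while read -r p lo hi; do
        if [ "$prefix" = "$p" ] && [ "$last" -ge "$lo" ] && [ "$last" -le "$hi" ]; then
            return 0
        fi
    done <<EOF
$RANGES
EOF
    return 1
}

# Usage: in_listed_ranges 66.249.65.135 && echo "inside the listed ranges"
```

A filter built this way is only as good as the list behind it, which is presumably why Google's help page steers people away from hard-coded addresses.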

Enjoy :-)
Nullkraft

Side Note: In my travels I also discovered that http://www.whatismyip.org/ spits back only your ip address with no other characters or encoding. If you wget that address it will save a file called index.html with only your IP address in it.
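
For the dyndns-style scripting that started this, the side note could be turned into something like the sketch below. It assumes the site still answers with a bare IP address, as described in 2006; the helper names looks_like_ipv4 and my_ip are made up for the example, and the shape check does not range-check the octets.

```shell
#!/usr/bin/env bash
# Return 0 if $1 has the shape of a dotted-quad IPv4 address.
looks_like_ipv4() {
    # Reject empty input and anything containing characters other than digits and dots
    # (this also rejects multi-line responses such as an HTML page).
    case $1 in ''|*[!0-9.]*) return 1 ;; esac
    printf '%s\n' "$1" | grep -Eqx '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

# Fetch the page to stdout instead of saving index.html, and print it only
# if it really looks like an address.
my_ip() {
    local ip
    ip=$(wget -qO- http://www.whatismyip.org/) || return 1
    looks_like_ipv4 "$ip" && printf '%s\n' "$ip"
}

# Usage: my_ip
```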

#2 nick84

nick84

    Member

  • Agents of the Revolution
  • 1,680 posts
  • Gender:Male

Posted 07 May 2006 - 07:34 AM

Querying for "google" at whois.arin.net or using the web interface at http://ws.arin.net/cgi-bin/whois.pl will get you a list of Google IP blocks (including one for IPv6 http://ws.arin.net/c...ET6-2001-4860-1 )

#3 nullkraft

nullkraft

    SUP3R 31337 P1MP

  • Binrev Financier
  • 284 posts
  • Location:Think Heisenberg uncertainty principle.

Posted 07 May 2006 - 09:31 AM

A lot of good info about google but I don't see the ranges for googlebot. The reason I focused on googlebot was another story that discussed using the googlebot agent string to gain access to sites that require registration such as the journal Nature. I refuse to register for those sites but if they are giving their content to googlebot without registration then I want to be googlebot.

I previously tried using what I thought was googlebot's agent string in my browser but after finding the one above I definitely had it wrong. Still, I suspect that the content providers may also be verifying the ip address before allowing their content to be indexed. I'll try the new agent string I found but if I have to spoof my ip I don't know how I can make it look like one of googlebot's.
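
If a content provider does verify the address, one plausible way is a reverse lookup followed by a forward lookup of the returned name. The sketch below shows the idea and is not any particular site's method. It assumes dig is installed; the function names are invented for the example, and it only accepts names under googlebot.com, the domain seen in the lookups above.

```shell
#!/usr/bin/env bash
# Pure string check: does a reverse-DNS name end in .googlebot.com?
is_googlebot_name() {
    local name=${1%.}    # drop the trailing dot that dig prints
    case $name in
        *.googlebot.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Reverse lookup, suffix check, then the forward lookup of that name
# must return the original address.
verify_googlebot_ip() {
    local ip=$1 name
    name=$(dig +short -x "$ip" | head -n 1)
    is_googlebot_name "$name" || return 1
    dig +short "$name" | grep -qxF "$ip"
}

# Usage: verify_googlebot_ip 66.249.65.135 && echo "looks like googlebot"
```

The forward lookup is the part a spoofer cannot fake: anyone can set a user agent, and anyone who controls their own reverse DNS can claim a googlebot.com name, but they cannot make that name resolve back to their address.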

#4 jedibebop

jedibebop

    Dangerous free thinker

  • Members
  • 1,935 posts

Posted 07 May 2006 - 01:29 PM

I found a story a while back about setting your UserAgent to mimic the GoogleBot, and it works on forums: they think you are googlebot and show "googlebot" as looking at the forum and such. Heh, it's kind of neat.

#5 JimboPDHS

JimboPDHS

    What number are we thinking of?

  • Members
  • 69 posts
  • Location:Los Angeles

Posted 08 May 2006 - 03:02 PM

In Firefox, type about:config into the address bar. Then right click anywhere on the list of stuff, and make a new string. Name it general.useragent.override . Then for the value, use Googlebot/2.1 (+http://www.googlebot.com/bot.html) . This will override the default user agent of Firefox to that of the googlebot. Keep in mind, now that you are the googlebot, you have some sweet new powers. For instance, you can now walk up to any woman you want and ask her for her number. She'll check your user agent and see that you're the googlebot, and freely give you the information. Just don't let her crosscheck your ip address with the known ranges of the googlebot ip, or you're set up to burn.
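
The same trick works from the command line. This sketch uses curl's -A option to set the user agent to the full agent string quoted in the first post; the variable and function names are made up for the example.

```shell
#!/usr/bin/env bash
# Agent string as quoted in the first post of this thread.
GOOGLEBOT_UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Fetch a URL while presenting the googlebot user agent.
fetch_as_googlebot() {
    curl -s -A "$GOOGLEBOT_UA" "$1"
}

# Usage: fetch_as_googlebot http://www.whatismyip.org/
```

This changes only the header the server sees. The source address is still your own, so any site that checks the IP will not be fooled.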

#6 nullkraft

nullkraft

    SUP3R 31337 P1MP

  • Binrev Financier
  • 284 posts
  • Location:Think Heisenberg uncertainty principle.

Posted 08 May 2006 - 09:36 PM

Actually I just went and downloaded a user agent extension for firefox and then tried it out on one of the whatismyip type sites and it showed the googlebot user agent. It seems that the journal Nature doesn't even allow googlebot into their premium content or something else is happening. I pulled robots.txt and there ain't a whole lot to it:

User-agent: *
Disallow: /laban/
Disallow: /*.pdf$
Disallow: /*.PDF$
Disallow: /*/*/*/pf/

And here is one of the premium content links I tried to access without any luck.
http://www.nature.co...ll/441008a.html

It seems they are using something more than robots.txt to protect their site from googlebot.
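
To see what those robots.txt rules actually cover, here is a rough shell matcher. It assumes the wildcard reading that Google documents for its crawler: * matches any run of characters, a trailing $ anchors the end of the path, and anything else is a prefix match. The function name and the sample paths are hypothetical, and the glob-based matching only handles rules as simple as the ones quoted above.

```shell
#!/usr/bin/env bash
# Disallow patterns copied from the robots.txt quoted above.
RULES='/laban/
/*.pdf$
/*.PDF$
/*/*/*/pf/'

# Return 0 if the URL path in $1 matches any Disallow rule.
is_disallowed() {
    local path=$1 rule
    while read -r rule; do
        case $rule in
            *'$') rule=${rule%?} ;;   # trailing "$": pattern must match the whole path
            *)    rule="$rule*" ;;    # otherwise the rule is a prefix match
        esac
        # Unquoted $rule is treated as a glob pattern here.
        case $path in
            $rule) return 0 ;;
        esac
    done <<EOF
$RULES
EOF
    return 1
}

# Usage (hypothetical paths):
#   is_disallowed /docs/paper.pdf && echo "blocked"
#   is_disallowed /a/b/c.html     || echo "allowed"
```

Under that reading, an ordinary .html page is only blocked if it sits under /laban/ or has /pf/ as its fourth path segment, which fits the conclusion that the premium content is protected by something other than robots.txt.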



