I had an issue where Googlebot was spidering parts of my site that were not allowed in the robots.txt file…
My old robots.txt file…
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/
User-agent: Googlebot
Disallow: /ebay_
Hmmmm… that’s weird… Googlebot is still spidering stuff it shouldn’t be…
www.digitalpoint.com 66.249.66.138 - - [14/Mar/2006:06:21:07 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.digitalpoint.com 66.249.66.138 - - [14/Mar/2006:10:26:18 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.digitalpoint.com 66.249.66.138 - - [14/Mar/2006:14:29:35 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.digitalpoint.com 66.249.66.138 - - [14/Mar/2006:17:47:21 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
So I made an inquiry to Google about this, and I actually heard back (nice!)…
we did examine your robots.txt file. Please be advised that it appears
your Googlebot entry in your robots.txt file is overriding your generic
User-Agent listing. We suggest you alter your robots.txt file by
duplicating the forbidden paths under your Googlebot entry:
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/
User-agent: Googlebot
Disallow: /ebay_
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/
Once you’ve altered your robots.txt file, Google will find it
automatically after we next crawl your site.
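If you want to see the override for yourself, here is a rough sketch using Python's standard `urllib.robotparser`. It is not Google's parser, so treat it as an illustration of the "most specific group wins" behavior rather than proof of what Googlebot does. It assumes the old file opened with a `User-agent: *` line (as Google's reply implies); `SomeOtherBot` and the `allowed` helper are made-up names for the example.

```python
from urllib.robotparser import RobotFileParser

# Old file: assumes the generic rules sat under "User-agent: *".
OLD_ROBOTS = """\
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /ebay_
"""

# New file: the generic paths are duplicated under the Googlebot group,
# which is what Google's reply suggests.
NEW_ROBOTS = OLD_ROBOTS + """\
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("old", OLD_ROBOTS), ("new", NEW_ROBOTS)):
        for agent in ("Googlebot", "SomeOtherBot"):  # SomeOtherBot is hypothetical
            print(label, agent, "/ads/ allowed:", allowed(text, agent, "/ads/"))
```

With the old file, this parser lets Googlebot into /ads/ while still blocking the generic bot, because the Googlebot group is matched first and it only lists /ebay_. With the new file, /ads/ is blocked for both.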
Okay… I can live with that… easy fix. But check this out… Google’s own robots.txt testing tool within Google Sitemaps shows the old robots.txt as blocking Googlebot as expected.
So how about some consistency here? And more importantly, if anyone at Google is reading this, how about someone tell me why my blog is banned in your index…