Google is warning against using 404 and other 4xx client errors, such as 403, to try to set a crawl rate limit for Googlebot. “Please don’t do that,” Gary Illyes from the Google Search Relations team wrote.
Why the notice. There has been a recent increase in the number of sites and CDNs using these techniques to try to limit Googlebot crawling. “Over the last few months we noticed an uptick in website owners and some content delivery networks (CDNs) attempting to use 404 and other 4xx client errors (but not 429) to attempt to reduce Googlebot’s crawl rate,” Gary Illyes wrote.
What to do instead. Google has a detailed help document specifically on reducing Googlebot crawling of your site. The recommended approach is to adjust the crawl rate using the crawl rate settings in Google Search Console.
Google explained, “To quickly reduce the crawl rate, you can change the Googlebot crawl rate in Search Console. Changes made to this setting are generally reflected within days. To use this setting, first verify your site ownership. Make sure that you avoid setting the crawl rate to a value that’s too low for your site’s needs. Learn more about what crawl budget means for Googlebot. If the Crawl Rate Settings is unavailable for your site, file a special request to reduce the crawl rate. You cannot request an increase in crawl rate.”
If you can’t use Search Console, Google then says to “reduce the crawl rate for a short period of time (for example, a couple of hours, or 1-2 days), then return an informational error page with a 500, 503, or 429 HTTP response status code.”
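As an illustration of that guidance, here is a minimal sketch of server-side logic that returns a 503 (with a `Retry-After` header) to Googlebot during a temporary overload, rather than a 404 or 403. The function name, the `overloaded` flag, and the simple user-agent check are hypothetical; a production setup would also verify Googlebot via reverse DNS rather than trusting the user-agent string.

```python
def crawl_response_status(user_agent: str, overloaded: bool) -> tuple[int, dict]:
    """Pick a (status_code, extra_headers) pair for an incoming request.

    Hypothetical sketch: when temporarily rate-limiting Googlebot, answer
    with 503 instead of 404/403, per Google's guidance.
    """
    # Simplified check for illustration only; real Googlebot verification
    # should use reverse DNS lookup, not just the user-agent string.
    is_googlebot = "Googlebot" in user_agent

    if overloaded and is_googlebot:
        # 503 plus Retry-After signals a temporary slowdown, so Googlebot
        # backs off without treating the URLs as missing, the way repeated
        # 404s can.
        return 503, {"Retry-After": "3600"}

    # Serve normally for everyone else, or when load is back to normal.
    return 200, {}
```

For example, `crawl_response_status("Googlebot/2.1", overloaded=True)` yields `(503, {"Retry-After": "3600"})`, while regular visitors still receive a 200. Remember that Google says this is only for short periods, a couple of hours to 1-2 days, since prolonged 5xx responses can cause URLs to drop from the index.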
Why we care. If you’ve noticed crawling issues, your hosting provider or CDN may have recently deployed these techniques. Consider submitting a support request pointing them to Google’s blog post on this topic to ensure they are not using 404s or 403s to reduce crawl rates.