Understanding crawl budget in 2026

Google defines crawl budget as the combination of two factors: the crawl rate limit (how many requests per second Googlebot can send without overloading your server) and crawl demand (how much Google wants to crawl your pages, based on their popularity and freshness).
In 2026, a third factor has joined them: the AI crawl budget. GPTBot, ClaudeBot and PerplexityBot each have their own budgets, generally far more limited than Googlebot's. They crawl fewer pages, less often, and with shorter timeouts.
Gary Illyes, analyst at Google (Zurich), clarified at Search Central Live 2025 in Paris: "If your site responds slowly, we will crawl less. If your pages all look alike, we will lose interest. Crawl budget is not a metric you can configure — it is a consequence of the quality of your site."
Diagnosing crawl waste
Before optimising, identify where your crawl budget is being wasted. The usual suspects:
| Waste source | Impact | Solution | Priority |
|---|---|---|---|
| Parameter pages (?sort=, ?filter=) | Very high | Canonical tag + robots.txt | P1 |
| Infinite pagination pages | High | Noindex or limited pagination | P1 |
| Duplicate content (www vs non-www, http vs https) | High | 301 redirects + canonical | P1 |
| Mass 404 pages | Medium | Redirect or delete | P2 |
| Redirect chains (A → B → C) | Medium | Direct redirect A → C | P2 |
| Low-value pages indexed | Medium | Noindex or consolidation | P2 |
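The redirect-chain fix from the table can be sketched as a small script (the paths below are hypothetical): resolve every source URL to its final destination so that each chain A → B → C collapses into a single hop A → C.

```python
def flatten_redirects(redirects):
    """Resolve each source URL to its final destination, collapsing chains."""
    flat = {}
    for src in redirects:
        seen = set()
        target = src
        while target in redirects:
            if target in seen:  # guard against redirect loops (A -> B -> A)
                break
            seen.add(target)
            target = redirects[target]
        flat[src] = target
    return flat

# Hypothetical chain: /old-a -> /old-b -> /new-c
chains = {"/old-a": "/old-b", "/old-b": "/new-c"}
print(flatten_redirects(chains))
```

Feeding the flattened map back into your server's redirect rules removes the intermediate hops that waste crawl requests.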
Diagnostic tools: analyse your server logs (Screaming Frog Log Analyzer, Oncrawl) to see exactly which pages Googlebot visits. Cross-reference with Search Console (coverage report) to identify pages crawled but not indexed.
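If you prefer a quick first pass before reaching for a dedicated log analyser, a few lines of Python can tally which paths Googlebot actually requests. This is a minimal sketch assuming the common Apache/Nginx combined log format; real analyses should also verify the crawler's IP against Google's published ranges, since user-agent strings can be spoofed.

```python
import re
from collections import Counter

# Combined log format: IP - - [date] "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

def googlebot_hits(log_lines):
    """Count requests per path for lines whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1
    return hits

# Hypothetical log lines for illustration
sample = [
    '66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /products?sort=price HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2026:10:00:01 +0000] "GET /blog/crawl-budget HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Jan/2026:10:00:02 +0000] "GET /blog/crawl-budget HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(googlebot_hits(sample).most_common())
```

A high hit count on parameter URLs like `/products?sort=price` is exactly the kind of crawl waste the table above describes.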
5 crawl budget optimisation strategies
- Clean up robots.txt — block useless sections (parameters, facets, admin pages) while allowing strategic AI bots (see our robots.txt and AI guide)
- Strategic XML sitemap — only include pages you want indexed, with reliable lastmod dates (see our sitemap guide)
- Improve TTFB — a fast server lets Googlebot crawl more pages in the same time (see Core Web Vitals guide)
- Consolidate thin content — merge low-traffic pages that cover similar topics
- Directed internal linking — point internal links towards your priority pages to signal their importance to bots (see internal linking guide)
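The first strategy above might look like the following robots.txt sketch (directory and parameter names are illustrative, not a template to copy as-is). Note that wildcard patterns such as `*?sort=` are supported by Googlebot but not guaranteed for every crawler.

```txt
# Block crawl-wasting sections for all bots
User-agent: *
Disallow: /admin/
Disallow: /*?sort=
Disallow: /*?filter=

# Explicitly welcome strategic AI bots
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```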
Crawl budget and AI bots: the specifics
AI bots have distinct crawling behaviour compared to Googlebot:
- Lower volume — GPTBot crawls 10 to 100x fewer pages than Googlebot on the same site
- Aggressive timeouts — frequent abandonment after 1.5-2 seconds of TTFB
- No JS rendering — only static HTML is read
- robots.txt sensitivity — GPTBot and ClaudeBot respect directives (contrary to some misconceptions)
- llms.txt file — guides AI bots to your most important pages (see our llms.txt guide)
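For reference, the llms.txt proposal is a plain Markdown file served at the site root; the structure below follows the proposal's format, with placeholder names and URLs:

```markdown
# Example Company

> Guides and tools for technical SEO and AI search visibility.

## Key pages

- [Crawl budget guide](https://www.example.com/crawl-budget): how crawl budget works
- [Robots.txt and AI guide](https://www.example.com/robots-ai): configuring AI bot access
```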
Bartosz Goralewicz, CEO of Onely (Poland): "For AI bots, crawl budget is even more precious. They visit far fewer pages, so every crawled page must count. The llms.txt file and a well-configured robots.txt are your best allies."
Monitoring your crawl budget
Crawl budget optimisation is not a one-off task. Here are the metrics to track:
- Pages crawled per day — Search Console > Settings > Crawl stats
- Average response time — same report, target < 500ms
- Indexed/crawled page ratio — if only a small share of crawled pages ends up indexed, Googlebot is fetching your pages but not judging them worth indexing
- AI bot crawl — analyse your server logs to track GPTBot, ClaudeBot, PerplexityBot
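Tracking AI bots in your logs can start as simply as counting user-agent matches. This sketch matches on the user-agent substring only, which is enough for a trend line; for reliable attribution you should also validate the requesting IP, since these strings are easy to spoof.

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def ai_bot_counts(log_lines):
    """Tally hits per AI crawler based on a user-agent substring match."""
    counts = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
                break
    return counts

# Hypothetical log lines for illustration
sample = [
    '20.0.0.1 - - [10/Jan/2026] "GET /guide HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '20.0.0.2 - - [10/Jan/2026] "GET /guide HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '20.0.0.1 - - [10/Jan/2026] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
]
print(ai_bot_counts(sample))
```

Running this daily over rotated logs gives you the per-bot crawl trend that Search Console does not report.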
For the complete technical context, see our technical SEO guide 2026.
FAQ — Crawl Budget
At how many pages should you start worrying about crawl budget?
Google states that crawl budget is only a concern for "large sites" (10,000+ pages). In practice, we see benefits from optimising from 100 pages, especially for AI bots that crawl far less.
Can you increase your crawl budget?
Not directly. But by improving your TTFB, removing unnecessary URLs, and regularly publishing quality content, Google will naturally allocate more crawl to your site.
Does noindex consume crawl budget?
Yes. Noindex prevents indexing but not crawling. To stop crawling, use robots.txt. Note, however, that Googlebot must crawl a page to see its noindex tag: a page blocked in robots.txt can still be indexed as a bare URL if it is linked elsewhere. For truly unnecessary pages, let the noindex take effect first, then block the section in robots.txt.
Do AI bots respect crawl-delay in robots.txt?
It depends on the bot. GPTBot and ClaudeBot generally respect robots.txt directives (Allow/Disallow) but not necessarily crawl-delay. The best control remains your server response speed.
How do you prioritise pages for crawling?
Through three levers: the XML sitemap (only include strategic pages), internal linking (more internal links = more crawled), and robots.txt (block what should not be crawled).
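For the first lever, a strategic sitemap is simply a short XML file listing only indexable pages with a trustworthy lastmod (the URL and date below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/crawl-budget-guide</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>
```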
Are your important pages being crawled?
We analyse your server logs and optimise your crawl budget so that Google and AI visit what truly matters.
Analyse my crawl budget

