
Crawl Budget: How to Optimise It for 100+ Pages

Crawl budget determines how many pages Google and AI bots visit on your site. If you have more than 100 pages, optimising it becomes crucial for your indexation and visibility.

Lucie Bernaerts
GEO Expert
5 March 2026
11 min read
TL;DR — Crawl budget is the number of pages that Googlebot (and AI bots) is willing to crawl on your site within a given timeframe. For small sites (fewer than 100 pages), it is rarely a problem. But from 100+ pages onwards, every useless URL that gets crawled steals resources from your strategic pages. This guide explains how to identify waste, prioritise important pages, and configure your infrastructure to maximise crawl efficiency.
[Image: crawl budget diagram - allocation vs waste]
Crawl budget: where do Googlebot visits go on your site?

Understanding crawl budget in 2026

[Image: isometric illustration of crawl budget optimisation]
Crawl budget: how to optimise it effectively

Google defines crawl budget as the combination of two factors: the crawl rate limit (how many requests per second Googlebot sends without overloading your server) and crawl demand (how important Google judges your pages to be worth crawling).

In 2026, a third factor comes into play: the AI crawl budget. GPTBot, ClaudeBot and PerplexityBot each have their own budgets, generally more limited than Googlebot's. They crawl fewer pages, less often, and with shorter timeouts.

Gary Illyes, analyst at Google (Zurich), clarified at Search Central Live 2025 in Paris: "If your site responds slowly, we will crawl less. If your pages all look alike, we will lose interest. Crawl budget is not a metric you can configure — it is a consequence of the quality of your site."

Diagnosing crawl waste

Before optimising, identify where your crawl budget is being wasted. The usual suspects:

| Waste source | Impact | Solution | Priority |
| --- | --- | --- | --- |
| Parameter pages (?sort=, ?filter=) | Very high | Canonical tag + robots.txt | P1 |
| Infinite pagination pages | High | Noindex or limited pagination | P1 |
| Duplicate content (www vs non-www, http vs https) | High | 301 redirects + canonical | P1 |
| Mass 404 pages | Medium | Redirect or delete | P2 |
| Redirect chains (A → B → C) | Medium | Direct redirect A → C | P2 |
| Low-value pages indexed | Medium | Noindex or consolidation | P2 |

Diagnostic tools: analyse your server logs (Screaming Frog Log Analyzer, Oncrawl) to see exactly which pages Googlebot visits. Cross-reference with Search Console (coverage report) to identify pages crawled but not indexed.
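
Before reaching for dedicated tools, a short script can already surface the worst offenders. Here is a minimal sketch, assuming an nginx or Apache access log in the standard combined format (the path, regex and bot list are placeholders to adapt); it counts hits per URL for Googlebot and the main AI bots:

```python
import re
from collections import Counter

# Hypothetical path: point this at your real access log.
LOG_PATH = "access.log"

# Bots to track. User-agent matching is naive: strings can be
# spoofed, so verify Googlebot via reverse DNS for reliable data.
BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"]

# Combined log format:
# IP - - [date] "GET /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

hits = {bot: Counter() for bot in BOTS}

with open(LOG_PATH, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = LINE_RE.search(line)
        if not m:
            continue
        url, user_agent = m.groups()
        for bot in BOTS:
            if bot in user_agent:
                hits[bot][url] += 1

# Top 10 most-crawled URLs per bot.
for bot in BOTS:
    total = sum(hits[bot].values())
    print(f"\n{bot}: {total} hits")
    for url, count in hits[bot].most_common(10):
        print(f"  {count:>6}  {url}")
```

If parameter URLs or 404s dominate a bot's top 10, that is exactly where your budget is leaking.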

5 crawl budget optimisation strategies

  1. Clean up robots.txt — block useless sections (parameters, facets, admin pages) while allowing strategic AI bots (see our robots.txt and AI guide, and the sketch after this list)
  2. Strategic XML sitemap — only include pages you want indexed, with reliable lastmod dates (see our sitemap guide; an example follows below)
  3. Improve TTFB — a fast server lets Googlebot crawl more pages in the same time (see Core Web Vitals guide)
  4. Consolidate thin content — merge low-traffic pages that cover similar topics
  5. Directed internal linking — point internal links towards your priority pages to signal their importance to bots (see internal linking guide)
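
To make strategies 1 and 2 concrete, here is a minimal sketch of each file. The domain, paths and dates are hypothetical placeholders, not a ready-made configuration.

```
# robots.txt, a sketch: the paths below are hypothetical.
# Block the crawl-budget sinks identified in the table above.
User-agent: *
Disallow: /admin/
Disallow: /*?sort=
Disallow: /*?filter=

# Caution: a bot obeys only its most specific matching group. If you
# add a dedicated group for GPTBot or ClaudeBot, repeat the Disallow
# rules there, otherwise an empty group grants full access.

Sitemap: https://www.example.com/sitemap.xml
```

And the matching sitemap, reduced to a single entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Strategic pages only; lastmod must reflect a real change,
       as bots learn to ignore sitemaps that fake freshness -->
  <url>
    <loc>https://www.example.com/strategic-page</loc>
    <lastmod>2026-02-20</lastmod>
  </url>
</urlset>
```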

Crawl budget and AI bots: the specifics

AI bots have distinct crawling behaviour compared to Googlebot:

  • Lower volume — GPTBot crawls 10 to 100x fewer pages than Googlebot on the same site
  • Aggressive timeouts — frequent abandonment after 1.5-2 seconds of TTFB
  • No JS rendering — only static HTML is read
  • robots.txt sensitivity — GPTBot and ClaudeBot respect directives (contrary to some misconceptions)
  • llms.txt file — guides AI bots to your most important pages (see our llms.txt guide; a minimal example follows this list)
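
As a rough illustration, here is what a minimal llms.txt might look like under the proposed format (a Markdown file served at the site root); the company name and URLs are placeholders:

```markdown
# Example Company

> B2B SaaS platform. The pages below are the best entry points
> for understanding our product and documentation.

## Key pages

- [Pricing](https://www.example.com/pricing): plans and limits
- [Docs](https://www.example.com/docs): full product documentation

## Optional

- [Changelog](https://www.example.com/changelog)
```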

Bartosz Goralewicz, CEO of Onely (Poland): "For AI bots, crawl budget is even more precious. They visit far fewer pages, so every crawled page must count. The llms.txt file and a well-configured robots.txt are your best allies."

[Image: comparison of Googlebot vs GPTBot crawl in server logs]
Log analysis: Googlebot vs GPTBot crawl volume on a 500-page site

Monitoring your crawl budget

Crawl budget optimisation is not a one-off task. Here are the metrics to track:

  • Pages crawled per day — Search Console > Settings > Crawl stats
  • Average response time — same report, target < 500ms
  • Crawled/indexed page ratio — if the ratio is low, Googlebot is crawling but not judging your pages worthy of indexation
  • AI bot crawl — analyse your server logs to track GPTBot, ClaudeBot, PerplexityBot (see the sketch after this list)
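
To track that last metric without a dedicated tool, the following sketch (same assumptions as the earlier one: combined log format, naive user-agent matching) aggregates daily hits per bot, which makes drops easy to spot:

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical path and combined log format; adapt to your server.
LOG_PATH = "access.log"
BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"]

# Capture the day (e.g. 05/Mar/2026) and the final quoted field
# of each line, which is the user agent in combined format.
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\].*"([^"]*)"$')

daily = defaultdict(lambda: defaultdict(int))  # day -> bot -> hits

with open(LOG_PATH, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = LINE_RE.search(line.rstrip())
        if not m:
            continue
        day, user_agent = m.groups()
        for bot in BOTS:
            if bot in user_agent:
                daily[day][bot] += 1

# A sudden drop in daily hits often points to a TTFB regression
# or a robots.txt change.
for day in sorted(daily, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
    counts = "  ".join(f"{bot}: {daily[day][bot]}" for bot in BOTS)
    print(f"{day}  {counts}")
```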

For the complete technical context, see our technical SEO guide 2026.

FAQ — Crawl Budget

From how many pages should you worry about crawl budget?

Google states that crawl budget is only a concern for "large sites" (10,000+ pages). In practice, we see benefits from optimising from 100 pages, especially for AI bots that crawl far less.

Can you increase your crawl budget?

Not directly. But by improving your TTFB, removing unnecessary URLs, and regularly publishing quality content, Google will naturally allocate more crawl to your site.

Does noindex consume crawl budget?

Yes. Noindex prevents indexing but not crawling. To prevent crawling, use robots.txt. Be careful with the order, though: if you block a URL in robots.txt straight away, Googlebot can no longer see the noindex tag. Apply noindex first, wait for the page to drop out of the index, then block the crawl.
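
As a sketch, the sequence looks like this (the meta tag is standard; the timing comments describe the recommended order):

```html
<!-- Step 1: noindex while the page is still crawlable, so that
     Googlebot can actually see the directive. The HTTP-header
     equivalent is "X-Robots-Tag: noindex". -->
<meta name="robots" content="noindex">

<!-- Step 2: once the page has dropped out of the index, block it
     in robots.txt so it stops consuming crawl budget. -->
```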

Do AI bots respect crawl-delay in robots.txt?

It depends on the bot. GPTBot and ClaudeBot generally respect robots.txt directives (Allow/Disallow) but not necessarily crawl-delay. The best control remains your server response speed.

How do you prioritise pages for crawling?

Through three levers: the XML sitemap (only include strategic pages), internal linking (more internal links = more crawled), and robots.txt (block what should not be crawled).

Are your important pages being crawled?

We analyse your server logs and optimise your crawl budget so that Google and AI visit what truly matters.

Analyse my crawl budget
Lucie Bernaerts
GEO Expert

Co-founder and CEO of AISOS. A GEO expert, she helps companies build their Google + AI visibility strategy.