Google's 2MB crawl limit and the GEO problem nobody is talking about

On February 3, 2026, Google quietly reorganized its crawling documentation. What looked like a routine cleanup revealed something that had been hiding in plain sight: two distinct file size limits that most of the SEO industry had been conflating for years.

One number caught everyone’s attention: a 2MB limit for Search indexing. But the real story is not about file sizes. It is about what this documentation restructure tells us about where Google is heading and, more importantly, what it means for your visibility to an entirely new class of content consumers: AI systems.

How Google’s crawl pipeline actually works

Google’s crawling infrastructure has always operated in stages. The documentation update made this explicit by separating two previously blurred concepts:

Stage               | Limit | What it covers
Fetching (download) | 15MB  | All Google crawlers (Search, News, Shopping, Gemini)
Indexing (ranking)  | 2MB   | Googlebot specifically, for HTML and text content
PDF indexing        | 64MB  | Googlebot for PDF documents

Think of it as a two-step filter. Google first downloads your page (up to 15MB of uncompressed data), then processes the first 2MB of HTML content for Search ranking purposes. Anything beyond that 2MB boundary is effectively invisible to Google Search.

Server → [15MB fetch limit] → Downloaded HTML → [2MB index limit] → Indexed for Search
                                                                   → Content beyond 2MB ignored

A critical detail: these limits apply to uncompressed data. Your server may send a gzipped response of 200KB, but what matters is the size after decompression. CSS, JavaScript, and image files are each fetched separately with their own limits. The 2MB applies specifically to the HTML document itself.
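
To see the difference yourself, compare the size over the wire with the size after decompression. This is a rough sketch, assuming the placeholder URL below is replaced with a real page and your server gzips responses when asked (if it does not, both numbers will match):

# Bytes as sent over the wire when the client asks for gzip:
curl -s -H 'Accept-Encoding: gzip' https://yoursite.com/page | wc -c

# Bytes after decompression, which is what the 15MB and 2MB limits apply to:
curl -s --compressed https://yoursite.com/page | wc -c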

Does the 2MB limit affect your site?

Almost certainly not.

According to the Web Almanac 2025, the median HTML page size across the web is roughly 22KB, about 90 times smaller than the 2MB limit. Even an unusually large page at ten times the median would only be 220KB, roughly a tenth of the threshold.

Google’s John Mueller was characteristically direct: it is “extremely rare that sites run into issues” with these limits.

This is also not a new restriction. Google explicitly labeled it a “documentation clarification,” not a behavioral change. The 2MB indexing limit has likely been in place for years. The only thing that changed is that Google finally wrote it down separately from the 15MB fetch limit.

Over 99% of websites will never need to think about this.

Sites at risk: SPAs, heavy CMS, and inline JavaScript

That said, there are edge cases worth knowing about, especially if you work with large-scale or technically complex sites:

  • Single-page applications that inline entire JavaScript bundles and CSS into the HTML document
  • Pages with base64-encoded images embedded directly in the markup instead of referenced as external files
  • Heavy CMS installations like WordPress themes with dozens of plugins injecting scripts, or Magento product pages with extensive inline code
  • Data-heavy product pages with massive embedded JSON-LD structured data blocks
  • Location pages that load hundreds or thousands of entries inline rather than paginating or loading them dynamically

If any of these sound familiar, it is worth measuring. But for the vast majority of sites built with modern practices, the 2MB limit is a non-issue.
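
If you want a rough first pass before a full crawl, a couple of shell heuristics can show how much of a page's HTML is eaten up by inline payloads. Treat these as sketches rather than parsers: the URL is the same placeholder as above, and the script pattern assumes there is no stray "<" inside the JavaScript.

# Bytes of base64-encoded assets embedded directly in the markup:
curl -s https://yoursite.com/page \
  | grep -o 'data:[a-z/+.-]*;base64,[A-Za-z0-9+/=]*' \
  | wc -c

# Approximate bytes inside inline <script> blocks:
curl -s https://yoursite.com/page \
  | tr '\n' ' ' \
  | grep -o '<script[^>]*>[^<]*</script>' \
  | wc -c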

GEO and AI crawlers: the visibility gap nobody is measuring

Here is where it gets interesting. Google did not restructure its documentation just for the sake of Search; it is organizing its crawler infrastructure for a multi-product ecosystem. That shift has implications far beyond file size limits.

Consider how different AI systems consume your content:

ChatGPT (GPTBot), Perplexity, and Claude only read the raw HTML response from your server. They do not render JavaScript. If your content requires client-side JS to appear in the DOM, it is completely invisible to these systems.

Google Gemini has a structural advantage here. Because it operates within Google’s ecosystem, it can access Googlebot’s pre-rendered content, the version of your page that has already been processed through Google’s rendering pipeline.

This creates a two-tier visibility problem. A JavaScript-heavy site might rank perfectly well in Google Search (because Googlebot renders JS) while being entirely invisible to ChatGPT, Perplexity, and Claude. Your content exists for Google but does not exist for a growing portion of how people discover information.

Server-side rendering (SSR) is no longer a nice-to-have. It is the only reliable way to ensure your content is visible across both traditional search and AI systems. If your site depends on client-side JavaScript to display critical content, you have a visibility gap that will only widen as AI-driven discovery grows.

5 ways to check your crawl limit and AI visibility

Five practical ways to assess where you stand:

  1. Quick search test. Pick a distinctive phrase from the bottom of one of your pages and search for it in Google (in quotes). If Google returns the page, content near the end of the document made it past the 2MB cut-off, which means the page is being indexed in full.

  2. Command line. Run curl -s https://yoursite.com/page | wc -c to measure the uncompressed HTML size in bytes. If the number is under 2,000,000, you are fine.

  3. Browser DevTools. Open the Network tab, filter by “Doc,” and check the uncompressed size of the HTML document (not the transfer size).

  4. Screaming Frog. Use the Size column to identify pages approaching the limit. The tool currently flags at 15MB; expect an update to flag at 2MB soon.

  5. Google Search Console. Use URL Inspection to verify that specific pages are being indexed as expected.

For AI visibility specifically, compare your response HTML against your rendered HTML (what the server sends vs. what appears after JavaScript runs). If there is a significant difference, your AI visibility is compromised.
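
A minimal way to run that comparison from the command line, assuming Chrome or Chromium is installed locally (the binary name varies by platform) and using the same placeholder URL. The first file approximates what GPTBot, Perplexity, and Claude receive; the second approximates the DOM after JavaScript has run.

# What the server sends (no JavaScript executed):
curl -s https://yoursite.com/page > response.html

# What the page looks like after rendering in headless Chrome:
google-chrome --headless --dump-dom https://yoursite.com/page > rendered.html

# A large size gap, or key phrases that only appear in rendered.html,
# points to content that exists only after client-side rendering:
wc -c response.html rendered.html
grep -c 'a distinctive phrase from your page' response.html rendered.html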

From SEO to GEO: what your business should do next

If you are running a standard business website, a blog, or even a moderately complex e-commerce site, the 2MB limit changes nothing for you. Do not let anyone tell you otherwise.

What should be on your radar:

  • Add HTML size monitoring to your technical SEO audits. Flag pages at 1.5MB as a warning and 2MB as critical (a minimal sketch follows this list).
  • Compare response HTML vs. rendered HTML as a new audit dimension. This reveals your AI visibility gap.
  • Prioritize server-side rendering if your site relies on JavaScript to display important content. This is not just about Google anymore, but about being found by ChatGPT, Perplexity, Claude, and whatever comes next.
  • Think beyond Search. Google reorganizing its crawling docs is a signal that crawl optimization is about visibility across their entire product ecosystem, not just the blue links.
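
As a starting point for the first item, here is a minimal monitoring sketch in shell. It assumes a plain-text urls.txt with one URL per line; the thresholds mirror the audit rule above (warn at 1.5MB uncompressed, critical at 2MB).

# Measure the uncompressed HTML size of each URL and classify it.
while read -r url; do
  bytes=$(curl -s "$url" | wc -c | tr -d ' ')
  if [ "$bytes" -ge 2000000 ]; then
    echo "CRITICAL  $bytes  $url"
  elif [ "$bytes" -ge 1500000 ]; then
    echo "WARNING   $bytes  $url"
  else
    echo "OK        $bytes  $url"
  fi
done < urls.txt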

GEO is technical SEO for the AI era

The foundations are the same: server-side rendering, clean HTML, structured data. What’s different is the audience. You’re not just optimizing for a ranking algorithm. You’re optimizing for systems that summarize your content, cite it, or skip it entirely.

The bar is higher than page one. Your content needs to be clear enough for an AI to extract, attribute, and trust.

The 2MB limit? A non-event. But Google splitting its crawler docs by product (Search, News, Gemini) is the real signal. One-system optimization is over.

At Thinking Too, we’ve built on these technical foundations for twenty years. If you’re not sure where your AI visibility stands, let me know.