Google lists Googlebot file limits for crawling


Google has updated two of its help documents to explain Googlebot's crawl limits, specifically how much of a file Googlebot will fetch, by file type and format.

The limits. Some of these limits were already documented and are not new. They include:

  • 15MB by default for Google's crawlers and fetchers: Google wrote, “By default, Google’s crawlers and fetchers only crawl the first 15MB of a file.”
  • 64MB for PDF files: Google wrote, “When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file.”
  • 2MB for supported file types: Google wrote, “When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file.”

Note that these limits are generous, and the vast majority of websites do not need to be concerned with them.

Full text. Here is the full text Google posted in its help documents:

  • “By default, Google’s crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.”
  • “When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file. From a rendering perspective, each resource referenced in the HTML (such as CSS and JavaScript) is fetched separately, and each resource fetch is bound by the same file size limit that applies to other files (except PDF files). Once the cutoff limit is reached, Googlebot stops the fetch and only sends the already downloaded part of the file for indexing consideration. The file size limit is applied on the uncompressed data. Other Google crawlers, for example Googlebot Video and Googlebot Image, may have different limits.”
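For site owners who want to check a page or resource against these numbers, here is a minimal sketch. It assumes 1MB means 2^20 bytes (Google's text does not say which definition it uses), handles only gzip-compressed bodies, and uses hypothetical names (`LIMITS`, `check_limit`) that are not part of any Google tool. Following the quoted text, it measures the uncompressed size.

```python
import gzip

# Assumption: 1MB = 2**20 bytes; the quoted documentation does not specify.
MB = 1024 * 1024

# Limits as quoted in the article. The keys are hypothetical labels.
LIMITS = {
    "default": 15 * MB,      # Google's crawlers and fetchers, by default
    "search": 2 * MB,        # Googlebot for Google Search, supported file types
    "search_pdf": 64 * MB,   # Googlebot for Google Search, PDF files
}


def uncompressed_size(body: bytes, content_encoding: str = "") -> int:
    """Return the size of the body after undoing gzip compression, if any."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return len(body)


def check_limit(body: bytes, limit_key: str, content_encoding: str = "") -> dict:
    """Compare a response body against one of the quoted limits."""
    limit = LIMITS[limit_key]
    size = uncompressed_size(body, content_encoding)
    return {
        "size": size,
        "limit": limit,
        "truncated": size > limit,
        # Bytes past the cutoff would not be considered for indexing.
        "ignored_bytes": max(0, size - limit),
    }


if __name__ == "__main__":
    # A 3MB page that compresses well still counts as 3MB against the limit.
    page = gzip.compress(b"a" * (3 * MB))
    print(check_limit(page, "search", content_encoding="gzip"))
```

Because each CSS or JavaScript resource is fetched separately, the same check would apply to every resource on its own rather than to the page's combined weight.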

Why we care. It is worth knowing these limits, but again, most sites will likely never come close to them. That said, these are Googlebot’s documented crawling limits.





