Crawl Errors: Everything You Need to Know


Crawl errors keep pages out of Google’s index. If Google can’t reach your page, it can’t rank your page. That relationship hasn’t changed since the first web crawler fired up in 1993. What has changed, repeatedly, is the tooling Google provides for spotting and fixing those errors. If you’re working from a 2017-era guide, half the interface names and feature references in it no longer exist.

This guide covers what crawl errors are in 2026, how they’re categorized in the modern Google Search Console, how to diagnose and fix each major type, and which older tools (Fetch as Google, the old Crawl Errors report, Mark as Fixed) have been replaced.


The Modern Search Console Report

Google retired the old Crawl Errors report when the new Search Console rolled out in 2018. What replaced it is the Page Indexing report (formerly called Coverage, renamed in 2022), which is the primary place to find crawl and indexing issues today. It’s at: search.google.com/search-console → Indexing → Pages.

The old “Site Errors” and “URL Errors” split is gone. In the new Page Indexing report, every URL on your site falls into one of two states:

  • Indexed — Google found the page, crawled it successfully, decided it was worth indexing, and added it to search results.
  • Not indexed — Something prevented indexing. The report groups these URLs by reason, and each reason is a specific kind of crawl or indexing error.

For specific URLs, the URL Inspection tool (which replaced Fetch as Google in 2018) is the go-to diagnostic. Enter any URL from your verified property and it shows Google’s crawl status, whether the page is indexed, which canonical Google chose, and any blocking issues.

Crawl Error Categories in 2026

The modern Page Indexing report groups issues into categories that roughly correspond to what the old Crawl Errors report called “site errors” and “URL errors,” but with clearer labels and more granular distinctions.

Server Errors (5xx)

Google tried to crawl the page, but the server returned a 5xx status code — usually 500 (internal error), 502 (bad gateway), 503 (service unavailable), or 504 (gateway timeout). Common causes:

  • The server is overloaded or down.
  • A background job or cron is hogging CPU during peak traffic.
  • A firewall or DDoS protection service is blocking Googlebot by mistake.
  • The response is taking longer than Googlebot is willing to wait and timing out (Google doesn't publish an exact cutoff, but consistently slow responses fail).
  • Dynamic pages with many URL parameters are generating too many variations for the server to handle.

Start by checking server health and uptime monitoring. Then verify that Googlebot isn’t being blocked by your CDN or firewall — use the URL Inspection tool to confirm Google can actually reach the page. If your host is inconsistent, budget for a better one; a flaky server is a fast path to ranking decline.
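
If you want a quick spot check outside of Search Console, a short script can catch 5xx responses and slow pages before Google flags them. This is a minimal sketch, assuming Python with the third-party requests library installed; the URLs and the five-second threshold are placeholders to adapt to your own site.

  # Spot-check a handful of URLs for 5xx responses and slow responses,
  # using a Googlebot-style User-Agent. URLs below are placeholders.
  import requests

  URLS = [
      "https://www.example.com/",
      "https://www.example.com/products/",
  ]
  UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

  for url in URLS:
      try:
          r = requests.get(url, headers={"User-Agent": UA}, timeout=30)
          slow = r.elapsed.total_seconds() > 5  # flag anything slower than ~5s
          if r.status_code >= 500 or slow:
              print(f"{url} -> {r.status_code} in {r.elapsed.total_seconds():.1f}s")
      except requests.RequestException as exc:
          print(f"{url} -> request failed: {exc}")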

Redirect Errors

The URL redirects, but something about the redirect is broken. Common issues:

  • Redirect chain is too long (Googlebot follows only a limited number of hops, currently documented as 10, before reporting a redirect error).
  • Redirect loop (A → B → A).
  • Redirect target returns 4xx or 5xx.
  • Redirect target URL is too long.
  • Empty URL in the redirect chain.

Fix redirect chains by pointing each old URL directly at its final destination, not at another redirect. Audit with Screaming Frog or Sitebulb, which both surface redirect chains clearly.
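
A dedicated crawler will surface chains across the whole site, but you can also trace a single URL's hops with a few lines of Python. This sketch assumes the requests library; the URL is a placeholder.

  # Follow a URL's redirect chain and report each hop, so old URLs can be
  # pointed straight at their final destination.
  import requests

  def trace_redirects(url):
      r = requests.get(url, allow_redirects=True, timeout=30)
      hops = [(resp.status_code, resp.url) for resp in r.history]
      hops.append((r.status_code, r.url))
      return hops

  for status, hop in trace_redirects("https://www.example.com/old-page"):
      print(status, hop)
  # More than one 3xx hop means a chain worth collapsing into a single redirect.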

URL Blocked by robots.txt

Google found a link to the URL but your robots.txt file is telling crawlers not to fetch it. Sometimes that’s intentional (admin pages, staging environments). Sometimes it’s a mistake — for example, a blanket Disallow: / left over from a development environment.

Check your robots.txt file at yourdomain.com/robots.txt. For large-scale issues, use the robots.txt report in Search Console to see which URLs Google is currently blocking. Remember: robots.txt stops crawling, but it doesn’t remove already-indexed pages. To fully remove a page from the index, use a noindex meta tag instead.
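
You can also test specific URLs against your live robots.txt from a script. This sketch uses Python's built-in robotparser; the domain and paths are placeholders, and note that robotparser applies the basic rules rather than every Google-specific nuance.

  # Test whether specific URLs are blocked for Googlebot by the live robots.txt.
  from urllib.robotparser import RobotFileParser

  rp = RobotFileParser("https://www.example.com/robots.txt")
  rp.read()

  for url in ["https://www.example.com/", "https://www.example.com/admin/"]:
      allowed = rp.can_fetch("Googlebot", url)
      print("ALLOWED" if allowed else "BLOCKED", url)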

URL Marked Noindex

The page has a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header. Google crawled the page and honored the instruction not to index it. Often this is intentional (thank-you pages, internal search results). If the page should be indexed, remove the noindex directive and request re-indexing via URL Inspection.
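
To confirm which noindex signal a URL is actually sending, check both the response header and the HTML. A rough sketch, assuming the requests library; the string matching is deliberately loose, so treat a hit as a prompt to look at the page source.

  # Check a URL for both noindex signals: the X-Robots-Tag header and the
  # robots meta tag in the HTML. The URL is a placeholder.
  import requests

  url = "https://www.example.com/thank-you"
  r = requests.get(url, timeout=30)

  header = r.headers.get("X-Robots-Tag", "")
  header_noindex = "noindex" in header.lower()
  # Very rough meta check; inspect the source if this comes back True.
  meta_noindex = "noindex" in r.text.lower() and "robots" in r.text.lower()

  print(f"X-Robots-Tag noindex: {header_noindex}")
  print(f"meta robots noindex (rough check): {meta_noindex}")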

Soft 404

The server returned a 200 OK status, but Google’s evaluation of the page content made it look like a “not found” page — for example, a page that says “no results” or “product unavailable” but returns 200. Soft 404s waste crawl budget and confuse Google about what’s actually on your site.

Fix by returning an actual 404 or 410 HTTP status code on genuinely missing pages, and make sure pages that should be live have real content. Empty category pages, stock-out product pages that aren’t redirected, and archived content stubs are the most common offenders.
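
Finding soft 404s at scale usually means comparing status codes against page content. A minimal sketch, assuming the requests library; the "not found" phrases and URLs are placeholders you'd tailor to your own templates.

  # Flag likely soft 404s: URLs that return 200 but whose body reads like
  # a "not found" page.
  import requests

  NOT_FOUND_PHRASES = ["page not found", "no results", "product unavailable"]

  def looks_like_soft_404(url):
      r = requests.get(url, timeout=30)
      body = r.text.lower()
      return r.status_code == 200 and any(p in body for p in NOT_FOUND_PHRASES)

  for url in ["https://www.example.com/category/empty",
              "https://www.example.com/widget-9000"]:
      if looks_like_soft_404(url):
          print("Likely soft 404:", url)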

Not Found (404)

Google tried to crawl the URL and the server returned 404 (not found). 404s are a normal part of the web — they aren’t a crawl “error” that hurts your site in aggregate, unless important URLs are returning 404 that shouldn’t. If a 404 URL has backlinks or internal links pointing at it, set up a 301 redirect to the nearest relevant replacement page. If nothing logical exists, leave the 404 — a 404 is a better signal than a redirect to an unrelated page.
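
If you keep a redirect map for retired URLs, it's worth verifying that each old URL actually 301s to the intended live page. A small sketch under the same assumptions (requests library, placeholder URLs):

  # Confirm that retired URLs 301 to the intended replacement and that the
  # replacement itself returns 200.
  import requests

  REDIRECT_MAP = {
      "https://www.example.com/old-guide": "https://www.example.com/new-guide",
  }

  for old, expected in REDIRECT_MAP.items():
      r = requests.get(old, allow_redirects=True, timeout=30)
      first_hop = r.history[0].status_code if r.history else None
      ok = first_hop == 301 and r.url == expected and r.status_code == 200
      print("OK   " if ok else "CHECK", old, "->", r.url, f"(first hop: {first_hop})")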

Access Forbidden (401/403)

Google is being told it doesn’t have permission to access the URL. Common causes: authentication-required pages leaked into the sitemap, geographic or IP-based access controls that block Googlebot’s IP range, or bot-protection services blocking Googlebot.

Fix by removing restricted URLs from your sitemap, configuring your bot-protection service to allow verified Googlebot IPs, or adding authentication bypass for Google’s verified crawler IPs. Google publishes its verified crawler IPs at developers.google.com/search/apis/ipranges/googlebot.json.
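
You can check a suspicious IP against that published list yourself. This sketch assumes the requests library and the file's current shape (a "prefixes" list of ipv4Prefix/ipv6Prefix entries); adjust if Google changes the format.

  # Check whether an IP that claims to be Googlebot falls inside Google's
  # published ranges.
  import ipaddress
  import requests

  RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json"

  def is_googlebot_ip(ip):
      data = requests.get(RANGES_URL, timeout=30).json()
      addr = ipaddress.ip_address(ip)
      for entry in data.get("prefixes", []):
          prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
          if prefix and addr in ipaddress.ip_network(prefix):
              return True
      return False

  print(is_googlebot_ip("66.249.66.1"))  # an address in a commonly seen Googlebot range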

Crawled, Currently Not Indexed

Google crawled the page and made a deliberate decision not to index it. This is qualitatively different from a technical error — it means Google saw your content and judged it insufficient for the index. Causes:

  • Thin content (very few words, boilerplate-only pages).
  • Low-quality content (content Google’s Helpful Content system demoted).
  • Content Google considers near-duplicate of higher-quality pages on other sites.
  • Pages that are technically fine but don’t serve a clear user intent.

The fix is almost always content quality. Expand thin pages, consolidate duplicates, improve what a searcher actually learns from the page. Re-request indexing once the content is substantively better.

Discovered, Currently Not Indexed

Google found the URL (usually via a sitemap or internal link) but hasn’t crawled it yet. This is common on new sites with limited crawl budget, on large sites where Googlebot hasn’t worked through every URL, or on sites where server response times are too slow for Google to justify crawling everything. If URLs stay in this state for weeks, it often signals crawl-budget pressure or slow server response.

Duplicate Without User-Selected Canonical

Google found several URLs with near-identical content and you haven’t declared a canonical. Google will pick one of the duplicates as the canonical itself. Fix by adding a rel="canonical" tag pointing to your preferred version. See our guide on canonical tags for the full workflow.
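
To see what canonical a page currently declares, you can pull the link element straight out of the HTML. A minimal sketch using Python's standard-library HTML parser plus the requests library; the URL is a placeholder.

  # Extract the rel="canonical" link from a page and print it for comparison
  # with the URL you expect Google to index.
  import requests
  from html.parser import HTMLParser

  class CanonicalFinder(HTMLParser):
      def __init__(self):
          super().__init__()
          self.canonical = None

      def handle_starttag(self, tag, attrs):
          attrs = dict(attrs)
          if tag == "link" and attrs.get("rel") == "canonical":
              self.canonical = attrs.get("href")

  url = "https://www.example.com/blue-widgets?sort=price"
  finder = CanonicalFinder()
  finder.feed(requests.get(url, timeout=30).text)
  print("Declared canonical:", finder.canonical)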

Duplicate, Google Chose Different Canonical Than User

You declared a canonical, but Google ignored it and picked a different URL. This usually means conflicting signals: your canonical tag points at URL A, but your internal links, sitemap, or 301 redirects point at URL B. Google went with the broader signal. Audit internal linking and sitemap entries to align them with your declared canonical.

Page With Redirect

The URL redirects somewhere else. This isn’t an error — it’s a status report. As long as the redirect is intentional and the target is indexable, no action is needed. Flag only when redirects chain unnecessarily or point at unrelated pages.

DNS and Robots.txt Fetch Errors

Two categories of failure happen before Google can even crawl a URL:

DNS errors mean Google couldn’t resolve your domain name to a server. Usually a temporary DNS hiccup; occasionally a domain expiration or DNS configuration problem. Test with dig or nslookup at the command line. If errors persist, call your DNS provider.
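
If you'd rather script the check than run dig by hand, a resolution failure from Python points at DNS rather than at the web server. The hostname below is a placeholder.

  # Confirm the domain still resolves to at least one address.
  import socket

  try:
      addrs = {info[4][0] for info in socket.getaddrinfo("www.example.com", 443)}
      print("Resolves to:", ", ".join(sorted(addrs)))
  except socket.gaierror as exc:
      print("DNS resolution failed:", exc)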

Robots.txt fetch errors mean Google couldn’t read your robots.txt file at all. Google will delay crawling your site until it can read the file, because it doesn’t want to crawl pages you might have explicitly disallowed. If robots.txt returns a 5xx error, Google treats this as a temporary situation and backs off. If it returns 4xx, Google assumes no restrictions and proceeds. If the file is unreachable for hours or days, crawling grinds to a halt.

Make sure robots.txt is served with a 200 status, is well-formed, and loads quickly. A missing robots.txt is fine (Google treats that as “no restrictions”); a broken or timing-out one is a real problem.
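
A quick way to monitor this is to fetch robots.txt yourself and look at the status code and response time. A sketch assuming the requests library; the "well-formed" check is deliberately crude.

  # Confirm robots.txt answers with a 200 quickly; a 5xx or timeout here is
  # what stalls crawling of the whole site.
  import requests

  r = requests.get("https://www.example.com/robots.txt", timeout=10)
  print("Status:", r.status_code)
  print("Fetch time:", f"{r.elapsed.total_seconds():.2f}s")
  print("Looks well-formed:",
        r.text.lstrip().lower().startswith(("user-agent", "sitemap", "#")))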

Diagnosing Errors with URL Inspection

The URL Inspection tool (in Search Console, replaced Fetch as Google in 2018) is the single most useful diagnostic for a specific URL. Enter any URL from your verified property and it shows:

  • Whether the URL is indexed
  • The last crawl date and result
  • The Google-selected canonical vs your declared canonical
  • Any crawl or indexing issues it found
  • What Googlebot actually rendered (for JavaScript-rendered pages)
  • Resources Googlebot couldn’t load

You can also click “Test Live URL” to have Google re-crawl the page right now and show the current result. When you’ve fixed an issue, click “Request Indexing” to push Google to re-crawl on a prioritized schedule. There’s a daily quota on Request Indexing submissions, so use it on pages that actually matter.

Fixing Issues at Scale with Validate Fix

The old “Mark as Fixed” button from the 2017-era Crawl Errors report is gone. The modern workflow is Validate Fix, available inside each issue category in the Page Indexing report. Click an error type (e.g., “Soft 404”), review the list of affected URLs, fix the underlying problem, then click “Validate Fix.” Google will re-crawl affected URLs over the following days or weeks and automatically clear the issue once verified.

Validate Fix handles many URLs in a single pass. You don’t mark individual URLs — you mark the whole error category as fixed and let Google re-verify.

Crawl Budget and Why It Matters

Every site has a crawl budget — how much of Googlebot’s time Google is willing to spend on your site before moving on. Crawl budget is rarely a problem for small sites (under a few thousand URLs). For larger sites, budget matters.

Things that waste crawl budget:

  • Faceted navigation generating endless parameter combinations
  • Infinite calendar pages or archive URLs
  • Excessive redirect chains
  • Slow server response (Google crawls fewer URLs when your server is slow)
  • Soft 404s and thin content
  • Duplicate URLs Google has to process just to discard

Fix these and Google crawls more of the URLs you actually want indexed. For more context, see our companion piece on crawlability vs indexability.
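
On large sites, server access logs are the fastest way to see where Googlebot's time is actually going. A rough sketch that counts Googlebot hits per top-level path; it assumes a standard combined-format log at a placeholder path, and it doesn't verify that the hits really come from Google (pair it with the IP check above for that).

  # Count Googlebot requests per top-level path from an access log to see
  # where crawl budget is being spent.
  from collections import Counter
  import re

  LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')
  hits = Counter()

  with open("/var/log/nginx/access.log") as fh:
      for line in fh:
          if "Googlebot" not in line:
              continue
          m = LINE.search(line)
          if m:
              top = "/" + m.group("path").lstrip("/").split("/", 1)[0]
              hits[top] += 1

  for path, count in hits.most_common(10):
      print(f"{count:6d}  {path}")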

Frequently Asked Questions

Are 404 errors bad for SEO?

A 404 is a normal HTTP response. 404s on URLs that shouldn’t exist (typos, old pages that were genuinely retired) are fine. 404s on URLs that should exist, or on URLs with backlinks pointing at them, are problems — fix those with 301 redirects to relevant replacements. Google’s John Mueller has confirmed that 404s don’t harm a site’s overall rankings, only the specific URL returning 404.

How do I fix “Discovered — currently not indexed”?

This status usually means Google hasn’t crawled the URL yet, often because of crawl budget or slow server response. Improve server response time, submit the URL via URL Inspection’s “Request Indexing,” and make sure important pages have strong internal links pointing to them. For large sites, audit for wasted crawl budget (faceted URL explosions, thin duplicates).

What’s the difference between noindex and robots.txt?

Robots.txt stops Google from crawling a URL at all, but doesn’t remove it from the index if it’s already there (Google can still index URLs it can’t crawl, based on external signals). Noindex requires Google to crawl the page to see the tag, but once seen it reliably removes the page from the index. For removing pages, always use noindex; for controlling crawl workload, use robots.txt.

How often does Google re-check crawl errors?

For URLs you submit via “Request Indexing” in URL Inspection, expect re-crawl within hours to days. For sitewide issues in Validate Fix, expect re-verification over days to weeks depending on site size and crawl budget. Very large sites see longer cycles.

Does fixing crawl errors improve rankings?

Indirectly, yes. Fixing crawl errors doesn’t add a ranking boost, but it removes friction that’s preventing pages from ranking at all. A page that isn’t indexed can’t rank. A page that takes Google 30 seconds to crawl may not get re-crawled often enough to stay fresh.

What replaced the old Crawl Errors report?

The Page Indexing report (in Search Console, under Indexing → Pages) replaced the old Crawl Errors report when the new Search Console rolled out in 2018. The URL Inspection tool replaced Fetch as Google for per-URL debugging. The Validate Fix workflow replaced Mark as Fixed. Anything referencing those older features is out of date.

Bottom Line

Crawl errors are the technical layer that sits between your content and search visibility. Google has gotten dramatically better at telling you exactly what’s wrong with a URL via the Page Indexing report and URL Inspection tool — both of which replaced the clunkier tools most older SEO articles still describe. If you can verify your site in Search Console, read the Page Indexing report once a week, and spend 10 minutes triaging whatever’s broken, you’ll avoid almost every crawl-error problem before it affects rankings.

For related reading, our guides on canonical tags and crawlability vs indexability cover the neighboring technical-SEO topics that determine how well your pages get crawled and indexed in the first place.
