Crawl Errors, Everything That You Need to Know

Posted September 15, 2016 by Garenne Bigby in Search Engine Optimization

When your site has crawl errors, its pages can be kept out of search results, even when your target audience enters the ideal search query. The Crawl Errors report lists the URLs on your website that Google could not crawl successfully or that returned an HTTP error code. The report has two main sections: site errors and URL errors.

Server Errors

Server errors occur when Google cannot access a URL because the server was too busy, the request timed out, or the site is blocking Google. In each case, Google has to abandon the request. You can remedy the situation by fixing the server's connectivity issues.

  • Cut down on the loading time for dynamic page requests. A website that delivers the same content at several different URLs is considered to deliver content dynamically. Such pages sometimes take too long to load, which results in timeouts. To remedy this, keep parameter lists short and use them sparingly.
  • Make sure that the website's hosting server is not misconfigured, overloaded, or simply down. If it is overloaded, consider increasing the server's capacity to handle traffic.
  • Check that you are not accidentally blocking Google. Yes, this does happen. To fix this, find the part of your infrastructure that is doing the blocking and remove the block. If the problem is in a firewall that you cannot configure yourself, talk to your hosting provider.
  • Control search engine crawling and indexing wisely. Some website owners restrict Googlebot on purpose so that they control how the website is crawled and indexed. If you do this, confirm that the restriction covers only what you intend.
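As an illustration of the first point, two URLs that differ only in the order of their query parameters usually serve the same content. The sketch below is a minimal example, not a recommendation from Google; the function name canonical_url and the example.com URLs are assumptions made for illustration. It sorts the parameters so that such variants collapse to a single form, which makes duplicates easy to spot.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query parameters sorted by name and value."""
    parts = urlsplit(url)
    # keep_blank_values=True so a parameter such as "?a=&b=1" does not lose "a".
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Two hypothetical URLs that differ only in parameter order.
a = canonical_url("http://www.example.com/shoes.php?color=red&size=7")
b = canonical_url("http://www.example.com/shoes.php?size=7&color=red")
print(a == b)  # True: both variants reduce to the same canonical form
```

Running a list of crawled URLs through a helper like this can show how many distinct addresses point at the same page.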

Server connectivity issues include timeouts, truncated headers, connection resets, truncated responses, refused connections, failed connections, connect timeouts, and no responses.

  • A timeout happens when the server times out while waiting for the request. The server may be misconfigured or overloaded.
  • Truncated headers occur when Google successfully connected to the server, but the connection was closed before the entire header was sent. You will need to check back later.
  • A connection reset occurs when the server has processed Google's request, but cannot return any content, as the connection to the server was reset.
  • A truncated response will happen when the server closed the connection prior to Google being able to receive a full response, thus the body of the response is truncated.
  • A connection refused error happens when Google is not able to access the site because of a refusal to connect. The hosting provider may be blocking Googlebot, or the configuration may be mixed up.
  • A Connect failed error is reported when Google cannot connect to the server because the network is unreachable or down.
  • A Connect timeout error is reported when Google's attempt to connect to the server times out.
  • A No response error is reported when Google is able to connect to the server, but the connection is closed before the server sends any data.

To resolve many of these problems, you will need to ensure that the server is connected to the internet, and that it is not overloaded or configured incorrectly.
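You can reproduce some of these failures yourself before waiting for the next crawl. The sketch below is one possible approach, assuming Python's standard library; the mapping from Python exceptions to the labels used above is an approximation made for illustration and is not how Google classifies errors internally. The function names classify and check are hypothetical.

```python
import http.client
import socket
import urllib.error
import urllib.request


def classify(exc: BaseException) -> str:
    """Map a Python network exception to a rough crawl-error label."""
    # An HTTP error status (4xx/5xx) means the connection itself worked.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}"
    # urllib wraps low-level errors in URLError; unwrap to the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "DNS lookup failed"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    # RemoteDisconnected subclasses ConnectionResetError, so test it first.
    if isinstance(exc, http.client.RemoteDisconnected):
        return "no response"
    if isinstance(exc, ConnectionResetError):
        return "connection reset"
    if isinstance(exc, http.client.IncompleteRead):
        return "truncated response"
    if isinstance(exc, OSError):
        return "connect failed"
    return "unknown"


def check(url: str, timeout: float = 10.0) -> str:
    """Fetch url once and describe the outcome as a short label."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # Read the body so truncated responses surface.
            return f"HTTP {response.status}"
    except Exception as exc:
        return classify(exc)
```

Calling check("http://www.example.com/") from a machine outside your own network gives a quick first impression of what a crawler might encounter, although it cannot replace the report itself.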

You may also see success when you use Fetch as Google to check that Googlebot is able to crawl the site. If it returns the content of the homepage with no problems, it is safe to assume that Google has the access it needs and can process the website appropriately.

For any connection problem, wait a little while and then check whether you can connect.

Site Errors

On a website that operates with no flaws, the “Site errors” portion of the Crawl Errors report will not show any errors, and this holds true for the majority of the websites that Google crawls. If Google detects a considerable number of errors on the site, you will be notified by a message to your account, no matter the size of the website. When you first look at the Crawl Errors page, the Site Errors portion shows a status next to each of the three error types (Server connectivity, DNS, and robots.txt fetch). The normal indication for each is a green check mark. If one of them shows something else, click its box to view a detailed graph of the crawls that took place over the last 90 days.

High Error Rates

If your website is reporting a 100% error rate for any of the 3 categories, it is highly likely that your website is not working for some reason.

There are a number of things to check:

  • Make sure that all directories are present and have not been accidentally deleted or moved.
  • Review all new scripts to make sure that they are not malfunctioning repeatedly.
  • If the website has been reorganized, review all external links to make sure that they still work.
  • If the website has been reorganized, make sure that the permissions for its sections have not changed.

If none of these explains the crawl errors, the error rate could just be a temporary spike, or it could have an external cause, such as someone linking to a nonexistent page. If this is the case, then there is no real problem with your website. Whatever the reason may be, when Google sees an unusually large number of errors for a website, the webmaster is notified so that they can look into the problem and fix it.

DNS Errors

DNS errors occur when Googlebot cannot communicate with the DNS server, either because the server is down or because there is a problem with the DNS routing to your domain. Many of these errors do not affect Googlebot's ability to access the site, but they can be a symptom of high latency, which ultimately affects your website's users.

To fix DNS errors, first have Google crawl the website: run Fetch as Google on an important page, such as the home page. If it returns the page without reporting any problems, it is safe to assume that Google is able to properly access your site.

Next, if you are having recurring DNS errors, get in contact with your DNS provider. Many times, the DNS provider and web host are the same entity.

Furthermore, you may need to configure your server to respond to hostnames that do not exist with an HTTP error code such as 404 or 500. This is most applicable when the website hosts user-generated content and gives each user their own domain. Without such a response, content can in some cases be duplicated across hostnames by accident, which in turn interferes with Googlebot's crawling.
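The idea of answering unknown hostnames with an error code can be sketched with Python's built-in http.server module. This is a minimal illustration rather than a production configuration: the hostnames in KNOWN_HOSTS, the helper status_for_host, and the handler class are all assumptions, and a real site would normally do this in its web server or application framework.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of hostnames that actually have content behind them.
KNOWN_HOSTS = {"www.example.com", "alice.example.com"}


def status_for_host(host_header: str) -> int:
    """Return 200 for a known hostname and 404 for anything else."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    # (Simplification: IPv6 literals such as "[::1]:8000" are not handled.)
    hostname = host_header.split(":")[0].strip().lower()
    return 200 if hostname in KNOWN_HOSTS else 404


class HostAwareHandler(BaseHTTPRequestHandler):
    """Serve a page only when the Host header names a known site."""

    def do_GET(self):
        status = status_for_host(self.headers.get("Host", ""))
        body = b"OK\n" if status == 200 else b"Unknown host\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass HostAwareHandler to http.server.HTTPServer, call serve_forever(), and send requests with different Host headers. A request for a hostname outside KNOWN_HOSTS gets a 404 instead of a copy of another user's content.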

If you encounter either a DNS Lookup or a DNS Timeout error, Google was not able to resolve the hostname. You can use Fetch as Google to check whether the site can be crawled properly. If the page is returned with no issues, Google is accessing your site properly. Otherwise, check with your registrar to ensure that the site is set up correctly and that the server is in fact connected to the internet.
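A quick way to see whether a hostname resolves at all is to ask the operating system's resolver. The sketch below assumes Python's standard socket module; the function name resolve and its lookup parameter (which exists only so the resolver can be swapped out in tests) are assumptions made for illustration. It covers lookup failures only, since socket.getaddrinfo has no timeout argument of its own.

```python
import socket
from typing import Callable, List


def resolve(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] on DNS failure."""
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The resolver could not find the hostname.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({entry[4][0] for entry in results})
```

An empty list from resolve("www.example.com") on several different networks suggests a problem with the DNS records themselves rather than with the web server.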

Low Error Rates

When a website has an error rate below 100% in any of these categories, it may indicate a passing condition, but it may also mean that the site is overloaded or incorrectly configured. Investigate these issues further, or search for answers yourself. Do not be surprised if Google alerts you even when the overall error rate is low: a well-configured site normally has zero errors in these categories.

Robot Failures

This error occurs when Google cannot retrieve a website's robots.txt file. Before crawling a site, Googlebot reads this file to see which pages it should not crawl. If the file exists but cannot be reached, the crawl is postponed so that Google does not crawl any URLs the owner meant to exclude. Googlebot will return and crawl the site once the robots.txt file can be accessed.

Surprisingly, a robots.txt file is not always necessary. It is only needed when a website contains URLs that the site owner does not want search engines to crawl. If your goal is to have search engines crawl everything on your site, you do not need a robots.txt file at all, not even an empty one.
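To see how a crawler interprets robots.txt rules, you can test them with Python's standard urllib.robotparser module. The rules and URLs below are hypothetical examples; the sketch shows that a Disallow rule blocks only the matching paths, and that an empty rule set blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that keeps all crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
print(blocked)  # False: the path falls under the Disallow rule
print(allowed)  # True: no rule matches this path

# With no rules at all, every URL may be fetched.
empty = RobotFileParser()
empty.parse([])
print(empty.can_fetch("Googlebot", "http://www.example.com/private/report.html"))  # True
```

Checking a handful of important URLs this way before publishing a new robots.txt file helps avoid blocking pages by accident.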

Overview of URL Errors

The URL errors section of the report is divided into categories, each showing the top 1,000 URL errors specific to that category. Not every error listed here requires your attention, but you should monitor the errors closely, as some of them may have a negative effect on your website's users and on Google's crawlers. Google places the most important URLs at the top of the report. Importance is based on factors such as the number of errors and the pages that reference the URL.

More specifically, look at the following:

  • Sitemaps may need to be updated. Remove any URLs that are no longer in use from the sitemap. When a new sitemap replaces an old one, delete the old sitemap rather than just redirecting it to the new one.
  • Fix Not Found errors for important URLs with 301 redirects. Not Found errors are very common, but when they occur on important pages you will need to address them. This applies particularly to pages that older websites still link to, misspelled URLs for important pages, old URLs listed in a sitemap that has since been deleted, URLs of popular pages that no longer exist, and the like. When you take care of errors like these, you ensure that your most important information (and hopefully all of your information) can be accessed not only by Google but by all internet users.
  • Keep your redirects simple and short. If many URLs pass through more than a few redirects, Googlebot may have difficulty following and interpreting them. Keep the number of hops in each redirect chain low.
  • Utilize a sitemap generator like DYNO Mapper to optimize your sitemap. 
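The advice about short redirects can be checked with a small script. The sketch below assumes you can export your redirect rules as a simple source-to-target mapping; the REDIRECTS data and the function names redirect_chain and flatten are hypothetical. It follows each chain, stops on loops, and shows how every source could point straight at its final destination.

```python
from typing import Dict, List

# Hypothetical redirect map: each key redirects to its value.
REDIRECTS: Dict[str, str] = {
    "/old-page": "/interim-page",
    "/interim-page": "/newer-page",
    "/newer-page": "/final-page",
}


def redirect_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        seen_before = target in chain
        chain.append(target)
        if seen_before:  # Loop detected; stop rather than spin forever.
            break
    return chain


def flatten(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every source straight at the last URL of its chain."""
    return {src: redirect_chain(src, redirects)[-1] for src in redirects}


chain = redirect_chain("/old-page", REDIRECTS)
print(len(chain) - 1)      # 3 hops before reaching the final page
print(flatten(REDIRECTS))  # every source now maps directly to "/final-page"
```

Any chain longer than a hop or two is a candidate for replacing with a single 301 redirect to the final URL.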

Viewing the URL Error Details

There are a few different ways to view URL errors:

  • When viewing the table, use the filter to find a specific URL.
  • Select “Download” to receive a list of the top 1,000 errors for that crawler type, such as smartphone or desktop.
  • You can see the error details if you follow the link from Application URIs or individual URLs.

The error details for a mobile or desktop URL show status information about the error, a list of web pages that reference the URL, and a link to Fetch as Google so that you can troubleshoot the problem with that URL.

Marking the URL Errors as Fixed

Once you have figured out where the problem is and what is causing the crawl error, you can hide the URL from the list. Do this for one URL at a time or for many at once: select the box next to each URL and click Mark as Fixed, and the URL will vanish from the list.

Author: Garenne Bigby
Website: http://garennebigby.com
Founder @dynomapper
Garenne Bigby is a freelance Chicago developer and founder of DYNO Mapper with over 10 years of experience in both agency and freelance roles in design, development, user experience, SEO, and information architecture.
