How to Get Google to Crawl the Right Content

Last Edited September 29, 2016 by Garenne Bigby in Search Engine Optimization


How Does Google Find Your Website Content?

Google uses software programs and algorithms to gather web content so that people searching with Google can readily find the information they are looking for. Because this process is automated, webmasters usually need very little effort to make sure that Google retrieves their content.

Crawling is the process by which Google collects publicly available web content so that it can be displayed in search results for an appropriate search query. During this process, Google's software, known appropriately as web crawlers, automatically discovers and retrieves websites. A web crawler works by following links from site to site and downloading the pages it finds, storing them to be used later. Intricate algorithms then sort and analyze the pages, and the results are updated within Google's search engine results. Google's main web crawler is known affectionately as Googlebot.
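To make the idea of following links from page to page concrete, here is a minimal sketch of a breadth-first crawler. It is not how Googlebot is implemented. The `fetch` callable, the `FAKE_WEB` dictionary, and the `max_pages` limit are all assumptions made for illustration, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Follow links breadth-first, storing each downloaded page.

    `fetch` is a hypothetical callable that returns the HTML for a URL,
    or None when the page cannot be retrieved.
    """
    stored = {}  # URL -> downloaded HTML, kept for later analysis
    queue = deque([start_url])
    while queue and len(stored) < max_pages:
        url = queue.popleft()
        if url in stored:
            continue  # already downloaded
        html = fetch(url)
        if html is None:
            continue  # unreachable pages cannot be stored
        stored[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return stored


# A tiny in-memory "web" (made-up pages) replaces real network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="/hidden">Post</a>',
}

pages = crawl("http://example.com/", FAKE_WEB.get)
print(sorted(pages))
```

Note that `/hidden` is linked but cannot be fetched, so it never ends up in the stored pages. That mirrors the point made later in this article: a page the crawler cannot reach cannot be discovered.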

Another process that Google uses to understand web pages is rendering. Rendering helps Google interpret how web pages look and behave for visitors using different browsers and devices. Much as a web browser displays a page, Google retrieves the URL and then executes the code given for that page, which is generally HTML or JavaScript. Google then crawls every resource referenced by the main code file in order to put together all of the visual aspects of the page and to get a better understanding of the website.
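A rough way to see which resources a renderer would need is to list everything a page references. The sketch below only parses HTML for scripts, stylesheets, and images. It does not execute JavaScript, so it is a simplification of real rendering. The `SAMPLE_HTML` page and its URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceExtractor(HTMLParser):
    """Collects URLs of scripts, stylesheets, and images referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("script", "img"):
            ref = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            ref = attrs.get("href")
        if ref:
            # Resolve relative references against the page's own URL.
            self.resources.append(urljoin(self.page_url, ref))


def list_resources(page_url, html):
    """Return the absolute URLs of the resources a page references."""
    parser = ResourceExtractor(page_url)
    parser.feed(html)
    return parser.resources


# Made-up page used only to demonstrate the extraction.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.net/app.js"></script>
</head><body><img src="images/logo.png"></body></html>
"""

resources = list_resources("http://example.com/products/", SAMPLE_HTML)
for url in resources:
    print(url)
```

Every URL in this list is something a crawler would have to be allowed to fetch before it could see the page the way a visitor does.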

When Google is not able to render or crawl a web page, the site's visibility in Google's search results pages can be impacted. First, when Google is not able to crawl a website, it cannot gather any information about the website. This means that the site, or parts of the site, cannot be discovered in a natural way, and therefore cannot be shown to Google users whose search queries are relevant to those sites or pages.

Next, if Google is not able to render the web pages contained on a site, it will struggle to understand the web content because important information about the page's visual layout is missing. If this happens, the visibility of the site's content can be greatly reduced within Google's search pages. Google renders web pages in order to estimate how valuable the website is to varying audiences, and to establish where specific links will be shown within Google's search results pages. Luckily, there is a tool known as Fetch as Google that can help diagnose crawling and rendering problems on a web page, which can improve the site's position within Google's search engine results pages and help it reach its target audience.

Crawling & Rendering Are Important

It is vital to the success of a website that it can be crawled and rendered correctly, ensuring that it will receive the best efforts from Google search. Though crawling and rendering are very important, it is also important to recognize when blocking content from being crawled and rendered will improve the overall success of the website.

You should take the time to confirm that Googlebot and Google's other web crawlers have access to your website at the network level. It is vital that the URLs you would like Google to display in search results are actually reachable by Google. Oftentimes, URLs are blocked on purpose by website owners. Before blocking URLs, you need to ensure that doing so will not hide any content that you would like Google to discover and subsequently display in its search results pages.

It is up to the website owner/designer to allow Googlebot access to all of the resources that are referenced on the website. Google considers all of the content that is not text as well as the total visual layout to decide where the website appears within the search results pages. The visual elements of the website help Google to totally understand the web pages. When Google understands a website the best that it can, it is able to better match the website to the individuals that are looking to find the particular content that it offers. After Google has retrieved the pages, Googlebot will run the code and decipher the content to better understand the overall structure of the website. The information that is collected by Google during the rendering process is used to rank the value and quality of the content compared to other websites and what other individuals are searching for using Google's search engine.

If there are web pages on your site that use code to arrange or display the content, Google has to properly render the content in order for it to be displayed in Google search. Many times, the meat of the textual content of a dynamic website can only be retrieved by rendering the web pages, so that Google is able to see the website like any other internet user would. If the pages render incorrectly, Google might not be able to retrieve any of the content. To bring this all full circle, when Google is not able to retrieve any of the content from a web page, it cannot know whether the content is relevant to any specific search queries, and will not show the site within search results.

Blocked Resources Report

Googlebot must have access to various resources on a web page so that it can render and index the page as needed. This includes things like image files, CSS, and JavaScript so that the bot is able to view the page like a normal user would. If a robots.txt file does not allow crawling of those resources, it will impact how well Google will render and index the page, thus impacting how the page is ranked in Google's search engine results.
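As a rough illustration of how a robots.txt file can block page resources, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt rules and the URLs are made up for this example. Python's parser is also not guaranteed to match Googlebot's own rule matching in every case, so treat the result as an approximation. The helper assumes every URL is on the host that serves the given robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks script and stylesheet directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
"""


def blocked_for_googlebot(robots_txt, urls):
    """Return the URLs that the given robots.txt rules disallow for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]


page_resources = [
    "http://example.com/assets/js/app.js",
    "http://example.com/assets/css/site.css",
    "http://example.com/assets/img/logo.png",
]

blocked = blocked_for_googlebot(ROBOTS_TXT, page_resources)
print(blocked)
```

In this made-up case the script and stylesheet are blocked while the image is not, so a crawler obeying these rules would only see an unstyled page with no scripted behavior.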

The Blocked Resources Report displays the resources that are used by the site but are blocked to Googlebot. Not every resource is shown, only the ones that Google assumes are under the webmaster's control.

  • The main report page will show a list of hosts that are providing resources on the site that are blocked. Some of these resources will be hosted on your own site while others will be hosted on other sites.
  • Select any host in the table to view a list of resources that are blocked from that host. There will be a tally of pages that are on your site affected by each of the blocked resources.
  • Select any of the blocked resources within the table to see a list of pages that will load the resource.
  • Select any page within the table that hosts a blocked resource to get instructions on how to unblock the resource, or follow the instructions below.
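The drill-down described above (hosts, then blocked resources, then affected pages) can be pictured as a nested mapping. The sketch below is only an illustration of that structure, not how Google builds the report. The `blocked_by_page` input and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_report(blocked_by_page):
    """Group blocked resources as host -> resource -> set of affected pages."""
    report = defaultdict(lambda: defaultdict(set))
    for page, resources in blocked_by_page.items():
        for resource in resources:
            host = urlparse(resource).netloc
            report[host][resource].add(page)
    return report


def pages_affected(report, host, resource):
    """Tally of pages on the site that load a given blocked resource."""
    return len(report[host][resource])


# Made-up input: each page mapped to the blocked resources it loads.
blocked_by_page = {
    "http://example.com/": [
        "http://example.com/assets/app.js",
        "https://cdn.example.net/widget.js",
    ],
    "http://example.com/about": [
        "http://example.com/assets/app.js",
    ],
}

report = build_report(blocked_by_page)
for host in sorted(report):
    for resource in sorted(report[host]):
        print(host, resource, pages_affected(report, host, resource))
```

Here `example.com` is a host you control, while `cdn.example.net` stands in for a third-party host whose robots.txt you cannot edit yourself.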

In order to unblock your resources, you will need to do the following:

  • Open the Blocked Resources Report to view a list of hosts with blocked resources that are used on your website. Begin with the hosts that you own, because you can update their robots.txt files directly. You may not have control of every host, but you should edit the ones that you can.
  • Select a host on the report to view a list of blocked resources that come from that host. From the list, begin with the ones that may be affecting your layout and content in a significant way.
  • For each resource that is affecting the layout, expand to view the pages that are using it. Click on any of the pages and follow the instructions. After this, fetch and render the page to ensure that the resource will appear.
  • Continue this process until Googlebot can access all of the previously blocked resources.
  • When you get to the hosts that you do not own but that have a strong visual impact on your site, contact their webmaster and ask them to unblock the resource for Googlebot. The alternative is to remove your dependency on that resource.

Use Fetch as Google for Websites

Fetch as Google is a tool that lets you test how Google will crawl or render a URL on your website. It can be used to determine whether Googlebot is able to access a page on the website, how it will render the page, and whether any page resources are blocked to Googlebot. Essentially, it simulates the crawl and render process as Google would normally perform it, and is quite useful for ironing out any crawling issues that a website may be having.

Using Fetch as Google is pretty simple, and takes only a few steps to complete.

  1. In the text box, you will need to enter the path component of a URL on the site to be fetched, relative to the site root. When you leave the text box blank, the site root page will be fetched.
  2. You may choose the type of Googlebot that you wish to perform the fetch as.
  3. There are both desktop and mobile options. The one you choose determines which crawler makes the fetch.
  4. Choose to simply Fetch or both Fetch and Render.
  5. When you choose just Fetch, Google will request the specified URL on the website and show the HTTP response. This does not request or run any additional resources for the page. It is a fast process that can be used to diagnose any connectivity or security issues that the site may be having. The request will either succeed or fail.
  6. When you fetch and render, Google will do the same as described above and will then request and run all of the resources on the page. This will discover the visual differences between how a user will see the page and how Googlebot will see the page.
  7. After this, the request will be placed in the fetch history table, alongside a status of pending. After the request is done, you will be alerted of either the success or failure of the process along with other information. You may receive any additional details on successful requests by clicking on them.
  8. Google allows 500 fetches weekly. You will be notified when you are reaching the limit.
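The first few steps (a path relative to the site root, a desktop or mobile crawler, and a plain fetch that reports the HTTP response) can be approximated locally. This is only a sketch and not the Fetch as Google tool itself. The function names are hypothetical, and the user-agent strings are placeholders rather than Googlebot's real ones.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Placeholder user-agent strings (assumptions, not Googlebot's real values).
USER_AGENTS = {
    "desktop": "example-desktop-crawler/1.0",
    "mobile": "example-mobile-crawler/1.0",
}


def build_fetch_request(site_root, path="", crawler="desktop"):
    """Build a request for a path relative to the site root.

    An empty path targets the site root page, as in step 1.
    """
    url = urljoin(site_root, path.lstrip("/"))
    return Request(url, headers={"User-Agent": USER_AGENTS[crawler]})


def fetch_only(request, timeout=10):
    """Send the request and return (status, headers) without loading resources.

    urlopen follows redirects automatically, so the status reported is that
    of the final response. A status of None means the site was unreachable.
    """
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except HTTPError as err:
        return err.code, dict(err.headers)
    except URLError as err:
        return None, {"error": str(err.reason)}


root_request = build_fetch_request("http://example.com/")
page_request = build_fetch_request("http://example.com/", "blog/post.html", "mobile")
print(root_request.full_url)
print(page_request.full_url, page_request.get_header("User-agent"))
```

Calling `fetch_only(page_request)` would perform the actual network request. It is left out of the example run so that the sketch works without a connection.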


The last 100 fetch requests will be shown on the fetch history. You may choose to see the details of any completed request, and you may be shown a status of completed, partial, redirected, or specific error type.

If the status is complete, it means that Google contacted your site, crawled the page, and was able to get the resources that are referenced by the page.

If the fetch status is partial, it means that Google was able to get a response from the site and has fetched the URL, but was not able to get all of the resources that were referenced by the page. This can happen if they were blocked by robots.txt files. If the process was fetch only, try a fetch and render instead. Look at the rendered page to see whether any important resources were blocked. If so, unblock them in any robots.txt files that you own. For the ones that you don't own (if any), ask the owners to unblock them.

When you are shown a redirected status, you will have to follow the redirect manually. If the redirect is to the same property, the tool will display an option that populates the fetch box with the redirect URL. If the redirect is to another property, you can click the “Follow” option to auto-populate the URL box, then copy the URL and paste it into the fetch box for that property.
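One way to keep the fetch statuses straight is to think of them as a simple decision based on the HTTP response and whether any referenced resources were blocked. The mapping below is an assumption made for illustration. The function name and the exact status-code ranges are hypothetical and are not taken from Google's tool.

```python
def classify_fetch(http_status, blocked_resources=()):
    """Map a fetch outcome to a status label like those in the fetch history.

    `http_status` is the HTTP status code, or None if the site was unreachable.
    `blocked_resources` lists referenced resources that could not be retrieved.
    """
    if http_status is None:
        return "error: unreachable"
    if 300 <= http_status < 400:
        return "redirected"  # must be followed manually
    if http_status >= 400:
        return f"error: HTTP {http_status}"
    if blocked_resources:
        return "partial"  # page fetched, but some resources were blocked
    return "complete"


print(classify_fetch(200))
print(classify_fetch(200, ["http://example.com/assets/app.js"]))
print(classify_fetch(301))
print(classify_fetch(404))
```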

Author: Garenne Bigby
Website: http://garennebigby.com
Founder @dynomapper
Garenne Bigby is freelance Chicago developer and founder of DYNO Mapper with over 10 years experience in both agency and freelance roles in design, development, user experience, SEO, and information architecture.
