Nowadays, businesses are much more aware of the vital role SEO plays in helping them be found online. However, a number of factors can define how SEO-friendly any given website will be for web crawlers, and for a lot of people, all of these factors can get confusing and overwhelming. In an effort to break things down, we’ll look at two important factors that are often overlooked: crawlability and indexability. So what is crawlability? What about indexability? Let’s start by reviewing how search engines actually discover pages on the web, in order to gain a better understanding of these two terms.
Search engines use what are known as web crawlers to learn about new or updated pages on the web. These crawlers are essentially bots that find and index content. They browse a website much like a person would, following link after link and sending data back to the search engine's servers. They also pick up things the average visitor never sees, such as alt attributes, meta description tags, structured data, and other elements in the site's code. Crawlability and indexability, then, describe a search engine's ability to access the pages of a website and add them to its index.
Crawlability refers to the search engine’s ability to crawl through content on a page.
Indexability refers to the search engine’s capability of analyzing and adding a page to its index.
Google may be able to crawl a site; however, indexability issues can stop it from being able to index each individual page.
If you haven't looked into the technical side of SEO before, your website may well have crawlability or indexability problems. It's important to be familiar with these terms so you can ensure search engine robots are able to crawl and index the pages of your website. So how do you know whether your website is crawlable and indexable? Your search engine rankings are a good first indicator. Are you showing up for searches related to your products and/or services?
If you're not showing up for searches related to your products and/or services, even though your content mentions your target keywords a few times on each page, you're likely missing a few important aspects of crawlability and indexability. And if you haven't optimized your website for crawlability and indexability, you're likely missing out on the benefits of any off-page SEO efforts you've made.
There are various factors that impact crawlability and indexability for any given website. Here are a few to keep in mind:
The structure of your website: Take a look at the structure of your website. Are you able to get to the main pages of your website from any given page? Most people make sure there are links to the main pages from the home page, but that’s not enough. You want people to be able to navigate easily from any area. Whenever possible, link to other relevant and authoritative websites as well.
Internal links to helpful information: If you have a service page or blog post that mentions a topic you’ve already written about elsewhere on your website, hyperlink the topic within that service page or blog post. This will allow crawlers to see that your content is interrelated — allowing them to better navigate and crawl through your website and increasing the likelihood of proper indexing.
Outdated or unsupported technologies: Some website technologies are difficult or impossible for search engine bots to crawl. Make sure you're not relying on anything outdated or unsupported, such as Flash, and be careful with content loaded through AJAX or heavy client-side JavaScript, which can be missed or indexed late if it isn't rendered in a crawler-friendly way. In addition, keep the platforms and frameworks behind your site up-to-date.
Code errors preventing access to bots: Robots.txt is a text file that instructs bots how to crawl specific parts of a website by allowing or disallowing certain paths. In some instances, you may not want search engines to crawl a particular page. But for the pages you do want indexed, make sure there are no errant directives or code errors preventing crawlers from reaching them.
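Python's standard library includes a robots.txt parser, which makes it easy to sanity-check your own rules before they block something important. A minimal sketch (the rules and URLs here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block /private/, allow everything else.
rules = """User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check whether a generic bot ("*") may fetch each URL.
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))     # True
```

Running a check like this against your live robots.txt is a quick way to confirm you haven't accidentally disallowed pages you want crawled.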
Server errors and/or broken redirects: If server errors and/or broken redirects happen often on your website, your visitors will likely leave instead of waiting for the page to load. This not only increases your bounce rate but also stops crawlers from being able to access and index your content. Make sure you resolve these problems immediately.
When you’re trying to help search engine bots crawl and index your website, the first step is reviewing the list of factors above that impact crawlability and indexability. You’ll want to take care of any issues preventing search engine bots from crawling and indexing your website before you move forward. Once you’ve taken care of the list of factors above, there are ways to improve crawlability and indexability.
Some people view sitemaps as a novelty, but this couldn't be further from the truth. Sitemaps have long been a core web development best practice. A sitemap is essential for any given website, offering a vital link between the website and the search engine. It's important to construct your sitemap properly and make sure it's well-structured. This will make your website easier to crawl, as well as help search engines return more accurate results when users search for keywords associated with your products and/or services.
So what exactly is a sitemap? It's a file, typically sitemap.xml in the root folder of your domain, that contains direct links to the pages of your website, often along with metadata such as when each page was last modified. This tells search engines all about your content, and when you update it, crawlers have a signal to come back and review the changes. Once you've added a sitemap, make sure you keep it up-to-date.
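As a sketch, a minimal XML sitemap following the sitemaps.org protocol looks like this (the URLs and dates are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/services</loc>
    <lastmod>2021-05-15</lastmod>
  </url>
</urlset>
```

Each `<url>` entry lists one page; the optional `<lastmod>` date helps crawlers decide what to revisit.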
While it’s important not to overlook the technical aspects of SEO, content is much more important than many people think. It’s a basic necessity for any given website, and fortunately, it has a lot of power when it comes to helping you rank higher in the search engines. When we refer to content, you might be thinking about traditional pages and blog posts. But that’s not the only form of content. You can go beyond text to include images, video, slides, audio, and much more. Content not only helps those visiting your website better understand what you do, but it also helps you ensure your website is crawled and indexed much quicker.
Why is that? The answer is simple: Google and other search engines spend more time crawling and indexing websites that are regularly updated. Try to make sure you're writing content that is unique and in-depth (around 2,000 or more words), as search engines tend to rank this higher than "thin" content. You want your content to satisfy searcher intent, which means you should use various forms of content that keep people engaged and interested so they'll stay on your website longer.
Google and other search engines rely on links, both external and internal, within your website to determine which content is related to which, as well as the value of that content. They find your posts and pages best when they're linked from somewhere on the web. Internal links, for instance, give search engines more insight into the context and structure of your website. Essentially, crawlers arrive at your homepage and follow the first link, then make their way through your website, figuring out the relationships between various posts and pages to learn the subject matter.
Start by going through your website, page by page, and finding any mention of keywords, whether they’re short or long-tail. Link those keywords to relevant information on your website, including blog posts or service pages, to help improve your search ranking. This will show the search engines that all of your content is connected.
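The link-following behavior described above can be sketched in a few lines of Python using the standard library's HTML parser; the page content and paths below are hypothetical:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags, mimicking how a crawler discovers links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A hypothetical service page that links keywords to related content on the same site.
page = """
<p>We cover <a href="/blog/crawlability">crawlability</a> and
<a href="/services/seo-audit">SEO audits</a> in depth.</p>
"""

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['/blog/crawlability', '/services/seo-audit']
```

A real crawler repeats this for every discovered page, which is why a dense web of internal links makes a site easier to cover.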
Page speed is commonly confused with site speed, but keep in mind, they're quite different. Site speed refers to the average page speed across a sample of page views on a site, while page speed is how long it takes to display the content on a particular page. Google's PageSpeed Insights tool can measure your page speed. Typically, search engines only have limited time available to crawl and index any given site; this is referred to as the bot's "crawl budget." You want your pages to load quickly so the crawler can visit them before its budget runs out.
In addition, if your page load time is high, visitors will leave the site rather quickly. Remember, there are tons of options out there for your products and/or services. We live in the digital age where virtually everything can be found online within a few moments. If your visitors leave your website quickly, your bounce rate will rise — letting search engines know that most people don’t find your content relevant, and thus, decreasing your search rankings.
Duplicate content is content that appears on the web in more than one place, meaning the same content is accessible at more than one URL. Google and other search engines have a difficult time figuring out which version of duplicate content is most relevant to a given search query, which is why it's recommended to avoid duplicate content altogether. They don't know which version(s) to include in their indices, or whether link metrics such as authority and link equity should be credited to your version or another one.
At the end of the day, it's best to avoid duplicate content. In addition to confusing the search engines, duplicate content can also decrease the frequency with which crawlers go through your website. If you use a syndicated blog service from your marketing company, have them make sure that content isn't crawlable.
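When duplicates can't be avoided entirely, one standard (if not foolproof) way to tell search engines which version to credit is a canonical link tag in the page's head; the URL below is hypothetical:

```html
<!-- rel="canonical" signals the preferred URL when the same
     content is reachable at more than one address. -->
<link rel="canonical" href="https://www.example.com/original-article">
```

This hints to search engines which version to index and where to consolidate link metrics.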
If you've completed the list of steps above and you're still looking for more ways to improve crawlability and indexability, there are some more advanced techniques you can leverage. But of course, you'll want to start with the basics listed above before you delve into more technical options.
A URL redirect, also known as URL forwarding, is a web server function that sends a visitor from the URL they’ve typed/visited to another. These are typically automated through a series of status codes that are defined within the HTTP protocol. These are commonly used in the event of a change of business name, a merger between two websites, an effort to split-test landing pages, and various other reasons. Each page should have no more than one redirect for the best possible results. If redirects need to be used, always use 302 for temporary redirects and 301 for permanent redirects.
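As a sketch, here's how the two redirect types might be declared on an Apache server (assuming mod_alias is enabled; the paths are hypothetical):

```apache
# Permanent redirect (301): the old URL has moved for good.
Redirect 301 /old-services /services

# Temporary redirect (302): e.g., a landing page under a split test.
Redirect 302 /promo /promo-variant-a
```

Other servers (nginx, IIS) have equivalent directives; the status code is what matters to crawlers.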
Compression allows your web server to send smaller responses, which load much faster for those visiting your website. Gzip compression is typically enabled as standard practice; if it's not, your web pages are likely loading slowly, especially compared to your competitors'. The goal of enabling compression is to eliminate unnecessary data wherever possible. Beyond gzip, alternative compression algorithms such as Brotli often achieve even smaller file sizes.
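To get a feel for how much compression saves, here's a small Python sketch using the standard library's gzip module on a hypothetical, repetitive HTML body (real pages compress well for the same reason: repeated markup):

```python
import gzip

# A hypothetical HTML response body with lots of repetition, as web pages tend to have.
html = b"<html>" + b"<p>Lorem ipsum dolor sit amet.</p>" * 200 + b"</html>"

compressed = gzip.compress(html)
savings = 1 - len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({savings:.0%} smaller)")
```

In practice the server applies this transparently when the browser sends an `Accept-Encoding: gzip` header.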
For the average web page, images take up approximately 60% of the size. This can slow down your load speed significantly, so when possible, try to eliminate unnecessary image resources and make sure they’re always compressed, resized, and scaled to fit wherever they’re going. Some other best practices for using images properly include:
Use unique images that are relevant to the page
Aim for the highest quality format possible
Include an easy to understand caption with each image
Take advantage of “alt text” to ensure accessibility
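The last two points can be sketched in HTML (the file name, dimensions, and text are hypothetical):

```html
<figure>
  <!-- alt text makes the image accessible and gives crawlers context;
       explicit dimensions help the page render without layout shifts -->
  <img src="/images/crawler-diagram.png"
       alt="Diagram of a search engine crawler following links between pages"
       width="800" height="450">
  <figcaption>How a crawler moves from link to link across a site.</figcaption>
</figure>
```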
Above-the-fold refers to anything a visitor sees as soon as they land on the page, before scrolling: images, forms, text, and other content meant to grab attention. Spend some time thinking about what you want people to see as soon as they get to your website. Remember, this spot is key to keeping them on the page, so put the most interesting or compelling information you have here. In addition, organize your HTML markup to render above-the-fold content quickly; Google's older PageSpeed guidance suggested keeping the critical above-the-fold payload to roughly 14.6 kB compressed, about what the first round-trips of a TCP connection can deliver.
Page caching allows you to improve the load time of your web pages, which reduces your visitors' bounce rate and, in turn, improves your standing with the search engines. Google has revealed that a half-second difference in load times can reduce web traffic by up to 20%. For this reason, many search engines consider page load time an incredibly important factor in determining how to rank your website. Make sure you set up a caching policy in which you use browser caching to control how long a browser can cache a response. You can also use ETags to enable efficient revalidation.
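As a sketch, a caching policy for a static asset might be expressed with response headers like these (the values are illustrative, not a recommendation for every resource):

```http
Cache-Control: public, max-age=31536000
ETag: "5f3e2a1b"
```

`max-age` tells the browser how long (in seconds) it may reuse the cached copy; the ETag lets it revalidate cheaply afterward, receiving a 304 Not Modified response if the resource hasn't changed.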
There are many ways to improve the crawlability and indexability of your website. Keep in mind: your website should never be a set-it-and-forget-it type of asset. You need to continuously optimize and manage your website in order to make the right improvements and move up in the search rankings. Once you've taken care of any issues stopping the search engines from crawling and indexing your web pages, put the tips above into action to start seeing some real results.