Learning About Sitemaps

Posted July 1, 2016 by Garenne Bigby in Sitemaps


A sitemap is a file that lists the web pages of a single website. It tells search engines what content the site contains and how it is organized. Search engine crawlers read the file to learn more about the website and to crawl it more effectively. The sitemap also provides valuable metadata about each page it lists: how often the page changes, when it was last updated, and how important it is relative to the other pages on the website.
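To make the metadata concrete, here is a minimal Python sketch that builds a small XML sitemap with the standard library. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol; the build_sitemap helper, the example.com URLs, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq", and
    "priority" are optional metadata.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Emit optional metadata only when the caller supplied it.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example_pages = [
    {
        "loc": "https://www.example.com/",
        "lastmod": "2016-07-01",
        "changefreq": "weekly",
        "priority": 1.0,
    },
    {"loc": "https://www.example.com/about"},
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

The second page carries no metadata at all, which is allowed: only the loc element is required for each URL.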


Sitemaps can also carry metadata about specific content types on a web page, such as images, videos, and mobile content. For example, metadata for a video would include things such as the running time, age rating, size, and category. Metadata for an image would include information such as the type, subject matter, and license.

A Brief History

Google introduced the first sitemaps in the middle of 2005 so that web developers had a platform to publish a list of links from across their websites. Other search engines announced that they would join in support of sitemaps in late 2006. In April of 2007, a few more search engines came on board with sitemaps, while the larger named search engines announced their support for auto-discovery through robots.txt. The following month, Arizona, Utah, Virginia, and California announced that their state governments would be using sitemaps for their websites.


The idea behind sitemaps is that they aid the discovery of web pages on the internet. They are meant to make web servers more crawler-friendly, and improvements like auto-discovery make it easier for crawlers to find the sitemap and to sort web pages by priority.

How Necessary is a Sitemap?

If a website's pages are linked properly, search engine website crawlers usually can discover most of the website on their own. Even when this is the case, having a sitemap in place will improve the accuracy of a website crawl, especially if:

  • The website is very large. When this is the case, sometimes a search engine web crawler will overlook some parts of the website that are new or recently updated.
  • The website is new and has very few external links. Search engine website crawlers investigate the internet by following links between web pages. Because of this, a crawler may not discover web pages if they are not linked from other websites.
  • The website has an archive that is large and filled with content pages that are not linked. When web pages are isolated or do not reference each other, they can be listed on a sitemap to make sure that search engines do not overlook any of the pages on the website.
  • Content on the website is made of rich media or uses other annotations that are sitemap-compatible.


Search engines rely on complex algorithms to crawl websites, so having a sitemap won't guarantee that every item on it will be crawled and indexed. In most cases, the website will benefit from having the sitemap, and there is never a negative effect from having one.

Building and Submitting a Sitemap

First, decide which pages on the website should be crawled by search engines, and determine the canonical version of each page's URL. Next, choose the format of the sitemap; the sitemap can be created manually or with third-party tools. Once the sitemap has been generated, test it with a sitemap testing tool. Once the test succeeds, make the sitemap available to search engines.

Formats for Sitemaps

All sitemap formats are limited to 50MB uncompressed and 50,000 URLs. When the file is larger or contains more than 50,000 URLs, the list must be broken up into multiple sitemaps. You can then create a sitemap index file, which is a file that directs crawlers to a list of sitemaps. The index can be submitted to search engines in place of the individual sitemaps.
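As a rough sketch of the splitting step, the Python below breaks a long URL list into groups that each stay within the 50,000-URL limit. The split_urls helper and the example.com URLs are hypothetical, and the sketch checks only the URL count, not the file size.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit described above


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each at most `limit` long."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# Hypothetical site with 120,000 pages.
urls = [f"https://www.example.com/page/{i}" for i in range(120_000)]

chunks = list(split_urls(urls))
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file, and those files listed in a sitemap index.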

Sitemaps can be created in an array of formats, such as XML, RSS, mRSS, Atom 1.0, plain text, and even Google Sites. It is up to the webmaster to determine which format is right for their needs.

General Guidelines for Sitemaps

Because search engines will crawl the URLs exactly how they are listed, use only consistent and fully qualified URLs.

  • Get to know a search engine's webmaster guidelines and their SEO guides if a consultant will be helping to optimize your sitemaps.
  • Do not include any session IDs in URLs. If these are included in the sitemap, it could lead to crawling duplication.
  • When there are multiple languages used in the sitemap, list all of the translated versions of the URLs to the search engines. This will help with the crawling and indexing.
  • The sitemap file must be UTF-8 encoded, and the URLs should be escaped properly.
  • Don't forget to break up large sitemaps to prevent the server from being overloaded when the search engines frequently request the sitemap.
  • A sitemap index file should be used to list all of the sitemaps for search engines. This enables only one file to be submitted, rather than each individual sitemap.
  • Canonicalize URLs so that search engines treat the www and non-www versions of the domain as the same site, and make sure both versions are accessible.
  • The use of non-Latin and non-alphanumeric characters can be a bit tricky. A sitemap URL may contain only ASCII characters, so any other characters must be percent-encoded. If unencoded characters remain, you will get an error when you try to submit the sitemap for crawling.
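Two of the guidelines above, dropping session IDs and escaping URLs, can be sketched in Python with the standard library. The clean_url helper and the parameter names in SESSION_PARAMS are assumptions chosen for illustration; a real site would list whatever session parameters it actually uses. The sketch also discards URL fragments and does not handle non-ASCII host names.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit
from xml.sax.saxutils import escape

# Hypothetical session parameter names; adjust to match the real site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def clean_url(url):
    """Drop session parameters and percent-encode non-ASCII path characters."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Keep "/" and existing "%" escapes so the path is not double-encoded.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


cleaned = clean_url("https://www.example.com/café?id=1&sid=abc&lang=fr")
print(cleaned)  # https://www.example.com/caf%C3%A9?id=1&lang=fr

# When writing the XML by hand, the URL must also be entity-escaped.
xml_ready = escape(cleaned)
print(xml_ready)  # https://www.example.com/caf%C3%A9?id=1&amp;lang=fr
```

The final escape step is needed only when the XML is assembled by hand; an XML library normally escapes text such as the ampersand on its own.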

Flash Sites and XML Sitemaps

The issue with websites that are rich in Flash is that they are essentially not built with HTML. If the navigation is built with Flash, a search engine would likely be able to find only the homepage of the site, and the pages linked from the homepage could not be discovered without an XML sitemap. XML is stricter than HTML: errors are not tolerated, so it is vital that the syntax be exact. When working with XML, it is strongly advised to run it through an XML syntax validator. Many free validators can be found online with a simple search.
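For a quick local check before reaching for an online validator, a few lines of Python can confirm that a sitemap is at least well-formed XML. This is only a sketch: the check_well_formed helper is a made-up name, and it tests syntax only, not whether the file follows the sitemap schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if the text parses as XML, else (False, reason)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


good = "<urlset><url><loc>https://www.example.com/</loc></url></urlset>"
# The <loc> tag is never closed, which a strict XML parser rejects.
bad = "<urlset><url><loc>https://www.example.com/</url></urlset>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

A browser would often render similarly broken HTML without complaint, which is exactly the difference in strictness described above.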

XML sitemap generators are available online for websites that are more complex. They also come in handy when the sitemap itself is quite complex, since writing the XML by hand can be demanding for anyone who is not experienced with it.

Managing Sitemaps and Sitemaps Report

You will need to manage your sitemaps by adding, viewing, and testing them but won't need to do this if they have been created with a managed hosting service. Generally a service like this will not only create, but manage sitemaps for you.

When using a sitemap report, you will see a list of sitemaps that have been submitted. Do note that only the sitemaps that have been submitted using the specific tool you chose will show up on the list. The report will not list the sitemaps that have been submitted through other means.

When your Sitemap Isn't Showing Up in a Report

Sometimes, for various reasons, a sitemap won't show up in a report. This can be the byproduct of several things.

  • Who was the sitemap submitted by? If you are using a service that has the option, there is a tab to view those submitted by the user, and those submitted by anyone else.
  • What is the website's preferred domain? Make sure that you have typed in the correct URL, with no typos.

Testing the Sitemap

All sitemaps should be tested before they are submitted. When testing a sitemap that has been submitted previously, locate the details page of the sitemap and click on Test.

When testing an un-submitted sitemap, locate the Add/TEST button on the landing page for reports. You will then enter the URL for the sitemap inside of the dialogue box exactly as it appears, and then click on Test. Once the test has been completed, click on the Results area to check for any errors. Fix any errors that were found, and then submit the sitemap.

Submitting a Sitemap for the First Time

Testing the sitemap prior to submission is vital. When the sitemap has been tested successfully, enter the URL into the submission box. The URL needs to be relative to the site root that is defined for the property. After it has been submitted, refresh the browser to see that the new sitemap has been added to the list.

It will take some time for a search engine to process a new sitemap that has been submitted. Sitemaps can also be resubmitted in the event of an error.

Simplifying Management of Multiple Sitemaps

When a user has many sitemaps, it is possible to use an index file to submit them all at once. The XML format for a sitemap file is similar to the XML file format for a sitemap index file. A sitemap index file will use these XML tags:

  • sitemapindex: this is the parent tag that surrounds the whole file.
  • sitemap: this is the parent tag for each sitemap that is listed in the file; a child of the sitemapindex tag.
  • loc: this is the location (URL) of the sitemap; a child of the sitemap tag.
  • lastmod: the date that the sitemap was last modified; an optional child of the sitemap tag.
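The tags listed above fit together as in the following Python sketch, which builds a small sitemap index with the standard library. The tag names and namespace follow the sitemaps.org protocol; the build_sitemap_index helper, the file names, and the dates are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return sitemap index XML for (loc, lastmod) pairs; lastmod may be None."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod:  # lastmod is optional
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap files, for illustration only.
index_xml = build_sitemap_index([
    ("https://www.example.com/sitemap1.xml", "2016-07-01"),
    ("https://www.example.com/sitemap2.xml", None),
])
print(index_xml)
```

Only the resulting index file then needs to be submitted; crawlers follow each loc entry to reach the individual sitemaps.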

When you own more than one website, you may want to simplify the process of creating and submitting sitemaps. You can create one or more sitemaps that include URLs for all of your websites, and then save them in a single location.

You must ensure that you have complete and verified ownership of all sites that are being linked to the sitemaps.

When are Sitemaps Beneficial?

Using a sitemap can only benefit a website or have no effect on it; it will never be detrimental to the overall performance or function of the site. There are, however, certain situations when having a sitemap is particularly beneficial.

  • When the website is extremely large and there may be a chance that a web crawler will overlook recently added or new content.
  • When there are many pages that are contained on a website that are not well linked together or they are totally isolated.
  • When the webmaster has used content that is not generally processed by search engines. This would be content from Flash, Silverlight, or Ajax.
  • When a website has very few external links leading to it.

More Information

A sitemap may contain no more than 50,000 URLs and may be no larger than 50MB uncompressed. It may be compressed using gzip to reduce its bandwidth consumption, but the size limit still applies to the uncompressed file. A sitemap index file likewise may list no more than 50,000 sitemaps and may be no larger than 50MB, and it may also be compressed. A sitemap index file serves as a single entry point for multiple sitemap files.
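A minimal Python sketch of the compression step follows. It assumes the 50MB figure is measured on the uncompressed file; the compress_sitemap helper and the sample content are made up for the example.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50MB, checked before compressing


def compress_sitemap(xml_text, limit=MAX_UNCOMPRESSED_BYTES):
    """Gzip sitemap text after checking its uncompressed size."""
    raw = xml_text.encode("utf-8")
    if len(raw) > limit:
        raise ValueError("sitemap is too large; split it into several files")
    return gzip.compress(raw)


# Hypothetical, highly repetitive sample content.
sample = (
    "<urlset>"
    + "<url><loc>https://www.example.com/page</loc></url>" * 1000
    + "</urlset>"
)
packed = compress_sitemap(sample)
print(len(sample.encode("utf-8")), "bytes before,", len(packed), "bytes after")
```

The compressed bytes would typically be saved with a .gz extension, for example sitemap.xml.gz.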

Submitting a sitemap to a search engine directly is called pinging. A ping returns information about the status of the sitemap along with any errors in processing, and each search engine responds differently. The sitemap location may also be included in any robots.txt file simply by adding the line Sitemap: <sitemap_location> to the file. The “<sitemap_location>” will need to be the complete link to the sitemap. This line is independent of the User-agent line, so it may be placed anywhere in the file.
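The robots.txt approach can be checked from Python, whose standard library parser reads the Sitemap line (the site_maps method requires Python 3.8 or later). The robots.txt content below is a hypothetical example; the Sitemap line is placed first to show that its position relative to the User-agent line does not matter.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
Sitemap: https://www.example.com/sitemap_index.xml
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['https://www.example.com/sitemap_index.xml']
```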

In short, a sitemap is a file that shows the relationship between pages and other content elements on a website. It shows the shape of the information space in an overview, and will help with organization, navigation, and labeling on a website.

Author: Garenne Bigby
Website: http://garennebigby.com
Founder @dynomapper
Garenne Bigby is a freelance Chicago developer and founder of DYNO Mapper with over 10 years of experience in both agency and freelance roles in design, development, user experience, SEO, and information architecture.
