Learning About Sitemaps

Last Edited September 11, 2023 by Garenne Bigby in Sitemaps


A sitemap is a file that lists the web pages of a single website. It tells search engines what content the site contains and how that content is organized. Web crawlers used by search engines read the file to learn about the website and crawl it more effectively. A sitemap also provides valuable metadata associated with the pages it lists. Metadata is data about a web page: information such as how often the page changes, when it was last updated, and how important it is relative to the other pages on the site.

Metadata supplied to search engines through a sitemap can also describe specific content types on a page, such as images, videos, and mobile content. For example, metadata for a video can include its running time, age rating, size, and category, while metadata for an image can include its type, subject matter, and license. Search engine crawlers read all of this information directly from the sitemap.
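As an illustration, Google's image sitemap extension attaches image metadata to a page entry inside an ordinary XML sitemap. This is a minimal sketch; the example.com URLs are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- The page being described -->
    <loc>https://www.example.com/gallery.html</loc>
    <!-- An image that appears on that page -->
    <image:image>
      <image:loc>https://www.example.com/photos/sunset.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```

Video metadata works the same way, using Google's separate video sitemap namespace.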

A Brief History

Google introduced the first sitemaps protocol in mid-2005 so that web developers could publish lists of links from across their websites. Other major search engines announced their support for sitemaps in late 2006. In April 2007, a few more search engines came on board, and the major search engines announced support for auto-discovery through robots.txt. The following month, Arizona, Utah, Virginia, and California announced that their state governments would use sitemaps on their websites.

The idea behind sitemaps is to aid in the discovery of web pages. On crawler-friendly web servers, improvements such as auto-discovery help search engines find sitemaps automatically and decide which pages to crawl first.

How Necessary is a Sitemap?

If a website's pages are linked properly, search engine crawlers can usually discover most of the site on their own. Even so, having a sitemap in place will improve the accuracy of a crawl, especially if:

  • The website is very large. When this is the case, sometimes a search engine web crawler will overlook some parts of the website that are new or recently updated.
  • The website is new and has very few external links. Search engine crawlers investigate the internet by following links between pages, so a crawler may not discover pages that are not linked from other websites.
  • The website has an archive that is large and filled with content pages that are not linked. When web pages are isolated or do not reference each other, they can be listed on a sitemap to make sure that search engines do not overlook any of the pages on the website.
  • Content on the website is made of rich media or uses other annotations that are sitemap-compatible.

Search engines rely on complex algorithms to crawl websites, so a sitemap does not guarantee that every item it lists will be crawled and indexed. In most cases, though, a website benefits from having a sitemap, and having one never has a negative effect.

Building and Submitting a Sitemap

First, decide which pages on the website should be crawled by search engines, and determine the canonical URL that visitors will see for each one. Next, choose the format for the sitemap; it can be created manually or with third-party tools. Once the sitemap has been generated, test it with a sitemap testing tool. When the test is successful, make the website (along with the sitemap) available to search engines.
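For reference, a minimal XML sitemap following the sitemaps.org protocol looks like this; the URL and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Fully qualified URL of the page -->
    <loc>https://www.example.com/</loc>
    <!-- Optional metadata: last modification, change frequency, relative priority -->
    <lastmod>2023-09-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only the `<loc>` tag is required for each entry; the other tags carry the optional metadata described above.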

Formats for Sitemaps

A single sitemap is limited to 50MB uncompressed and 50,000 URLs. When the file is larger or contains more URLs, the list must be broken up into multiple sitemaps. In that case there is also the option of creating a sitemap index file: a file that directs crawlers to a list of sitemaps. The index can then be submitted to search engines for crawling.

Sitemaps can be created in an array of formats, including XML, RSS (and its variants mRSS and Atom 1.0), plain text, and even Google Sites. It is up to the webmaster to determine which format is right for their needs.
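The plain-text format is the simplest of these: a UTF-8 encoded file containing one fully qualified URL per line and nothing else. The URLs below are placeholders:

```
https://www.example.com/page1.html
https://www.example.com/page2.html
```

A text sitemap carries no metadata, so it is best suited to sites that only need to list their URLs.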

General Guidelines for Sitemaps

Because search engines will crawl the URLs exactly how they are listed, use only consistent and fully qualified URLs.

  • Get to know each search engine's webmaster guidelines, and its SEO guides if a consultant will be helping to optimize your sitemaps.
  • Do not include any session IDs in URLs. If these are included in the sitemap, it could lead to crawling duplication.
  • When there are multiple languages used in the sitemap, list all of the translated versions of the URLs to the search engines. This will help with the crawling and indexing.
  • The sitemap file must be UTF-8 encoded, and the URLs it contains should be properly escaped.
  • Don't forget to break up large sitemaps to prevent the server from being overloaded when the search engines frequently request the sitemap.
  • A sitemap index file should be used to list all of the sitemaps for search engines. This enables only one file to be submitted, rather than each individual sitemap.
  • Canonicalize URLs with the search engines so that crawlers know which of the www and non-www versions of the domain to index.
  • Non-Latin and non-alphanumeric characters can be a bit tricky: a sitemap URL may contain only ASCII characters. If it contains others, you will get an error when you try to submit it for crawling.
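As an example of proper escaping, a URL containing an ampersand must be entity-escaped inside an XML sitemap. The URL below is a placeholder:

```xml
<url>
  <!-- "&" in the query string becomes "&amp;" inside XML -->
  <loc>https://www.example.com/view?cat=maps&amp;id=42</loc>
</url>
```

Unescaped characters like a bare `&` will cause an XML parse error when the sitemap is validated or crawled.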

Flash Sites and XML Sitemaps

The issue with Flash-heavy websites is that they are essentially built with a non-HTML language. If the navigation is built with Flash, a search engine would likely discover only the homepage; without an XML sitemap, the pages linked from the homepage could not be discovered. XML is stricter than HTML: errors are not tolerated, so the syntax must be exact. When writing XML by hand, it is strongly advised to run it through an XML syntax validator; many are freely available online with a simple search.

Online XML sitemap generators come in handy for larger or more complex websites, where hand-coding a sitemap can become unwieldy for anyone not well versed in XML.

Managing Sitemaps and Sitemaps Report

You will need to manage your sitemaps by adding, viewing, and testing them, unless they were created with a managed hosting service; a service like this will generally both create and manage sitemaps for you.

When using a sitemap report, you will see a list of sitemaps that have been submitted. Do note that only the sitemaps that have been submitted using the specific tool you chose will show up on the list. The report will not list the sitemaps that have been submitted through other means.

When your Sitemap Isn't Showing Up in a Report

Sometimes, for various reasons, a sitemap won't show up in a report. This can be the byproduct of several things.

  • Who submitted the sitemap? If your service offers the option, there is one tab for viewing sitemaps submitted by you and another for those submitted by anyone else.
  • Is the website's preferred domain correct? Make sure you have typed the correct URL, with no typos.

Testing the Sitemap

All sitemaps should be tested before they are submitted. When testing a sitemap that has been submitted previously, locate the details page of the sitemap and click on Test.

When testing an unsubmitted sitemap, locate the Add/Test button on the reports landing page. Enter the URL of the sitemap into the dialog box exactly as it appears, then click Test. Once the test has completed, click the Results area to check for errors. Fix any errors that were found, and then submit the sitemap.

Submitting a Sitemap for the First Time

Testing the sitemap prior to submission is vital. When the sitemap has been tested successfully, enter its URL into the submission box; the URL must be relative to the site root defined for the property. After submitting, refresh the browser to confirm that the new sitemap has been added to the list.

It will take some time for a search engine to process a new sitemap that has been submitted. Sitemaps can also be resubmitted in the event of an error.

Simplifying Management of Multiple Sitemaps

When a user has many sitemaps, an index file can be used to submit them all at once. The XML format of a sitemap index file is similar to that of a sitemap file. A sitemap index file uses these XML tags:

  • sitemapindex: the parent tag that surrounds the whole file.
  • sitemap: the parent tag for each sitemap listed in the file; a child of sitemapindex.
  • loc: the location of the sitemap; a child of sitemap.
  • lastmod: the date the sitemap was last modified; optional.
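Putting those tags together, a sitemap index file looks like this; the URLs and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <!-- Location of the first sitemap, with its optional last-modified date -->
    <loc>https://www.example.com/sitemap1.xml</loc>
    <lastmod>2023-09-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
```

Submitting this one index file tells crawlers where to find every sitemap it lists.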

If you own more than one website, you may want to simplify the process of creating and submitting sitemaps. You can create one or more sitemaps that include URLs for all of your websites, and then save them in a single location.

You must ensure that you have complete and verified ownership of all sites that are being linked to the sitemaps.

When are Sitemaps Beneficial?

Using a sitemap can only benefit a website or have no effect; it will never be detrimental to the site's overall performance or function. There are, however, certain situations in which a sitemap is particularly beneficial.

  • When the website is extremely large and there may be a chance that a web crawler will overlook recently added or new content.
  • When there are many pages that are contained on a website that are not well linked together or they are totally isolated.
  • When the webmaster has used content that is not generally processed by search engines. This would be content from Flash, Silverlight, or Ajax.
  • When a website has very few external links leading to it.

More Information

When a sitemap approaches the limit of 50,000 URLs or 50MB uncompressed, it may be compressed using gzip to reduce its bandwidth consumption. A sitemap index file is subject to the same limits: it may contain no more than 50,000 sitemaps, can be no larger than 50MB uncompressed, and may also be compressed. A sitemap index file serves as a single entry point for multiple sitemap files.

Submitting a sitemap directly to a search engine is called pinging. When a sitemap is pinged, the search engine returns status information along with any processing errors, and each search engine responds somewhat differently. The sitemap location may also be advertised in the robots.txt file by adding the line Sitemap: <sitemap_location>, where "<sitemap_location>" is the complete URL of the sitemap. This directive is independent of the user-agent lines, so it may be placed anywhere in the file.
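For example, a robots.txt file that advertises a sitemap might look like this; the URL is a placeholder:

```
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

Because the Sitemap line is independent of any user-agent group, crawlers will pick it up wherever it appears in the file.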

In short, a sitemap is a file that shows the relationships between the pages and other content elements of a website. It gives an overview of the shape of the site's information space and helps with organization, navigation, and labeling.

Author: Garenne Bigby, Founder of DYNO Mapper and Former Advisory Committee Representative at the W3C.

