Submitting a sitemap is an important part of optimizing a website. A sitemap helps search engines discover every page on the site and pick up changes quickly after they are made. Here, you will discover why sitemaps are important, how to optimize them for search engines, and when to use an XML sitemap versus an RSS or Atom feed.
A sitemap can be in XML, RSS, or Atom format, and it is important to understand the difference between them. An XML sitemap describes the entire set of URLs on a website, while an RSS or Atom feed describes only the recent changes. In practice, this means XML sitemaps are usually large and updated infrequently, while RSS and Atom feeds are small and updated often.
For optimal crawling of your website, it is recommended to use both an XML sitemap and an RSS or Atom feed. The XML sitemap gives Google information about all of the individual pages on the website, while the RSS or Atom feed surfaces new updates and helps Google keep the content fresh in its index. Note that submitting a sitemap or a feed does not guarantee that the URLs will be indexed.
Essentially, XML sitemaps and RSS/Atom feeds are lists of URLs with metadata attached to them. The two most important pieces of information for Google are the URL itself and its last modification time.
For each URL in an XML sitemap or RSS feed, you should specify a last modification time. This should be the last time the page's content was changed meaningfully: if a change is significant enough to appear in the search results, the time of that change is what belongs in the sitemap.
Make sure the last modification time is set and updated correctly. For XML sitemaps, the correct format is the W3C Datetime format. Only update this time when there has been a purposeful change in the content, and do not make the mistake of setting it to the current time whenever the sitemap is served.
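To illustrate, here is a minimal Python sketch (not from the original article) of formatting a page's real content-change time as a W3C Datetime string for <lastmod>; the helper name w3c_lastmod and the sample timestamp are hypothetical.

    from datetime import datetime, timezone

    # Hypothetical helper: format a content-change time as a W3C Datetime
    # string (e.g. 2024-05-03T14:30:00+00:00) suitable for <lastmod>.
    def w3c_lastmod(modified: datetime) -> str:
        # Normalize to UTC so the timezone offset is always explicit.
        return modified.astimezone(timezone.utc).isoformat(timespec="seconds")

    # Use the time the content last changed meaningfully,
    # not the time the sitemap file was generated or served.
    print(w3c_lastmod(datetime(2024, 5, 3, 14, 30, tzinfo=timezone.utc)))
    # -> 2024-05-03T14:30:00+00:00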
XML sitemaps should contain the URLs of all pages on your website. They are usually large and not updated frequently. To get the most out of your XML sitemap, follow these guidelines:
<urlset>: required; the document-level element of the sitemap. Everything after the XML declaration (<?xml ... ?>) must be contained within it.
<url>: required; the parent element for each individual URL entry.
<sitemapindex>: required; the document-level element of a sitemap index file. Everything after the XML declaration must be contained within it.
<sitemap>: required; the parent element for each individual entry in the index.
<loc>: required; the full URL of the page or sitemap, including the protocol and a trailing slash if the site's hosting server requires one. The value must be no more than 2,048 characters, and any ampersands in the URL must be escaped as &amp;.
<lastmod>: optional; the date on which the page or sitemap file was last modified, given either as a full date and time or as a date only.
<changefreq>: optional; indicates how frequently the page is likely to change: always, hourly, daily, weekly, monthly, yearly, or never. "Always" means the document changes each time it is accessed; "never" means the file is archived and will not change again. This value is only a hint for crawlers, does not determine how frequently a page is crawled or indexed, and does not apply to <sitemap>.
<priority>: optional; indicates the priority of a URL relative to other URLs on the same website, letting the webmaster suggest to crawlers which pages matter most. Valid values range from 0.0 to 1.0, where 1.0 is the most important; the default is 0.5. Marking every page as high priority will not improve rankings, since the value only describes the relative importance of pages within a single site. It does not apply to <sitemap> elements.
Support for the required elements is widespread, while support for the optional ones varies across search engines.
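As a rough illustration of how these elements fit together, the following Python sketch builds a small sitemap with xml.etree.ElementTree; the URLs, dates, and values shown are placeholders, not recommendations.

    import xml.etree.ElementTree as ET

    # Minimal sketch of an XML sitemap using the elements described above.
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")

    for loc, lastmod in [
        ("https://www.example.com/", "2024-05-03"),
        ("https://www.example.com/about/", "2024-04-18"),
    ]:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # required: full URL
        ET.SubElement(url, "lastmod").text = lastmod  # optional: last meaningful change
        ET.SubElement(url, "changefreq").text = "weekly"  # optional hint only
        ET.SubElement(url, "priority").text = "0.8"       # optional, 0.0 to 1.0

    ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)

A serializer like this also takes care of escaping ampersands in URLs as &amp; automatically.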
These feeds should contain the most recent updates to your website. They are generally small and updated frequently. As with sitemaps, the key metadata to include for each item is its URL and its last modification time.
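For comparison, here is a hedged Python sketch of a tiny Atom feed built the same way, carrying only the latest update; the feed title, entry, and timestamps are invented for illustration.

    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    ET.register_namespace("", ATOM)

    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "example.com updates"
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = "2024-05-03T14:30:00Z"

    entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "New article"
    # The link and the updated time are the two pieces of metadata
    # that matter most to crawlers.
    ET.SubElement(entry, f"{{{ATOM}}}link", href="https://www.example.com/new-article/")
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2024-05-03T14:30:00Z"

    ET.ElementTree(feed).write("feed.atom", encoding="utf-8", xml_declaration=True)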
Using XML sitemaps together with RSS or Atom feeds is a great way to improve how search engines, including Google, crawl a website. The vital information in these files is the canonical URL of each page and the time it was last modified. When both are used properly, sitemap pings and feed hubs notify search engines of changes, allowing the website to be crawled accurately and therefore represented accurately in the search results.
When a website uses both XML sitemaps and RSS or Atom feeds, it gives search engines maximum coverage and better discoverability. XML sitemaps should contain only the site's canonical URLs, while the feeds contain only the latest additions or recently updated URLs. Canonical URLs are the preferred versions of your pages, the addresses visitors actually see; for example, a site's homepage should appear under a single canonical URL rather than several variants.
One may wonder when exactly to use both XML sitemaps and RSS or Atom feeds. The benefit is that Google will prioritize new or recently updated URLs on your website; Google has noted that by employing RSS, it can be more efficient at keeping its index fresh.
Both the protocol and the subdomain can affect how the URLs in a sitemap get crawled and indexed. The URLs in an XML sitemap must use the same protocol and subdomain as the sitemap itself: https URLs listed in an http sitemap will not be picked up from that sitemap, in the same vein that a URL on example.domain.com will not be picked up from the sitemap for www.domain.com. This problem is common on websites that use many subdomains or serve some sections over http and others over https, such as ecommerce sites. Many sites have migrated all URLs to https but have not updated their XML sitemaps to reflect the change, so it is worth rechecking any XML sitemap for a site that has recently been migrated.
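One way to catch this class of problem is to compare each listed URL against the sitemap's own location. The following Python sketch assumes a local sitemap.xml and a hypothetical SITEMAP_URL; both are placeholders.

    from urllib.parse import urlparse
    import xml.etree.ElementTree as ET

    # Flag URLs whose scheme or host does not match the sitemap's own location.
    SITEMAP_URL = "https://www.example.com/sitemap.xml"
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    sitemap_origin = urlparse(SITEMAP_URL)
    tree = ET.parse("sitemap.xml")

    for loc in tree.findall(".//sm:url/sm:loc", NS):
        target = urlparse(loc.text.strip())
        if (target.scheme, target.netloc) != (sitemap_origin.scheme, sitemap_origin.netloc):
            print(f"Mismatch: {loc.text.strip()} will not be crawled from {SITEMAP_URL}")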
Sometimes a website has pages in multiple languages. In that case, many webmasters use hreflang annotations, which tell Google which pages target which languages, so Google can surface the right page based on the language or country of the person searching. You can provide the hreflang annotations in each page's HTML, page by page, or supply them in the XML sitemap.
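If the sitemap route is chosen, the annotations take the form of xhtml:link rel="alternate" hreflang entries on each <url>. The sketch below shows roughly what that can look like in Python, with placeholder URLs and languages.

    import xml.etree.ElementTree as ET

    SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
    XHTML = "http://www.w3.org/1999/xhtml"
    ET.register_namespace("", SM)
    ET.register_namespace("xhtml", XHTML)

    alternates = {
        "en": "https://www.example.com/en/page/",
        "de": "https://www.example.com/de/page/",
    }

    urlset = ET.Element(f"{{{SM}}}urlset")
    for lang, loc in alternates.items():
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        # Each URL lists every language alternate, including itself.
        for alt_lang, alt_loc in alternates.items():
            ET.SubElement(url, f"{{{XHTML}}}link",
                          rel="alternate", hreflang=alt_lang, href=alt_loc)

    ET.ElementTree(urlset).write("sitemap-hreflang.xml", encoding="utf-8",
                                 xml_declaration=True)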
XML sitemaps and other feeds can be tested through Google's Webmaster Tools; a simple button handles the process, and this functionality surfaces any problems quickly and efficiently.
When RSS or Atom feeds are used alongside a sitemap, these syndication feeds supplement the complete sitemap, since new and updated content flows into them and is picked up as they are crawled.
XML sitemaps should be tested thoroughly before they go live to make sure they work as intended. Many people do this by crawling their own sitemaps, which makes it possible to identify problematic tags, URLs returning non-200 HTTP status codes, and other issues that may have been overlooked. There are online services that will crawl a sitemap for the webmaster; it is up to the user to decide how they would like to do this.
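A basic version of such a check can also be scripted. The following Python sketch fetches every URL from a local sitemap.xml (a placeholder path) and reports anything that does not return HTTP 200.

    import urllib.request
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    for loc in ET.parse("sitemap.xml").findall(".//sm:url/sm:loc", NS):
        url = loc.text.strip()
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                if response.status != 200:
                    print(f"{response.status} {url}")
        except Exception as exc:  # DNS failures, 4xx/5xx errors, timeouts, etc.
            print(f"ERROR {url}: {exc}")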
XML sitemaps can be used in many ways to maximize the SEO efforts of a website. Understanding exactly how and why XML sitemaps work enables you to inform search engines of all relevant URLs on the website, and to know when to use a sitemap versus an RSS or Atom feed. XML sitemaps are fed directly to search engines, so it is vital that they are done right before they go live, especially for larger or more complex websites. Ideally, a webmaster would implement both an XML sitemap and a syndication feed to give the website the best possible structure and to ensure that all new content can be discovered by search engines.