Last Edited June 8, 2016 by
Garenne Bigby in Sitemaps
Google's Webmaster Tools supports nine sitemap formats. Sitemaps let a webmaster tell search engines which URLs on a website are available to be crawled. An XML sitemap also gives search engines additional details about each URL: when it was last updated, how often it changes, and how important it is compared to other URLs on the website. This allows search engines to crawl the site more efficiently. There are different ways to generate sitemaps depending on the type of website and the content being posted. Sitemaps matter for content audits as well, because audit tools use them to crawl the site and report on how to improve it. Choosing the correct sitemap types leads not only to a more successful crawl of the website, but also to better visibility in search engines.
The information contained in a generated sitemap is what search engines draw on when they crawl and index the website's pages for search results. Whether the website is being built by a newbie or a professional webmaster, it is important to know which type of sitemap is ideal for the website (and the content on it), and how to build it to reach its fullest potential.
XML Sitemap Files for Web Pages
These files are used to submit ordinary web pages from the website.
- This is the preferred format for submitting web pages. Not every search engine supports the other sitemap formats described below, but the XML sitemap for web pages is supported by all search engines that follow the sitemaps.org protocol.
- In its simplest form, it is an XML file that lists the URLs of a website along with additional metadata for each URL: when it was last updated, how often it changes, and how important it is relative to other URLs on the site. This allows search engines to crawl the site more intelligently and efficiently.
- Web crawlers normally discover pages by following links within a site and from other sites. A sitemap supplements this data: crawlers that support sitemaps can pick up every URL listed in it and learn about those URLs from the associated metadata.
- Using a sitemap does not guarantee that search engines will include the web pages in their index, but it gives crawlers hints that help them crawl your site better.
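As a rough illustration of the format described above, the following Python sketch builds a small XML sitemap with the standard library. The example.com URLs, the dates, and the helper name build_sitemap are assumptions made for this example; only the loc, lastmod, changefreq, and priority tags come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as UTF-8 bytes) for a list of page dicts."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is required; the other three tags are optional metadata.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, used only to show the shape of the input.
example_pages = [
    {"loc": "https://www.example.com/", "lastmod": "2016-06-08",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://www.example.com/about"},
]
sitemap_xml = build_sitemap(example_pages)
```

Pages that lack the optional metadata simply produce a url entry with only a loc tag, which is still a valid sitemap entry.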
RSS 2.0 and Atom 1.0
Many blogging platforms create RSS 2.0 feeds automatically.
- RSS stands for Really Simple Syndication. It is a format for syndicating website content, and it is a dialect of XML.
- At the top level, an RSS document is an <rss> element with a mandatory attribute called version, which states the version of RSS that the document conforms to. If the document conforms to the RSS 2.0 specification, the version attribute must be 2.0.
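To make the structure concrete, here is a minimal sketch of an RSS 2.0 document built in Python. The channel values, item URLs, and the function name build_rss are invented for illustration; the rss element, its version attribute, and the channel title, link, and description elements follow the RSS 2.0 specification.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, items):
    """Return an RSS 2.0 document as a string; items is a list of (title, link)."""
    # The version attribute on <rss> is mandatory and must be "2.0".
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


# Hypothetical blog feed.
rss_xml = build_rss(
    "Example Blog",
    "https://www.example.com/",
    "Posts from an example blog",
    [("First post", "https://www.example.com/first-post"),
     ("Second post", "https://www.example.com/second-post")],
)
```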
Atom 1.0 feeds are XML-based documents. A feed is made up of “entries”; each entry has a title and carries an extensible set of metadata.
- Atom feeds are primarily used by blogs and for headlines on news websites.
- Atom is used as an alternative to RSS, developed because RSS was thought to have flaws.
- The Atom format has been said to be cleanly and thoroughly specified, freely extensible to anybody, implemented by everybody, and absolutely vendor neutral.
- Many applications, including iTunes, support the use of Atom 1.0.
- Atom elements can be reused outside the context of an Atom feed document, for example inside other XML formats.
- Atom is convenient when links to resources or the content itself contain characters outside the US-ASCII character set.
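The points above can be sketched as a minimal Atom 1.0 feed with one entry whose title and link contain non-ASCII characters. The identifiers, author name, URLs, and the helper build_atom_feed are assumptions for this example; the element names and namespace come from the Atom 1.0 format.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id, feed_title, author, updated, entries):
    """Return an Atom 1.0 feed as UTF-8 bytes."""
    ET.register_namespace("", ATOM_NS)

    def add(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        node.text = text
        return node

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "id", feed_id)
    add(feed, "title", feed_title)
    add(feed, "updated", updated)
    add(add(feed, "author"), "name", author)
    for entry in entries:
        node = add(feed, "entry")
        add(node, "id", entry["id"])
        add(node, "title", entry["title"])
        add(node, "updated", entry["updated"])
        add(node, "link", href=entry["link"])
    # UTF-8 output keeps characters outside US-ASCII readable in the file.
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)


# Hypothetical feed content.
atom_xml = build_atom_feed(
    "https://www.example.com/feed",
    "Example News",
    "Example Author",
    "2016-06-08T12:00:00Z",
    [{"id": "https://www.example.com/posts/1",
      "title": "Café opens downtown",
      "updated": "2016-06-08T12:00:00Z",
      "link": "https://www.example.com/posts/café"}],
)
```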
Text Files with Web Page URLs
When you are unable to create a sitemap in any of the formats above, you can create a plain text file that lists your URLs. The file has one URL per line, and many search engines, including Google and Yahoo, can read text file sitemaps. To ensure that the sitemap is compatible with the search engines, follow these guidelines:
- Text sitemaps should not contain any more than 50,000 URLs.
- For Yahoo, the primary text sitemap should be named urllist.txt.
- Text file sitemaps should be saved as UTF-8 documents. This is especially important if the website's URLs contain non-English characters.
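The guidelines above could be enforced with a short helper like the following sketch. The function name render_text_sitemap and the example URLs are made up for illustration; the one-URL-per-line layout, the 50,000 URL limit, and the UTF-8 encoding are the rules listed above.

```python
MAX_URLS = 50_000  # limit for a single text sitemap


def render_text_sitemap(urls):
    """Return the sitemap file content as UTF-8 bytes, one URL per line."""
    cleaned = [u.strip() for u in urls if u.strip()]
    if len(cleaned) > MAX_URLS:
        raise ValueError(
            f"text sitemap holds {len(cleaned)} URLs; the limit is {MAX_URLS}"
        )
    return ("\n".join(cleaned) + "\n").encode("utf-8")


# Hypothetical URLs; the second one has a non-ASCII path, which is
# where saving the file as UTF-8 matters.
sitemap_bytes = render_text_sitemap([
    "https://www.example.com/",
    "https://www.example.com/café",
])
```

The resulting bytes can then be written to a file, for instance with pathlib.Path("urllist.txt").write_bytes(sitemap_bytes) when following the Yahoo naming convention.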
XML Sitemap Files for Video Search
Videos on your website can be indexed and made searchable on Google Video. A Google Video sitemap furnishes Google with metadata about the videos on a website. Google's video search is the largest video search service on the internet, and with a video sitemap website owners can tell the search engine the category, title, description, run time, and intended audience of each video on the website. This helps the search engine learn about the rich video content on the website, which in turn improves how the website is listed in video search results.
- Video information (including URLs) becomes searchable in the search engine's video search whether it is submitted as a separate video sitemap or included in a regular sitemap.
- Each video is then displayed as a thumbnail, along with the related information pulled from the sitemap.
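A video entry of this kind might be generated as in the sketch below. The URLs, the video details, and the helper build_video_sitemap are hypothetical, and the video namespace and tag names are written as recalled from Google's video sitemap schema, so they should be checked against the current documentation before use.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for Google's video sitemap extension.
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(page_url, video):
    """Return a sitemap string with one page that hosts one video."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    entry = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    # Each key becomes a <video:...> tag, in the order given.
    for tag, value in video.items():
        ET.SubElement(entry, f"{{{VIDEO_NS}}}{tag}").text = str(value)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical video metadata: title, description, run time in seconds,
# category, and intended audience.
video_xml = build_video_sitemap(
    "https://www.example.com/videos/grilling.html",
    {
        "thumbnail_loc": "https://www.example.com/thumbs/grilling.jpg",
        "title": "Grilling steaks for summer",
        "description": "How to grill steaks perfectly every time",
        "content_loc": "https://www.example.com/media/grilling.mp4",
        "duration": 600,
        "category": "Cooking",
        "family_friendly": "yes",
    },
)
```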
Media RSS Feeds for Video Search
mRSS feeds are an extension of RSS feeds. The main difference is that the feed declares the Media RSS extension. This is necessary so that news reader applications know that the feed contains media and, in turn, how to interpret it.
- The tags in the feed contain descriptors such as “medium” and “item”. The medium can be an image, document, audio, or video.
- The feed may also contain descriptors that give viewers more insight into the media: file size, type, height, width, and duration. The medium appears here as well, together with “isDefault”, which indicates whether this item is the default item, that is, the first item to be played.
- Title, description, and thumbnail tags also appear, and they are self-explanatory.
- Optional tags include rating, keywords, copyright, player, credit, and text. The “text” tag allows a textual transcript (or closed captioning) to be included for the media.
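The tags listed above can be sketched in a small Media RSS feed. The feed content, URLs, and the function name build_mrss_feed are invented for this example; the media namespace and the content, title, description, and thumbnail elements follow the Media RSS specification.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"


def build_mrss_feed(channel_title, channel_link, videos):
    """Return an RSS 2.0 feed string that uses the Media RSS extension."""
    # Declaring the media namespace is what marks the feed as mRSS.
    ET.register_namespace("media", MEDIA_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Video feed"
    for v in videos:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = v["title"]
        ET.SubElement(item, "link").text = v["page"]
        content = ET.SubElement(item, f"{{{MEDIA_NS}}}content", {
            "url": v["file"],
            "type": "video/mp4",
            "medium": "video",
            "isDefault": "true",
            "duration": str(v["seconds"]),
        })
        ET.SubElement(content, f"{{{MEDIA_NS}}}title").text = v["title"]
        ET.SubElement(content, f"{{{MEDIA_NS}}}description").text = v["description"]
        ET.SubElement(content, f"{{{MEDIA_NS}}}thumbnail", {"url": v["thumbnail"]})
    return ET.tostring(rss, encoding="unicode")


# Hypothetical video item.
mrss_xml = build_mrss_feed("Example Videos", "https://www.example.com/", [{
    "title": "Site tour",
    "page": "https://www.example.com/videos/tour.html",
    "file": "https://www.example.com/media/tour.mp4",
    "seconds": 95,
    "description": "A short tour of the example site",
    "thumbnail": "https://www.example.com/thumbs/tour.jpg",
}])
```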
XML Sitemap Files for Google Code Search
Google Code Search is the Google service that searches for source code files on the internet. This is useful for website owners whose websites feature source code, as they can create code sitemaps that help Google index the code. A code sitemap looks like a normal XML sitemap, but it has some extra requirements and tags.
- Remember, when making a code sitemap, you must ensure that all relevant file extensions for code files are included.
- It is possible that you will need to also tighten the file name patterns that are accepted as code files.
- Both of these things can be done using output filters when you generate a sitemap.
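An output filter of the kind mentioned above might look like this sketch, which keeps only URLs that point at code files. The extension list, the exclusion pattern, and the function names are assumptions chosen for illustration and are not taken from any particular sitemap generator.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of extensions that count as code files on this site.
CODE_EXTENSIONS = {".c", ".h", ".java", ".js", ".py"}
# Hypothetical tightening of file name patterns: skip minified JavaScript.
EXCLUDED_NAMES = re.compile(r"\.min\.js$")


def is_code_url(url):
    """Return True when the URL path ends in an accepted code file name."""
    name = PurePosixPath(urlparse(url).path).name
    if PurePosixPath(name).suffix.lower() not in CODE_EXTENSIONS:
        return False
    return EXCLUDED_NAMES.search(name.lower()) is None


def filter_code_urls(urls):
    """Keep only the URLs that should go into the code sitemap."""
    return [u for u in urls if is_code_url(u)]


code_urls = filter_code_urls([
    "https://www.example.com/src/app.py",
    "https://www.example.com/static/jquery.min.js",
    "https://www.example.com/index.html",
    "https://www.example.com/src/MAIN.C",
])
```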
XML Sitemap Files for Mobile Web Pages
This format lets developers flag content that is optimized for mobile devices. Note that there has been a small change in the format recently.
- A feature phone sitemap should not be created unless there is a specific feature phone version of a website designed for feature phones (not smartphones).
- It is possible to create a separate sitemap listing the video content, or to add information about the video content to a sitemap that already exists. Which one to choose is just a matter of convenience for the website maker.
- The mobile sitemap for feature phones uses the sitemap protocol with an additional namespace and tag requirement.
- If choosing to use a sitemap generating tool, check that it can create sitemaps for mobile web pages.
- Do include the <mobile:mobile/> tag to ensure that the mobile URLs are crawled properly.
- List URLs that serve multiple markup languages in a single sitemap.
- Search Console will automatically detect and support XHTML mobile profile, WML, and cHTML.
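A feature phone sitemap along these lines could be generated as in the following sketch. The URLs and the helper build_mobile_sitemap are hypothetical, and the mobile namespace is written as recalled from Google's mobile sitemap schema, so treat it as an assumption to verify.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for Google's mobile sitemap extension.
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"


def build_mobile_sitemap(urls):
    """Return a sitemap string where every URL carries the mobile tag."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("mobile", MOBILE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page
        # The empty mobile tag marks the URL as feature phone content.
        ET.SubElement(url, f"{{{MOBILE_NS}}}mobile")
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical feature phone pages.
mobile_xml = build_mobile_sitemap([
    "https://m.example.com/index.xhtml",
    "https://m.example.com/news.wml",
])
```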
XML Sitemap Files for Geo-Data
This is utilized when there is geographic data on your website that is in the form of GeoRSS or KML files.
- Search engines are improving and innovating new ways to use location data.
- If the search engine knows the user's location, it pushes results from that location to the top of the search results page for whatever the user searches for.
- This happens even when the user does not enter their location into their search terms. This is why it is so important for website owners to have accurate location information incorporated into their website.
- The geo sitemap is a specific form of XML that will contain all of the geographic information for all of the locations.
- There are sitemap generators specific for geo-data. These are helpful for novice webmasters.
- Find the desired tool to create your geo sitemap.
- Fill in your business information correctly.
- Give the website details.
- You may now download both the KML and geo sitemap details, and then upload them to your website. Use an FTP uploader.
- Submit the geo sitemap through your Webmaster Tools account.
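For readers who prefer to see what a generator produces, the sketch below builds a small geo sitemap by hand. The file URLs and the helper build_geo_sitemap are invented, and the geo namespace and the geo:format tag are written as recalled from Google's geo sitemap schema, so they are assumptions to confirm before relying on them.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for Google's geo sitemap extension.
GEO_NS = "http://www.google.com/geo/schemas/sitemap/1.0"


def build_geo_sitemap(files):
    """Return a geo sitemap string; files is a list of (url, format) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("geo", GEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for file_url, file_format in files:
        if file_format not in ("kml", "georss"):
            raise ValueError(f"unsupported geo format: {file_format}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = file_url
        geo = ET.SubElement(url, f"{{{GEO_NS}}}geo")
        ET.SubElement(geo, f"{{{GEO_NS}}}format").text = file_format
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical location files uploaded to the website.
geo_xml = build_geo_sitemap([
    ("https://www.example.com/locations.kml", "kml"),
    ("https://www.example.com/stores.xml", "georss"),
])
```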
XML Sitemap Files for News
Websites designed for news can submit their news content directly through this special sitemap format. These publishers must first be registered with Google News before the files can be processed.
- The news sitemap should be "current": it should contain only URLs for articles published in the last 2 days. Articles older than 2 days can be removed from the news sitemap, but they will remain in the News index for 30 days.
- Web creators are encouraged to update their news sitemaps continually, and with fresh articles as they are published.
- A news sitemap may contain no more than 1,000 URLs. To include more, break the URLs up into several sitemaps and use a sitemap index file to manage them.
- Use the XML format provided in the sitemap protocol for the index. The sitemap index file should not list more than 50,000 sitemaps. The reason for these limits is to ensure that the web server is not overloaded when serving large files.
- Do not create a new news sitemap each time an article is published. Instead, update the current sitemap with the new article URLs.
- Do not use the Google Sitemap Generator to create a news sitemap, because it can include URLs that do not correspond to your news articles. There are plenty of third-party tools to help generate a Google News sitemap.
- Once the sitemap has been created, upload it into your highest-level directory containing your news articles.
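The limits above (articles from the last 2 days, at most 1,000 URLs per news sitemap, and a sitemap index to tie several files together) can be sketched as follows. The article data, file names, and function names are assumptions for illustration; only the sitemapindex, sitemap, and loc tags come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_NEWS_URLS = 1000
MAX_AGE = timedelta(days=2)


def recent_articles(articles, now):
    """Keep only articles published within the last 2 days."""
    cutoff = now - MAX_AGE
    return [a for a in articles if a["published"] >= cutoff]


def chunk(items, size=MAX_NEWS_URLS):
    """Split items into lists of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index string listing the given sitemap files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical data: 2,500 fresh articles and one that is a week old.
now = datetime(2016, 6, 8, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": f"https://www.example.com/news/{i}",
     "published": now - timedelta(hours=1)}
    for i in range(2500)
]
articles.append({"loc": "https://www.example.com/news/old",
                 "published": now - timedelta(days=7)})

fresh = recent_articles(articles, now)
chunks = chunk(fresh)
index_xml = build_sitemap_index(
    f"https://www.example.com/news-sitemap-{n}.xml"
    for n in range(1, len(chunks) + 1)
)
```

Each chunk would then be written out as its own news sitemap file, with the index file pointing at all of them.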
It is up to the webmaster either to choose the correct sitemap format for a website, or to use a generator that will produce the correct type. When choosing the type of sitemap, the webmaster should take into consideration the type of content that will be posted on the website, along with any geo-data that is pertinent. Doing all of this correctly will not only make the crawl easier, but will also help to ensure that the website is formatted properly and is found accurately through search engines. When website owners are unsure of which sitemap format to use, they should review their content and choose the format that works best for it.