Remediate.Co

Sitemap Format

For search engines to recognize a sitemap, the XML file must contain certain required tags. The sitemap file must also be UTF-8 encoded. Both requirements are explained below.

Basic Sitemap example

To get you started, below is a very basic sitemap file that you can use as a template for your first sitemap. This example contains a single URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://domain.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.1</priority>
  </url>
</urlset>

More complex sitemap examples can be found elsewhere in this tutorial.

XML tags

To help you understand the file better, each part of the sitemap is reviewed below, one at a time.

  • Opening Tag: Every Sitemap XML file must start with an opening <urlset> tag and end with a closing </urlset> tag.
  • Parent Entry: Start each “parent” entry with a <url> tag and end it with </url>.
  • Child Entries: Each “child” entry sits inside its parent <url> entry. The page address, for example, is contained between <loc> and </loc> tags.
  • URL: Insert the URL directly after the <loc> tag. It should start with its protocol, such as http://.

Note that the URL must be shorter than 2,048 characters.
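As a rough sketch of how these tags nest, the file below lists two pages using only <loc>, the one child tag the Sitemap protocol requires inside each <url> entry. The example.com addresses are placeholders and do not come from this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> parent entry per page -->
  <url>
    <!-- <loc> holds the full address, including the protocol -->
    <loc>http://www.example.com/</loc>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
  </url>
</urlset>
```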

  • A date in W3C Datetime format (for example YYYY-MM-DD, as in 2008-01-01) has to be inserted into the <lastmod> tag.

Note that this tag is optional and does not have to be updated every time the document changes. Search engines can also pick up modification dates when they crawl the documents.
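For illustration, the W3C Datetime format used by the Sitemap protocol accepts either a plain date or a full timestamp with a time zone offset. The sketch below shows both forms. The URLs and dates are made-up placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/news.html</loc>
    <!-- Date only: YYYY-MM-DD -->
    <lastmod>2008-01-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/blog.html</loc>
    <!-- Date, time and time zone offset -->
    <lastmod>2008-01-01T18:30:00+00:00</lastmod>
  </url>
</urlset>
```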

  • The <changefreq> tag hints to crawlers how often a page is likely to change, and therefore how often it is worth revisiting.

Bear in mind that this value is only a hint: crawler behavior depends entirely on the search engine and is not determined by the value you specify.

The <changefreq> tag expects one of the following values: never, yearly, monthly, weekly, daily, hourly or always.

Note that “always” should only be used for pages that change every time they are accessed. On the other hand, “never” doesn’t necessarily mean that the page will never be crawled again; search engines may still revisit it from time to time.
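A short sketch of how these values might be assigned, assuming a hypothetical site with a frequently updated front page and an archived article that is no longer edited:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Front page: content is expected to change several times a day -->
    <loc>http://www.example.com/</loc>
    <changefreq>hourly</changefreq>
  </url>
  <url>
    <!-- Archived article: not expected to change again -->
    <loc>http://www.example.com/archive/2007/old-article.html</loc>
    <changefreq>never</changefreq>
  </url>
</urlset>
```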

  • The <priority> value can range from 0.0 to 1.0.

The priority value only indicates your own preference for which pages on your site matter most. Pages without an explicit priority default to 0.5. Search engines may use the value when choosing which of your URLs to crawl first, but a higher priority does not guarantee earlier indexing.

Note that the priority value is relative: it only ranks pages within your own website and is not used to compare one website to another. Assigning a high priority to every page therefore does not help, and there is no guarantee that those pages will be crawled more often.
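To make the relative nature of the value concrete, here is a small sketch with three placeholder pages. Only the ordering between them matters, and the middle entry falls back to the default because it omits the tag.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Most important page on this site -->
    <loc>http://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <!-- No <priority> tag: treated as the default, 0.5 -->
    <loc>http://www.example.com/products.html</loc>
  </url>
  <url>
    <!-- Least important of the three -->
    <loc>http://www.example.com/terms.html</loc>
    <priority>0.2</priority>
  </url>
</urlset>
```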

Special characters in the Sitemap file

As mentioned earlier, every sitemap has to be UTF-8 encoded, which can usually be done when saving the sitemap file. Nearly all text editors support the UTF-8 format.

Entity escape codes must be used for the following characters wherever they appear in the sitemap data.

Character: Escape Code

  • Ampersand (&): &amp;
  • Single Quote ('): &apos;
  • Double Quote ("): &quot;
  • Greater Than (>): &gt;
  • Less Than (<): &lt;
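As an example of escaping in practice, take a placeholder URL with two query parameters, http://www.example.com/view?item=12&desc=vacation_hawaii. The ampersand between the parameters has to be written as &amp; inside the <loc> tag:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- The ampersand in the query string is written as an entity -->
    <loc>http://www.example.com/view?item=12&amp;desc=vacation_hawaii</loc>
  </url>
</urlset>
```

An XML parser reading this file turns the entity back into a plain ampersand, so the crawler sees the original URL.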

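Finally, always remember that an uncompressed sitemap file must not exceed 50MB (52,428,800 bytes) or contain more than 50,000 URLs. Older versions of the protocol set the size limit at 10MB. You can easily create sitemaps with the help of our website mapping tool, http://dynomapper.com/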
