A few users have asked for crawler limits of 1 million pages or more, so I thought this would be a great topic to address on our blog. Whether you are creating a sitemap to improve your search engine optimization or for website project planning, there are size limitations to consider. According to sitemaps.org, the official protocol for XML sitemaps, a sitemap should contain no more than 50,000 URLs and be no larger than 50MB uncompressed (earlier versions of the protocol set the limit at 10MB). Google, Yahoo, and Bing follow these same size restrictions. If your site has more pages than this, you will need to use multiple sitemaps and link them from a sitemap index file, which identifies each sitemap file so that search engines can index your site more efficiently.
Similarly, a gigantic visual sitemap can be hard to view if it contains many pages and levels. For these reasons, DYNO Mapper has several options for managing enterprise-level websites, so we wanted to show you our process for creating multiple sitemaps.
Your first step would be to click Create from URL and do a general crawl of your website. It is best to do this even if you think you know your hierarchy of content. Let's face it; you could be missing something right under your nose.
Crawl your website to discover its hierarchy. Once the crawl gives you a general overview of what you are working with, make a plan for how you are going to split your sitemaps. Each sitemap should contain no more than 50,000 pages and should cover whole sections of your website.
Each sitemap should be split along your established website URL hierarchy. See the following website example broken into two sitemaps. If your website is organized by URL directories, you can quickly group pages together in the next step.
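To make the planning step concrete, here is a minimal Python sketch that counts crawled URLs per top-level directory and flags any section that would exceed the 50,000-URL limit on its own. It is not part of DYNO Mapper; the function names, the example.com URLs, and the section names are assumptions for illustration only.

```python
from collections import Counter
from urllib.parse import urlparse

MAX_URLS_PER_SITEMAP = 50_000  # per-sitemap URL limit from the sitemaps.org protocol


def top_level_section(url: str) -> str:
    """Return the first path segment of a URL, or "/" for the home page."""
    path = urlparse(url).path.strip("/")
    return "/" + path.split("/")[0] if path else "/"


def count_by_section(urls):
    """Count URLs per top-level directory."""
    return Counter(top_level_section(u) for u in urls)


# Hypothetical crawl results; a real crawl would supply the full URL list.
crawled = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/products/widget",
    "https://example.com/services/consulting",
    "https://example.com/locations/denver",
    "https://example.com/members/jane",
]

counts = count_by_section(crawled)
for section, n in sorted(counts.items()):
    flag = " (exceeds limit, split further)" if n > MAX_URLS_PER_SITEMAP else ""
    print(f"{section}: {n}{flag}")
```

Sections whose combined counts stay under the limit can share one sitemap; a single section that exceeds it needs to be split again at the next directory level.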
After you have your plan together, it is time to create each of your sitemaps. Click Create from URL and open the Advanced Settings for the Omit Paths field.
After entering your URL, use the Omit Paths field to restrict directories. In the example above, you will want to omit services, locations, and members from the first sitemap.
Examples: \/services will skip any URL that contains /services. \/services$ will skip only a URL that ends with /services.
You will want to separate multiple rules with a new line.
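If you want to preview what a set of rules will skip before running a crawl, the sketch below treats each line as a regular expression searched against the full URL. That matching behavior is an assumption based on the examples above, not a description of DYNO Mapper's internals, and the `is_omitted` helper and example.com URLs are hypothetical.

```python
import re


def is_omitted(url: str, rules_text: str) -> bool:
    """Return True if any rule (one regular expression per line) matches the URL."""
    rules = [line.strip() for line in rules_text.splitlines() if line.strip()]
    return any(re.search(rule, url) for rule in rules)


# Rules for the first sitemap, entered one per line as in the Omit Paths field.
FIRST_SITEMAP_RULES = r"""\/services
\/locations
\/members"""

# \/services matches anywhere in the URL, so subpages are skipped too.
print(is_omitted("https://example.com/services/consulting", FIRST_SITEMAP_RULES))  # True
print(is_omitted("https://example.com/blog/post-1", FIRST_SITEMAP_RULES))          # False

# \/services$ only matches a URL that ends with /services.
print(is_omitted("https://example.com/services", r"\/services$"))             # True
print(is_omitted("https://example.com/services/consulting", r"\/services$"))  # False
```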
Now create your second sitemap using the omit function again, but this time exclude the root URL, directory, blog, and product directories. Entering \/$ will exclude the home page URL from the second sitemap.
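With two complementary rule sets, it is worth checking that every page lands in exactly one sitemap. The following sketch does that for a short list of URLs, again assuming each rule is a regular expression searched against the full URL. The rule lists, paths, and function names are illustrative assumptions, not DYNO Mapper output.

```python
import re

# Hypothetical omit rules for each sitemap, based on the example split.
FIRST_RULES = [r"\/services", r"\/locations", r"\/members"]
SECOND_RULES = [r"\/$", r"\/blog", r"\/products"]


def included(url, omit_rules):
    """A URL is included in a sitemap when no omit rule matches it."""
    return not any(re.search(rule, url) for rule in omit_rules)


def check_split(urls):
    """Return (missing, duplicated): URLs in neither sitemap and URLs in both."""
    missing, duplicated = [], []
    for url in urls:
        hits = included(url, FIRST_RULES) + included(url, SECOND_RULES)
        if hits == 0:
            missing.append(url)
        elif hits == 2:
            duplicated.append(url)
    return missing, duplicated


urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/services/consulting",
    "https://example.com/about",                  # matched by neither rule set
    "https://example.com/blog/services-roundup",  # contains both /blog and /services
]

missing, duplicated = check_split(urls)
print("In neither sitemap:", missing)
print("In both sitemaps:", duplicated)
```

A page that no rule mentions ends up in both sitemaps, and a page whose URL happens to contain a rule from each set ends up in neither, so a quick check like this can catch gaps before you upload.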
Download both XML sitemap files by opening each sitemap's menu, clicking Download Sitemap, and then choosing XML Sitemap. Now all you need to do is upload your sitemaps to your hosting account and link them with a sitemap index file.
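For reference, a sitemap index file is a small XML document that lists the location of each sitemap. The sketch below builds one with Python's standard library, following the sitemapindex format from sitemaps.org. The file names and example.com URLs are placeholders, and `build_sitemap_index` is a hypothetical helper rather than a DYNO Mapper feature.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML listing each sitemap URL.

    lastmod is an optional date string (for example "2024-01-31") applied to every entry.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder locations for the two uploaded sitemap files.
index_xml = build_sitemap_index([
    "https://example.com/sitemap-1.xml",
    "https://example.com/sitemap-2.xml",
])
print(index_xml)
```

Save the output as a file such as sitemap_index.xml next to the two sitemaps, then submit that single index file to the search engines.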
After you create a sitemap using DYNO Mapper, the crawler settings are stored, so you can refresh your sitemap by clicking "Refresh Sitemap" in the sitemap menu and then re-upload it to your server. You also have the option to schedule your sitemap crawl on a weekly or monthly basis. Using the above strategy, you can manage sitemaps for a 2,500,000-page website with the MOST POPULAR subscription while also using our Content Inventory, Audit, Keyword Tracking, and Accessibility Testing features to create a phenomenal website user experience.