Understanding sitemap-index.xml: What It Is and How It Works
A sitemap index file lists multiple sitemap files to help search engines discover all pages on large sites. This guide covers its structure, generation, and best practices.
What is sitemap-index.xml?
A sitemap-index.xml is an XML file that acts as an index of multiple sitemaps. It helps search engines discover all the sitemap files used on a large site by listing their locations in one place. The format is defined by the sitemaps protocol (sitemaps.org).
How it fits with sitemaps
When a site has many URLs spread across several sitemap files, a sitemap index points to each of those files. Search engines fetch the index first, then retrieve each referenced sitemap as needed. This keeps crawling efficient and organized for very large sites.
Structure of a sitemap index
A sitemap index uses the root element sitemapindex and contains multiple sitemap entries. Each entry points to a separate sitemap file.
Elements of a sitemap index
- sitemap: container for a single sitemap entry
- loc: the absolute URL of the sitemap file
- lastmod: optional date (W3C Datetime format, e.g. 2024-06-01) when the sitemap file was last modified
Example
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
    <lastmod>2024-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
    <lastmod>2024-06-01</lastmod>
  </sitemap>
</sitemapindex>
How to generate
- Use a CMS plugin or a script that collects sitemap URLs and writes a sitemapindex file referencing them with <loc> entries.
- Each sitemap file can contain up to 50,000 URLs and must be no larger than 50 MB uncompressed.
- The sitemap index itself can reference up to 50,000 sitemap entries.
- Ensure each URL is absolute and reachable, and update lastmod values to reflect changes when appropriate.
- Consider gzip compression for large sitemap files and place the index at a stable location.
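The steps above can be sketched in a short script. This is a minimal example, not a production generator: the sitemap URLs, the lastmod value, and the output filename are all hypothetical placeholders you would replace with your own.

```python
from datetime import date
from xml.sax.saxutils import escape

# Hypothetical list of sitemap files already published on the site.
SITEMAP_URLS = [
    "https://www.example.com/sitemap-posts.xml",
    "https://www.example.com/sitemap-pages.xml",
]

def build_sitemap_index(urls, lastmod=None):
    """Return a sitemap index document referencing each URL in `urls`."""
    lastmod = lastmod or date.today().isoformat()
    entries = "\n".join(
        "  <sitemap>\n"
        f"    <loc>{escape(u)}</loc>\n"       # loc must be an absolute URL
        f"    <lastmod>{lastmod}</lastmod>\n"  # W3C Datetime format
        "  </sitemap>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )

xml = build_sitemap_index(SITEMAP_URLS, lastmod="2024-06-01")
with open("sitemap-index.xml", "w", encoding="utf-8") as f:
    f.write(xml)
```

Escaping each URL with `escape` matters because characters such as `&` in query strings must be entity-encoded to keep the XML well-formed.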
Best practices
- Stay within the protocol limits: no more than 50,000 sitemap entries per index, and no sitemap file over 50,000 URLs or 50 MB uncompressed. Split into additional sitemaps before you hit these caps.
- Submit the sitemap index URL to major search engines (e.g., Google Search Console, Bing Webmaster Tools).
- Reference the sitemap index in your robots.txt with a Sitemap: directive when feasible, so crawlers can discover it without a manual submission.
- Regularly update lastmod values and remove outdated sitemaps to prevent crawl errors.
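A robots.txt reference is a single line pointing at the index; the path below is an example location, not a required one:

```
# robots.txt — the sitemap index location is an example path
Sitemap: https://www.example.com/sitemap-index.xml

User-agent: *
```

The Sitemap: directive is independent of any User-agent group and can appear anywhere in the file.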
Submitting to search engines
- Google: add the sitemap index URL in Google Search Console and let Google fetch all referenced sitemaps.
- Bing: submit via Bing Webmaster Tools; Bing will resolve the referenced sitemaps similarly.
- Maintain visibility by ensuring all referenced sitemaps remain accessible and well-formed.
Common pitfalls
- Not updating lastmod when content changes, which can mislead crawlers.
- Referencing broken or moved sitemap URLs.
- Placing the index behind authentication or blocking it in robots.txt.
- Forgetting to compress large sitemap files.
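To catch broken or moved references before crawlers do, you can parse the index and check each listed URL. The sketch below only extracts the <loc> values from an inline sample document; the actual reachability check (an HTTP request per URL) is left as a comment so the example stays self-contained and offline.

```python
import xml.etree.ElementTree as ET

# The sitemaps protocol namespace, needed for namespaced element lookups.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Sample index document; in practice you would fetch your live index instead.
INDEX_XML = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

def extract_sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]

urls = extract_sitemap_urls(INDEX_XML)
for url in urls:
    # A real check might call urllib.request.urlopen(url) and verify the
    # response status is 200 and the body parses as well-formed XML.
    print(url)
```

Running a check like this periodically flags references that have 404'd or been moved, which is exactly the pitfall listed above.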
Quick takeaways
- A sitemap-index.xml lists multiple sitemap files, making it easier for search engines to crawl large sites.
- Keep URLs absolute, up-to-date, and accessible; submit the index to search engines for best results.
- Monitor for errors and update the index as your site grows.
Anne Kanana