An XML sitemap is a file that lists a website's URLs and informs the major search engines about the structure of the website's content. It acts as a blueprint of the site that clearly indicates where everything is located. A sitemap lets search engine crawlers such as Googlebot crawl a website more logically and thoroughly, and it can also speed up the crawl. Sitemaps are often overlooked during website optimization because their importance is underestimated.
An XML sitemap can be created either manually or with a third-party tool. Many of today's CMS products have plugins available that build an XML sitemap automatically.
When creating an XML sitemap manually, or developing a script to create one, the following considerations must be taken into account:
- Large XML sitemaps should be split into a series of smaller sitemaps. This prevents the web server from becoming overloaded by serving too large a file to the search engines. The guidelines are that a single XML sitemap cannot contain more than 50,000 URLs and should not be larger than 50MB uncompressed.
- When splitting a large XML sitemap, use a sitemap index file to list the individual sitemaps, and present this single file to the search engines instead of the individual sitemaps.
- Compressed sitemap files are supported by all major search engines. Sitemap files can be compressed using GZIP, in which case they usually end with the extension .gz.
- XML sitemap files should be UTF-8 encoded, with URLs suitably escaped (for example, & written as &amp;).
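The considerations above can be sketched as a short Python script. This is a minimal illustration rather than a definitive implementation: the function names (render, build_sitemaps), the file naming scheme (sitemap1.xml.gz, sitemap_index.xml) and the base_url parameter are assumptions made for the example, and only the required loc tag is written.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # per-file URL limit from the guidelines
MAX_BYTES = 50 * 1024 * 1024   # per-file size limit, uncompressed
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
QUOTES = {"'": "&apos;", '"': "&quot;"}  # escaped in addition to &, < and >


def render(root_tag, item_tag, locations):
    """Return a sitemap or sitemap index as UTF-8 bytes with escaped URLs."""
    items = "".join(
        f"  <{item_tag}><loc>{escape(loc, QUOTES)}</loc></{item_tag}>\n"
        for loc in locations
    )
    xml = f'{XML_HEADER}<{root_tag} xmlns="{SITEMAP_NS}">\n{items}</{root_tag}>\n'
    return xml.encode("utf-8")


def build_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Split urls into gzipped sitemaps and add an index that lists them.

    Returns a dict mapping hypothetical file names to file contents (bytes).
    """
    files, locations = {}, []
    for start in range(0, len(urls), max_urls):
        number = start // max_urls + 1
        payload = render("urlset", "url", urls[start:start + max_urls])
        if len(payload) > MAX_BYTES:
            raise ValueError(f"sitemap {number} exceeds 50MB uncompressed")
        name = f"sitemap{number}.xml.gz"
        files[name] = gzip.compress(payload)
        locations.append(f"{base_url}/{name}")
    # The index itself is left uncompressed in this sketch.
    files["sitemap_index.xml"] = render("sitemapindex", "sitemap", locations)
    return files
```

Each entry in the returned dictionary can then be written to disk in binary mode and uploaded, with only the index file presented to the search engines.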
Example XML sitemaps
Standard XML sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-11-02T09:01:06+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>https://www.example.com/xml-sitemap/</loc>
    <lastmod>2015-11-02T09:01:06+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5000</priority>
  </url>
</urlset>
XML sitemap index
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2012-03-10T17:08:12+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap/sitemap2.xml.gz</loc>
    <lastmod>2012-03-10T17:08:12+00:00</lastmod>
  </sitemap>
</sitemapindex>
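To illustrate how a sitemap index points at its individual sitemaps, the following sketch reads an index like the one above and lists the child sitemap locations. It uses Python's standard xml.etree.ElementTree module; the function name child_sitemaps and the sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace shared by sitemap and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap/sitemap2.xml.gz</loc>
  </sitemap>
</sitemapindex>
"""


def child_sitemaps(index_xml):
    """Return the <loc> value of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


for location in child_sitemaps(SAMPLE_INDEX):
    print(location)
```

A crawler works through the index in much the same way: it fetches the single index file and then follows each listed location in turn.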
XML sitemap tags
For the changefreq tag, the "always" value is used for pages that change each time they are accessed, while the "never" value is used for URLs that have been archived.
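As a rough illustration of how tag values might be checked in a generation script, the sketch below validates a changefreq and priority pair. The set of changefreq values and the 0.0 to 1.0 priority range come from the sitemaps.org protocol; the function name check_url_tags and the wording of the messages are assumptions made for the example.

```python
# changefreq values listed by the sitemaps.org protocol (lowercase).
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_url_tags(changefreq, priority):
    """Return a list of problems with a <changefreq>/<priority> pair.

    Both arguments are expected as the strings found in the XML.
    """
    problems = []
    if changefreq not in CHANGEFREQ_VALUES:
        problems.append(f"unknown changefreq: {changefreq!r}")
    try:
        value = float(priority)
    except ValueError:
        problems.append(f"priority is not a number: {priority!r}")
    else:
        if not 0.0 <= value <= 1.0:
            problems.append(f"priority out of range 0.0-1.0: {priority!r}")
    return problems
```

Note that the comparison is case-sensitive, so a capitalised value such as "Always" would be reported as unknown by this check.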
XML sitemap verification
Once an XML sitemap has been created, it can be verified in Google Webmaster Tools (since renamed Google Search Console). Before running the tool, ensure that the sitemap is in the correct format and has been uploaded to the root of the web server hosting it.
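Before submitting a sitemap, a quick local check can catch the most obvious format problems. The sketch below is one possible pre-check, not a replacement for the search engine's own verification: it only tests that the file is well-formed XML, has the expected root element, and stays within the 50,000 URL and 50MB limits. The function name precheck_sitemap is an assumption made for the example.

```python
import gzip
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # uncompressed size limit


def precheck_sitemap(data):
    """Return a list of problems found in raw sitemap bytes (plain or gzipped)."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 50MB uncompressed")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    count = len(root.findall(f"{{{NS}}}url"))
    if count > MAX_URLS:
        problems.append(f"{count} URLs exceeds the limit of {MAX_URLS}")
    return problems
```

An empty list means none of these basic checks failed. The bytes can be read from a local file opened in binary mode before the sitemap is uploaded.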
XML sitemap conclusion
In conclusion, having an XML sitemap does not guarantee that all the URLs listed in it will be crawled or indexed, because search engine algorithms are intricate and complex. That said, in the majority of cases an XML sitemap is still the fastest and most effective way to get a website crawled and indexed.