XML Sitemap

An XML sitemap is a file that lists a website's URLs and is used to inform major search engines about the structure of the website's content. It acts as a blueprint of the website, showing clearly where everything is located. The benefit of having a sitemap is that it helps search engine crawlers such as Googlebot discover the site's pages more efficiently, including pages that are hard to reach by following links, which can lead to a quicker crawl. Sitemaps are often overlooked during website optimization because their importance is underestimated.

An XML sitemap can be created manually, generated with a custom script, or produced with a third-party tool. Many of today's content management systems (CMS) have plugins that build an XML sitemap automatically.

Sitemap considerations

When creating an XML sitemap manually, or writing a script to generate one, take the following considerations into account:

  • Large XML sitemaps should be split into a series of smaller sitemaps. The sitemaps.org protocol limits a single sitemap to 50,000 URLs and 50MB (52,428,800 bytes) uncompressed, and splitting also avoids serving one very large file to the search engines.
  • When splitting large XML sitemaps, use a sitemap index file to list the individual sitemaps, and submit this single file to the search engines instead of each sitemap separately.
  • Compressed sitemap files are supported by all major search engines. Sitemap files can be compressed with gzip and conventionally end with the .gz extension; the size limit still applies to the uncompressed file.
  • XML sitemap files must be UTF-8 encoded, and URLs must be entity-escaped: for example, an & in a URL is written as &amp;.
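The considerations above can be sketched in Python using only the standard library. This is a minimal sketch rather than a complete tool: the helper names split_urls, compress_sitemap and escape_loc are hypothetical, and escape_loc only performs XML entity escaping, assuming the URL is already percent-encoded.

```python
import gzip
from xml.sax.saxutils import escape

# Limits for a single sitemap file under the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def compress_sitemap(xml_bytes):
    """Gzip a sitemap after checking the limit, which applies to the uncompressed size."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("sitemap is larger than 50MB uncompressed; split it further")
    return gzip.compress(xml_bytes)


def escape_loc(url):
    """Entity-escape a URL for use inside <loc>: & < > ' and " become entities."""
    return escape(url, {"'": "&apos;", '"': "&quot;"})


# Hypothetical site with 120,000 pages: it needs three sitemap files.
urls = [f"https://www.example.com/page-{n}/" for n in range(120_000)]
print([len(chunk) for chunk in split_urls(urls)])  # [50000, 50000, 20000]
print(escape_loc("https://www.example.com/search?q=xml&page=2"))
# https://www.example.com/search?q=xml&amp;page=2
```

If the XML is built with a library such as xml.etree.ElementTree, pass the raw URL instead of the output of escape_loc, because the library escapes text itself and escaping twice would produce &amp;amp;.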

Example XML sitemaps

Standard XML sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-11-02T09:01:06+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>https://www.example.com/xml-sitemap/</loc>
    <lastmod>2015-11-02T09:01:06+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5000</priority>
  </url>
</urlset>
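A standard sitemap like the one above can also be generated by a script. The sketch below is one possible approach using Python's standard xml.etree.ElementTree module; build_urlset and the dictionary-based entry format are hypothetical choices for illustration. Optional tags are written in the order lastmod, changefreq, priority after loc, which is the order the sitemap schema expects.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")  # schema order after <loc>


def build_urlset(entries):
    """Build a <urlset> document from dicts with a 'loc' key; returns UTF-8 bytes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text automatically.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_urlset([
    {"loc": "https://www.example.com/", "lastmod": "2015-11-02T09:01:06+00:00",
     "changefreq": "daily", "priority": "0.5"},
    {"loc": "https://www.example.com/xml-sitemap/"},
])
print(xml_bytes.decode("utf-8"))
```

ElementTree writes the declaration with single quotes and a lowercase encoding name; XML parsers treat this the same as the double-quoted form shown in the example.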

XML sitemap index

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2012-03-10T17:08:12+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap/sitemap2.xml.gz</loc>
    <lastmod>2012-03-10T17:08:12+00:00</lastmod>
  </sitemap>
</sitemapindex>
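A sitemap index like the one above can be produced in the same way. This is a minimal sketch using xml.etree.ElementTree; the function name build_sitemap_index and the file names in the usage lines are hypothetical. Under the protocol, an index file may itself list no more than 50,000 sitemaps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Build a <sitemapindex> from (loc, lastmod) pairs; lastmod may be None."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical compressed sitemap files, one per chunk of URLs.
index_xml = build_sitemap_index([
    (f"https://www.example.com/sitemap{n}.xml.gz", "2012-03-10T17:08:12+00:00")
    for n in (1, 2)
])
print(index_xml.decode("utf-8"))
```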

XML sitemap tags

Tag | Required | Description
<urlset> | Yes | Encloses all of the URL entries recorded in the XML sitemap and declares the sitemap namespace.
<url> | Yes | Parent tag for each URL entry; contains all information about a single URL.
<loc> | Yes | The URL of the page, including the protocol (for example https://). For images and videos it gives the landing page or play page. Must be less than 2,048 characters.
<lastmod> | Optional | The date the URL was last changed, in W3C Datetime format: either YYYY-MM-DD or a full timestamp such as 2015-11-02T09:01:06+00:00.
<changefreq> | Optional | How often the page is likely to change. Valid values are always, hourly, daily, weekly, monthly, yearly and never. Use always for pages that change each time they are accessed and never for archived URLs. Search engines treat this value as a hint rather than a command.
<priority> | Optional | The importance of the URL relative to the other URLs on the same site, from 0.0 (least important) to 1.0 (most important); the default is 0.5. The priority does not affect how the site ranks in Google or other search engines' results.
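Before writing entries, a script can check their values against the rules in the table. The sketch below uses a hypothetical helper named check_url_entry; its lastmod check is a simplified W3C Datetime pattern that accepts only a full date or a date with a time and time zone, not every form the standard allows.

```python
import re

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
# Simplified W3C Datetime: YYYY-MM-DD, optionally followed by a time and time zone.
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    if len(loc) >= 2048:
        problems.append("loc must be less than 2,048 characters")
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must be an absolute URL including the protocol")
    if lastmod is not None and not W3C_DATE.match(lastmod):
        problems.append("lastmod is not in W3C Datetime format")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"invalid changefreq: {changefreq!r}")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            value = -1.0  # treat non-numeric values as out of range
        if not 0.0 <= value <= 1.0:
            problems.append("priority must be between 0.0 and 1.0")
    return problems


print(check_url_entry("https://www.example.com/", "2015-11-02T09:01:06+00:00", "daily", "0.5"))
# []
print(check_url_entry("https://www.example.com/", changefreq="Daily", priority="1.5"))
# ["invalid changefreq: 'Daily'", 'priority must be between 0.0 and 1.0']
```

The second call shows why the values must be lowercase: the protocol's schema lists them in lowercase, so "Daily" is rejected.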

XML sitemap verification

Once an XML sitemap has been created, it can be submitted and checked in Google Search Console (formerly Google Webmaster Tools), which reports any errors it finds. Before submitting it, make sure the sitemap is well-formed and follows the sitemap protocol, and that it has been uploaded to the root of the web server hosting the site.
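A few basic checks can also be run locally before submitting. The following sketch (precheck_sitemap is a hypothetical helper) confirms that the file is well-formed XML, that the root element is a urlset or sitemapindex in the sitemap namespace, and that the size and entry limits are respected. It does not validate against the official XML schema and is not a substitute for checking the sitemap with Google's tools.

```python
import gzip
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_ENTRIES = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes


def precheck_sitemap(data, compressed=False):
    """Return (root element name, entry count) or raise ValueError on a problem."""
    if compressed:
        data = gzip.decompress(data)  # limits apply to the uncompressed file
    if len(data) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("larger than 50MB uncompressed")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag not in (NS + "urlset", NS + "sitemapindex"):
        raise ValueError(f"unexpected root element {root.tag!r}")
    entry_tag = NS + ("url" if root.tag == NS + "urlset" else "sitemap")
    count = len(root.findall(entry_tag))
    if count > MAX_ENTRIES:
        raise ValueError(f"{count} entries exceeds the 50,000 limit")
    return root.tag[len(NS):], count


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""
print(precheck_sitemap(sample))  # ('urlset', 1)
print(precheck_sitemap(gzip.compress(sample), compressed=True))  # ('urlset', 1)
```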

XML sitemap conclusion

In conclusion, having an XML sitemap does not guarantee that every URL listed in it will be crawled or indexed, because search engines use complex algorithms to decide which pages to crawl and index. That said, in most cases an XML sitemap is still one of the fastest and most effective ways to get a website crawled and indexed.