How to prepare a website's XML Sitemap for crawler analysis.

Sitemap scan

This tab collects the most significant data extracted from scanning an XML Sitemap.

To get the data, enable “Crawl Linked XML Sitemaps” in “Configuration > Spider”, start the crawl, and then run a “Crawl Analysis”.

The columns available in the tab include general information about URLs and their indexability.

Much more interesting and specific are the dedicated filters:

  • URLs in Sitemap: displays the URLs that are present in an XML Sitemap. From an SEO standpoint, a Sitemap should contain only indexable, canonical URLs and avoid pages that return a status code other than 200, are blocked by robots.txt, or are canonicalized to another URL.
  • URLs Not In Sitemap: identifies all URLs that were found during the crawl but are not in the XML Sitemap. I always recommend checking whether these URLs were left out intentionally or by oversight, in which case they should be added. This filter ignores non-indexable URLs, on the assumption that leaving them out is a deliberate choice.
  • Orphan URLs: displays URLs that are present in the XML Sitemap but were not discovered during the crawl, which points to a potential internal linking problem.
  • Non-Indexable URLs in Sitemap: URLs that are in an XML Sitemap but, for some reason, are not indexable. There are essentially two fixes: remove the pages from the Sitemap or, if the “non-indexable” status is an error, correct it.
  • URLs in Multiple Sitemaps: URLs that are in more than one XML Sitemap.
    You may want to create a sitemap index.
  • XML Sitemap with Over 50k URLs: identifies XML Sitemaps that contain more than 50,000 URLs, exceeding the limit imposed by the search engines. The solution is to split the URLs across several smaller Sitemaps, list them in a “sitemap index”, and submit that index to Google via Search Console.
  • XML Sitemap with Over 50MB: identifies any XML Sitemap whose uncompressed file size is greater than 50 MB.
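
The logic behind these filters can be sketched in a few lines of Python. This is only an illustration of the comparisons described above, not how the crawler implements them. The function names, the example.com URLs, and the idea of passing in plain sets (URLs found by following links, URLs known to be non-indexable) are all assumptions made for the example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Namespace and limits defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB, uncompressed


def sitemap_urls(xml_bytes: bytes) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(f"{NS}loc") if loc.text]


def classify(linked: set[str], in_sitemap: set[str],
             non_indexable: set[str]) -> dict[str, set[str]]:
    """Hypothetical helper: rebuild three of the filters with set operations.

    linked        -- URLs discovered by following internal links
    in_sitemap    -- URLs listed in the XML Sitemap
    non_indexable -- URLs known to be non-indexable
    """
    return {
        # Found in the crawl, missing from the sitemap, and indexable.
        "urls_not_in_sitemap": (linked - in_sitemap) - non_indexable,
        # Listed in the sitemap but never reached through links.
        "orphan_urls": in_sitemap - linked,
        # Listed in the sitemap although they cannot be indexed.
        "non_indexable_in_sitemap": in_sitemap & non_indexable,
    }


def urls_in_multiple_sitemaps(sitemaps: dict[str, set[str]]) -> set[str]:
    """Return URLs that appear in more than one sitemap."""
    counts = Counter(url for urls in sitemaps.values() for url in urls)
    return {url for url, n in counts.items() if n > 1}


def over_limits(xml_bytes: bytes) -> dict[str, bool]:
    """Check a single sitemap against the 50k-URL and 50 MB limits."""
    return {
        "over_50k_urls": len(sitemap_urls(xml_bytes)) > MAX_URLS,
        "over_50mb": len(xml_bytes) > MAX_BYTES,
    }


SAMPLE = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
</urlset>"""

if __name__ == "__main__":
    base = "https://example.com"
    linked = {f"{base}/", f"{base}/blog", f"{base}/login", f"{base}/old-page"}
    non_indexable = {f"{base}/login", f"{base}/old-page"}
    report = classify(linked, set(sitemap_urls(SAMPLE)), non_indexable)
    for name, urls in report.items():
        print(name, sorted(urls))
    print(over_limits(SAMPLE))
```

In this sample, /blog is the only URL that would deserve a look under “URLs Not In Sitemap”, because /login is non-indexable and is therefore skipped, exactly as the filter description says.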
