Orphan Pages

Analysing and monitoring orphan pages to improve internal website linking

Scanning the Orphan Pages

Now that we understand why identifying orphan pages matters, let’s look at the procedure to follow with the SEO Spider:

  • 1. Setting up XML Sitemap Scanning

First, enable the “Crawl Linked XML Sitemaps” option so that the SEO Spider scans the sitemap of your website.

Config > Spider > Crawl > Crawl Linked XML Sitemaps

  • 2. Connecting to the Google Analytics API

Configure the connection to the Google Analytics API to find orphan pages that receive organic search traffic in a specific Account, Property or View.

Remember to choose the “Organic Traffic” segment, set the date range to be analyzed (Date Range), which defaults to one month, and define the metrics and dimensions you are interested in.

Remember to enable the “Crawl New URLs Discovered in Google Analytics” option. If this option is disabled, URLs discovered via Google Analytics are only available in the “Orphan Pages” report. They will not be added to the crawl queue, so they will not be viewable within the user interface and will not appear under the respective tabs and filters.

  • 3. Connecting to the Search Console API.

You can also connect to the Google Search Console API to find all pages that receive Impressions and Clicks in a given time interval despite having no internal links pointing to them.

Again, you can change the date range over which data is collected. As with Google Analytics, if the “Crawl New URLs Discovered in Google Search Console” option is not enabled, new URLs discovered via Google Search Console will only be available in the “Orphan Pages” report. They will not be added to the crawl queue, so they will not be viewable within the user interface and will not appear under the respective tabs and filters.

  • 4. Start the crawl and populate the data with Crawl Analysis.

Once you have completed the configuration, all that remains is to run the crawl and start “Crawl Analysis” (unless you have already set it to run automatically in the configuration).

Once these steps are complete, you can browse each tab and apply its ‘Orphan URLs’ filter to view all discovered orphan pages.
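The logic behind these steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how the SEO Spider works internally: an orphan candidate is any URL reported by an external source (XML Sitemap, Google Analytics, Search Console) that the internal link crawl never reached. The function names, the sample sitemap and the example.com URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in an XML sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(linked_urls, **sources):
    """Map each URL missing from the internal link crawl to the sources that reported it."""
    orphans = {}
    for source_name, urls in sources.items():
        for url in urls - linked_urls:
            orphans.setdefault(url, []).append(source_name)
    return orphans


# Hypothetical sample data standing in for a real crawl.
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page/</loc></url>
</urlset>"""

# URLs reached by following internal links.
linked = {"https://example.com/", "https://example.com/about/"}
# URLs reported by Search Console as receiving impressions.
search_console = {"https://example.com/", "https://example.com/landing/"}

orphans = find_orphans(
    linked,
    sitemap=sitemap_urls(sitemap_xml),
    search_console=search_console,
)
for url in sorted(orphans):
    print(url, orphans[url])
```

Keeping track of which source reported each URL mirrors the way the tool shows orphan URLs under separate tabs, one per data source.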

Let us put what we have just described into practice by scanning the Screaming Frog site.

We’re going to set it up to crawl the XML Sitemap and connect it to the Google Search Console API.

XML Sitemap

From the example above you can see that the Screaming Frog site has some orphan pages (Sitemaps tab) that were discovered by the XML Sitemap crawl.

In this case, the pages return 404 and 301 status codes and may be old pages that were not removed from the XML Sitemap after the new site was published.

Search Console API

By connecting the crawl of the Screaming Frog site to the Search Console API (figure below), you can see that, unlike in the previous case (analysis with sitemap.xml), these orphan pages still exist on the website (status code 200) and receive Impressions from Google despite having no internal links.
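The contrast between the two cases suggests a simple triage by status code. The sketch below is one possible way to express it; the suggested actions and the grouping of all 3xx codes with 404 and 410 are assumptions made for illustration, not rules taken from the SEO Spider.

```python
def triage_orphan(status_code):
    """Suggest a follow-up action for an orphan URL based on its HTTP status code."""
    if status_code == 200:
        # The page is live: it probably deserves internal links.
        return "live page: consider adding internal links"
    if 300 <= status_code < 400 or status_code in (404, 410):
        # Redirected or missing: likely a stale entry in the XML Sitemap.
        return "stale entry: consider removing it from the XML Sitemap"
    return "review manually"


# Hypothetical orphan URLs with the status codes discussed above.
samples = {
    "https://example.com/old-page/": 404,
    "https://example.com/moved/": 301,
    "https://example.com/landing/": 200,
}
for url, status in samples.items():
    print(url, status, triage_orphan(status))
```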

Note: both Google connectors let you export their results individually using the “Export” button at the top of the SEO Spider window.

If you prefer an overview of all the orphan pages discovered via Google Analytics, Search Console and the XML Sitemap, you can use the “Orphan Pages” report, which is available from the main menu.

Crawl Depth Analysis

The last way to uncover orphan pages (after configuring the XML Sitemap, Google Analytics API and Search Console API) is to use the ‘Internal’ tab, which lists every URL found during the crawl and “Crawl Analysis”.

Pages not found through internal link crawling will always have an empty “Crawl depth.”
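If you export the ‘Internal’ tab to CSV, the same check can be done outside the tool. The following is a minimal sketch, assuming the export contains columns named “Address” and “Crawl Depth”; verify the header names in your own file, since they are assumptions here, as is the made-up sample data.

```python
import csv
import io


def urls_without_crawl_depth(csv_file, address_col="Address", depth_col="Crawl Depth"):
    """Return the addresses of rows whose crawl depth cell is empty."""
    reader = csv.DictReader(csv_file)
    return [
        row[address_col]
        for row in reader
        if not (row.get(depth_col) or "").strip()
    ]


# Made-up sample standing in for an export of the 'Internal' tab.
sample_export = io.StringIO(
    "Address,Status Code,Crawl Depth\n"
    "https://example.com/,200,0\n"
    "https://example.com/blog/,200,1\n"
    "https://example.com/forgotten-landing/,200,\n"
)

candidates = urls_without_crawl_depth(sample_export)
print(candidates)
```

To run it against a real export, pass an open file object instead of the sample, for example one opened with `open(path, newline="", encoding="utf-8-sig")`, where `path` points to your exported CSV.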

