Blocked by robots.txt?

Find out how to analyze the effects of robots.txt directives

Robots.txt and SEO

The robots.txt file is crucial for SEO because it lets you tell Search Engines which areas of your site should not be crawled, steering their attention toward the most relevant content. A proper robots.txt configuration can directly influence the site’s ranking in search results by giving you better control over how Search Engines crawl and index it. However, improper handling of this file can accidentally exclude important pages and hurt the overall ranking of the site. This scenario is very common during SEO migrations: in the “Staging” phase before publication, the file all too often remains completely closed and ends up blocking the crawlers from the site in its entirety.
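As a purely illustrative sketch (the domain and paths are hypothetical), the difference between a staging robots.txt left completely closed and a production file that blocks only a single folder can be tested with Python's standard urllib.robotparser module:

```python
# Minimal sketch of the "staging left closed" scenario: a robots.txt that blocks the
# whole site versus one that blocks only a filter folder. Domain and paths are made up.
from urllib.robotparser import RobotFileParser

staging_rules = [
    "User-agent: *",
    "Disallow: /",          # blocks every path: nothing can be crawled
]

production_rules = [
    "User-agent: *",
    "Disallow: /filters/",  # blocks only the faceted-navigation folder
]

for label, rules in (("staging", staging_rules), ("production", production_rules)):
    parser = RobotFileParser()
    parser.parse(rules)
    allowed = parser.can_fetch("Googlebot", "https://www.example.com/category/shoes/")
    print(f"{label}: /category/shoes/ crawlable -> {allowed}")
# staging prints False (everything blocked), production prints True
```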


Robots.txt and Screaming Frog

As we have already seen in our guides, Screaming Frog allows you to check the robots.txt in detail and test individual URL paths directly from the tool.
Through the SEO Spider, it is possible to respect (“Respect Robots.txt”) or ignore (“Ignore Robots.txt”) the robots.txt during the crawl, depending on the goal of the analysis.

This feature lets you check individual folders or paths to see whether the robots.txt blocks them, but it does not support a bulk analysis, which is quite limiting.
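As a rough comparison, outside the SEO Spider a bulk check can be sketched with Python's standard urllib.robotparser module. The domain and URL list below are hypothetical, and Screaming Frog's own matching logic may differ:

```python
# Sketch of a bulk robots.txt check: download the live file once, then test many URLs.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/cart/",
    "https://www.example.com/category/shoes/?color=red",
]

blocked = [u for u in urls_to_check if not parser.can_fetch("Googlebot", u)]
for url in blocked:
    print("Blocked by robots.txt:", url)
```

In practice, though, the filter described below gives you the same list directly from the crawl.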

Screaming Frog, through the “Blocked by Robots.txt” filter of the “Response Codes” tab, gives a view of all URLs affected by the file, and through the export you can download a reference document for further checks or analysis.

You probably already knew about this option, but you may never have considered one of the columns available with this filter: “Matched Robots.txt Line.”

Matched Robots.txt Line

If everything is configured correctly, with the previous filter you will find only the resources you really wanted to block. But what if important pages that you actually wanted indexed end up in the list? More importantly, how can you untangle the case of an e-commerce site with hundreds of thousands of URLs to find the possible mistake in the robots.txt file?

To help you, Screaming Frog provides the “Matched Robots.txt Line” column, which for each individual URL tells you which rule (the line of the directive in the file) of the robots.txt is blocking it.
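To make the idea concrete, here is a simplified sketch of what that column reports, using hypothetical paths and plain prefix matching on Disallow rules (no wildcard handling), so it only approximates how Screaming Frog and search engines actually evaluate robots.txt:

```python
# For each blocked path, report which Disallow line of the file matches it,
# similar in spirit to the "Matched Robots.txt Line" column.
robots_lines = [
    "User-agent: *",
    "Disallow: /cart/",
    "Disallow: /search",
    "Disallow: /private/",
]

paths = ["/cart/checkout", "/search?q=shoes", "/category/shoes/"]

for path in paths:
    matches = [
        (i + 1, line) for i, line in enumerate(robots_lines)
        if line.startswith("Disallow:")
        and path.startswith(line.split(":", 1)[1].strip())
    ]
    if matches:
        # keep the longest (most specific) matching rule, roughly what search engines do
        line_no, rule = max(matches, key=lambda m: len(m[1]))
        print(f"{path} -> blocked by line {line_no}: {rule}")
    else:
        print(f"{path} -> not blocked")
```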

For a much simpler view, you can click on the column header and sort everything by matched line, so that even when analyzing large websites or e-commerce sites, resolving any misconfiguration becomes far easier.

SEO Spider Tab