INTERNAL TAB

The Internal tab is the first and most comprehensive tab available in the Screaming Frog SEO Spider.

INDEX:

Overview
“Internal” tab

In the SEO Spider interface, the first tab available to you is “Internal”. It lists all URLs that are on the same subdomain as the crawl’s start page and includes all SEO Spider metrics, except for the data collected by the “External”, “Hreflang”, and “Structured Data” tabs. Other subdomains are also treated as internal (rather than as external) if you use the “Crawl All Subdomains” configuration, “List” mode, or the CDNs feature.

Status Code

In this tab you can consult the following data:

Address: the address (URL) of the crawled resource (HTML, JavaScript, CSS, images, PDF, Flash, other).
Content: the content type of the resource found at the URL:
- HTML;
- JavaScript;
- CSS;
- Images;
- PDF;
- Flash;
- Other.

Status Code: the server’s HTTP response code, for example:
- code 200;
- code 301;
- code 302;
- code 404;
- code 500.

Status: the HTTP header response (reason phrase) returned when the URL is requested:
- code 200: OK;
- code 301: Moved Permanently (permanent redirect);
- code 302: Found, reported by some servers as Moved Temporarily (temporary redirect);
- code 404: Not Found;
- code 500: Internal Server Error (server-side problem);
- Connection Timeout (no response received).
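To relate the Status Code and Status columns, here is a minimal sketch that looks up the standard reason phrase for a code with Python’s standard library. The function name describe_status is my own; note that the standard library reports 302 as “Found”, the current name for what older servers call “Moved Temporarily”, and that the SEO Spider shows whatever the server actually returns.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code, or None if unknown."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Not a status code registered in the standard library.
        return None


for code in (200, 301, 302, 404, 500):
    print(code, describe_status(code))
```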

Indexability: indicates whether a URL is “Indexable” (indexable by the Search Engine) or “Non-Indexable” (not indexable).

Indexability Status: specifies the reason why a URL cannot be indexed. For example, the URL is “Canonicalised” to another resource, returns a status code other than 200, or shows “Connection Timeout” in the Status column and is therefore reported as “No Response”.
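The reasoning behind these two columns can be sketched as a small decision function. This is only an illustration under my own assumptions, not the SEO Spider’s actual logic: the function name, its parameters, the order of the checks and the “Status NNN” label are hypothetical, while “Canonicalised” and “No Response” are the labels mentioned above.

```python
def indexability(status_code, url, canonical=None, noindex=False):
    """Return (indexability, indexability_status) for a crawled URL.

    status_code is None when no response was received (e.g. a timeout).
    """
    if status_code is None:
        return ("Non-Indexable", "No Response")
    if status_code != 200:
        # Hypothetical label; the tool uses its own wording per status class.
        return ("Non-Indexable", f"Status {status_code}")
    if noindex:
        return ("Non-Indexable", "Noindex")
    if canonical and canonical != url:
        return ("Non-Indexable", "Canonicalised")
    return ("Indexable", "")


print(indexability(200, "https://example.com/a", canonical="https://example.com/b"))
print(indexability(None, "https://example.com/slow"))
print(indexability(200, "https://example.com/"))
```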

Meta Tags

Title 1: the page title (“meta title”) that Google uses in its SERP results.
Title 1 Length: length of “Title 1” in characters. A title between 30 and 55 characters is recommended to avoid snippet truncation in the SERP.
Title 1 Pixel Width: width of “Title 1” in pixels. It serves the same purpose as the previous column, but I advise you to always reason in pixels, because characters differ in width and a character count alone could mislead you.

Example:

A capital “M” and a lowercase “i” each count as 1 character, but they occupy different widths in pixels: the “M” takes up noticeably more space in the snippet. My advice is to always check the width in pixels to avoid truncation in the SERP.
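The difference between the two measurements can be shown with a toy calculation. The per-character widths below are invented for illustration only and are not real font metrics (the real width depends on the font Google uses to render the snippet); ASSUMED_WIDTHS, DEFAULT_WIDTH and pixel_width are names of my own.

```python
# Illustrative widths in pixels; NOT real font metrics.
ASSUMED_WIDTHS = {"M": 17, "W": 18, "i": 5, "l": 5, " ": 5}
DEFAULT_WIDTH = 10  # assumed width for any character not listed above


def pixel_width(text):
    """Sum the assumed pixel width of every character in the text."""
    return sum(ASSUMED_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)


wide = "M" * 10
narrow = "i" * 10

# Same character count, very different pixel widths.
print(len(wide), pixel_width(wide))
print(len(narrow), pixel_width(narrow))
```

Both strings have the same length in characters, yet under these assumed widths the first is more than three times as wide as the second.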

Meta Description 1: the “description” used in the Google snippet.
Meta Description Length: number of characters in the meta description. Text between 70 and 155 characters is recommended.
Meta Description Pixel Width: width of the meta description in pixels. It is recommended to stay between 400 and 1011 pixels.
Meta Keyword 1: keywords set for the individual URL. Meta keywords are optional; Google has stated that it does not consider them when evaluating content or ranking.
Meta Keywords Length: number of characters used for the meta keywords.
H1-1: the first “H1” heading of the URL (main title of the page).
H1-1 Length: length in characters of the first “H1” heading.
H2-1: the first “H2” heading (subheading).
H2-1 Length: length in characters of the first “H2” heading.
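The character ranges recommended above for titles (30 to 55) and meta descriptions (70 to 155) can be checked in bulk, for example on an exported list of titles. A minimal sketch, in which the dictionary keys and the function name check_length are my own choices:

```python
# Recommended character ranges quoted in this guide.
RECOMMENDED_CHARS = {
    "title": (30, 55),
    "meta_description": (70, 155),
}


def check_length(field, text):
    """Classify a text as 'too short', 'ok' or 'too long' for the given field."""
    low, high = RECOMMENDED_CHARS[field]
    length = len(text)
    if length < low:
        return "too short"
    if length > high:
        return "too long"
    return "ok"


print(check_length("title", "Internal tab"))
print(check_length("title", "Screaming Frog Internal tab: columns explained"))
```

Remember that this only covers the character count; the pixel width still has to be checked separately.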

Note:

The SEO Spider collects only the first two H2 headings it encounters in the source code. To extract the data for other headings (H3, H4, etc.), I recommend using the “Custom Extraction” function.

Meta Robots 1: the meta robots directives (e.g., max-image-preview:large).
X-Robots-Tag 1: the X-Robots-Tag HTTP header directives for the URL (e.g., noindex).
Meta Refresh 1: the meta refresh data, if a meta refresh is present in the code of the URL.
Canonical Link Element: the data of the “canonical” link element.
rel="next" 1: the SEO Spider collects these HTML link elements, which are designed to indicate the relationship between URLs in a paginated series; “next” indicates the relationship to the next page. This element and the following one are often found on blogs, forums, and e-commerce sites. Although Google has stated that it can crawl a website completely without these hints, I believe they are still worth including for other bots.
rel="prev" 1: the counterpart of the previous element; “prev” indicates the relationship to the previous page.

Content

Size: the size of the resource, shown in kilobytes. The value is taken from the Content-Length header if provided; otherwise it is set to zero. For HTML pages, the value is the size of the uncompressed HTML in KB. On export, the size is given in bytes (1 kilobyte = 1,024 bytes).
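If you work with the exported values, the conversion and the fallback to zero can be sketched as follows. The function names are my own, and the headers are assumed to be a plain dictionary keyed exactly as “Content-Length”:

```python
def size_in_bytes(headers):
    """Read Content-Length from a headers dict; fall back to 0 if missing or invalid."""
    value = headers.get("Content-Length")
    return int(value) if value and value.isdigit() else 0


def to_kilobytes(size_bytes):
    """Convert bytes to kilobytes using 1 KB = 1,024 bytes."""
    return size_bytes / 1024


print(to_kilobytes(size_in_bytes({"Content-Length": "2048"})))
print(to_kilobytes(size_in_bytes({})))
```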
Word Count: the number of words found within the body tag, excluding HTML markup. By default the count is based on the content area, which excludes the nav and footer elements. You can customize this metric by defining a different content area: for a more granular analysis you can include or exclude HTML elements, classes and IDs. Although the SEO Spider is very accurate, its values may differ from a manual count, because the parser makes corrections when it comes across invalid HTML. Your rendering settings can also affect which HTML is examined. Screaming Frog counts words by splitting the text on spaces, regardless of whether the content is visible (text inside a div set as hidden is counted too).
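To make the counting rule concrete, here is a rough sketch built on Python’s standard HTML parser. It is not the SEO Spider’s parser: the class and function names are mine, and skipping script and style content is my own assumption. It counts whitespace-separated words inside body, ignores nav and footer, and pays no attention to visibility.

```python
from html.parser import HTMLParser


class WordCounter(HTMLParser):
    # Elements whose text is left out of the count (script/style are assumed).
    SKIPPED = {"nav", "footer", "script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Split on whitespace; hidden text is counted like any other text.
        if self.in_body and self.skip_depth == 0:
            self.words += len(data.split())


def count_words(html):
    parser = WordCounter()
    parser.feed(html)
    parser.close()
    return parser.words


sample = (
    "<html><body><nav>Home About</nav>"
    '<p>Hello <span style="display:none">hidden</span> world</p>'
    "<footer>Copyright</footer></body></html>"
)
print(count_words(sample))
```

In the sample, “Hello”, “hidden” and “world” are counted, while the nav and footer text is not.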
Closest Similarity Match: shows the highest similarity percentage between the page and the other pages, which helps you spot duplicate content. With the SEO Spider’s default threshold, pages are identified as “near duplicates” if they match by 90% or more. You can also set your own threshold.
To populate this column you need to run “Crawl Analysis” at the end of the crawl. Only URLs whose content meets the selected similarity threshold will contain data; the others will remain empty. By default, therefore, this column only contains data for URLs with 90% or more similarity.

Config > Content > Duplicates > Enable Near Duplicates > Near Duplicate Similarity Threshold
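To get a feel for how a 90% threshold behaves, here is a small sketch that compares two texts word by word with difflib from Python’s standard library. This is a stand-in technique chosen for illustration and not the algorithm the SEO Spider uses; the function names and the sample texts are my own.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Return a 0.0-1.0 similarity ratio between two texts, compared word by word."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def is_near_duplicate(text_a, text_b, threshold=0.90):
    """True if the similarity meets or exceeds the threshold (90% by default)."""
    return similarity(text_a, text_b) >= threshold


# Two 20-word texts that differ only in the last word.
page_a = " ".join(f"word{i}" for i in range(20))
page_b = page_a.replace("word19", "other")

print(similarity(page_a, page_b))
print(is_near_duplicate(page_a, page_b))
print(is_near_duplicate(page_a, "alpha beta gamma"))
```

With 19 of 20 words in common the ratio is 0.95, so the pair passes the default threshold; raising the threshold to 0.96 would exclude it.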

Hash: the hash value of the page content, calculated with the MD5 algorithm. If two pages have identical values, their content is considered duplicate and may be penalized by the search engine. Unlike “Closest Similarity Match”, this column is a check for exact duplicate content only.
Please note: if two hash values match, the pages are exactly the same in content. If they differ by even one character, they will have unique hash values and will not be detected as duplicate content.

URL > Duplicate
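The exact-match behaviour of an MD5 check is easy to reproduce. The sketch below hashes page content with Python’s hashlib and groups URLs that share a hash; the function names and the sample pages are mine, and exactly which bytes the SEO Spider feeds to MD5 is an assumption on my part.

```python
import hashlib
from collections import defaultdict


def content_hash(content):
    """Return the MD5 hex digest of the page content (UTF-8 encoded)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def exact_duplicates(pages):
    """Group URLs whose content hashes are identical; pages maps URL -> content."""
    groups = defaultdict(list)
    for url, content in pages.items():
        groups[content_hash(content)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


pages = {
    "/a": "<p>Hello</p>",
    "/b": "<p>Hello</p>",
    "/c": "<p>Hello!</p>",  # one extra character, so a different hash
}
print(exact_duplicates(pages))
```

Only “/a” and “/b” are grouped; the single extra character in “/c” is enough to give it a unique hash.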

No. Near Duplicates: identifies the number of near-duplicate URLs discovered during the scan that meet or exceed the threshold set in the ‘Near Duplicate Similarity Threshold’ (default is 90%). This setting can also be changed.

Config > Content > Duplicates

To populate this column you must have previously selected the “Enable near-duplicates” option and started the “Crawl Analysis.”

Spelling Errors: the total number of spelling errors discovered for a URL. For this column to be populated, ‘Enable Spell Check’ must be selected.

Config > Content > Spelling & Grammar

Grammar Errors: the total number of grammar errors discovered for each individual URL. For this column to be populated, “Enable Grammar Check” must be selected. In the settings you can choose the language to check, the grammar rules, and the reference dictionary.

To review grammar and spelling errors, check the dedicated columns in the “Internal” tab, select the URLs with errors, and view the details in the “Spelling & Grammar Details” tab in the lower window of the SEO Spider. This is an excellent feature for copywriters hunting for typos or misused grammar rules. In addition to the list of errors, it offers correction suggestions and shows the section where each error was found, with a preview. For Italian, the suggestions are still unreliable, but the feature is very useful for catching typos.

Config > Content > Spelling & Grammar

Http & More

Response Time: time in seconds taken to download the URL. It is measured from issuing the HTTP request to receiving the full HTTP response from the server. It does not include the time to download additional resources in “JavaScript rendering” mode.
Last Modified: the value read from the “Last-Modified” header in the server’s HTTP response. If the server does not provide it, the value is empty.
Redirect URL: if the URL redirects, this column shows the redirect destination.
Redirect Type: the SEO Spider identifies:
- HTTP Redirect: triggered by an HTTP header.
- HSTS Policy: redirection from HSTS header.
- JavaScript Redirect: redirection triggered by JavaScript execution (may only be present when using “JavaScript rendering” mode).
- MetaRefresh Redirect: triggered by a meta refresh tag in the HTML.
HTTP Version: the HTTP version with which the crawl was performed. The SEO Spider currently crawls using HTTP/1.1 only; this column was added in preparation for HTTP/2 support in a future update.
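As a side note on the Last Modified column described above: the header carries an HTTP date string, which you may want to turn into a real date when analysing an export. A minimal sketch with Python’s standard library, assuming the headers are available as a plain dictionary (the function name and the sample value are my own):

```python
from email.utils import parsedate_to_datetime


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if missing or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


parsed = parse_last_modified({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
print(parsed.isoformat())
print(parse_last_modified({}))
```

A missing header gives None, which mirrors the empty value shown in the column.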