HREFLANG TAB

Analysis of the Hreflang attribute for project internationalisation management.

Overview
Hreflang

In an increasingly global market, businesses have a growing need to internationalize their websites with translated content that meets the diverse needs of their audience. Google uses the rel="alternate" hreflang="x" attributes to determine which version of the content is most appropriate to serve in search results, based on the visitor’s language and region.

For this reason, hreflang plays a key role in international organic positioning: it helps the search engine understand the relationship between pages in different language versions and, from an SEO perspective, reduces the risk of those versions being treated as duplicate content.

The hreflang attribute can be placed in the <head>… </head> tag, in the HTTP header, or in the XML sitemap. Its syntax lists the page on which it appears (self-referencing) together with all the alternative language versions in which the content is served.
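
The two placements outside the HTML can be sketched in Python as follows. This is only an illustration: the example.com URLs and the function names as_link_header and as_sitemap_entry are hypothetical, and the sitemap fragment assumes the enclosing <urlset> declares the xhtml namespace.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical alternates for one piece of content: hreflang value -> URL.
ALTERNATES = {
    "en": "https://example.com/en/page",
    "de": "https://example.com/de/page",
}


def as_link_header(alternates):
    """Render the alternates as a single HTTP Link header line."""
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{lang}"'
        for lang, url in alternates.items()
    ]
    return "Link: " + ", ".join(parts)


def as_sitemap_entry(page_url, alternates):
    """Render one <url> entry of an XML sitemap with xhtml:link children.

    Assumes <urlset> declares xmlns:xhtml="http://www.w3.org/1999/xhtml".
    """
    lines = ["<url>", f"  <loc>{escape(page_url)}</loc>"]
    for lang, url in alternates.items():
        lines.append(
            f"  <xhtml:link rel=\"alternate\" hreflang={quoteattr(lang)} "
            f"href={quoteattr(url)}/>"
        )
    lines.append("</url>")
    return "\n".join(lines)


print(as_link_header(ALTERNATES))
print(as_sitemap_entry("https://example.com/en/page", ALTERNATES))
```

In both forms the page's own URL appears among the alternates, which is the self-reference mentioned above.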

It is also possible to define, for the same language, the geographical area targeted by an alternative URL.

For example, a page may provide a version for British English (en-GB), one for American English (en-US), a German version (de), and a generic English version (en) that Google serves when the visitor is not covered linguistically by the geo-localized versions (e.g. an Australian, or an Italian visiting the website).
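
A set of versions like this one (British English, American English, German, and a generic English fallback) could be written as link elements for the <head> along these lines. This is a minimal sketch: the example.com URLs and the helper name head_links are assumptions made for illustration.

```python
from html import escape

# Hypothetical URLs; the hreflang values match the four versions in the text.
VERSIONS = [
    ("en-GB", "https://example.com/uk/"),
    ("en-US", "https://example.com/us/"),
    ("de", "https://example.com/de/"),
    ("en", "https://example.com/en/"),  # generic English fallback
]


def head_links(versions):
    """Return the <link> elements to place inside <head>."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in versions
    ]


for tag in head_links(VERSIONS):
    print(tag)
```

The same four elements would be repeated on each of the four pages, so that every version references itself and all of its alternates.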

To collect data on the internationalization of a website, you can rely on Screaming Frog, specifically on the dedicated tab that shows the hreflang annotations crawled by the SEO Spider, whether they come from HTML link elements, the HTTP header, or the XML Sitemap.

To collect this valuable data, you must enable the “Store Hreflang” and “Crawl Hreflang” options (Config > Spider). To extract hreflang annotations from the XML Sitemap, you must also enable the “Crawl Linked XML Sitemaps” option.

Hreflang and Filters

In addition to the URL and the page title, the tab shows the following information:

  • Occurrences: the number of rel="alternate" hreflang annotations discovered during the crawl.
  • HTML hreflang 1/2 etc: the language and region code of the hreflang attribute from the HTML link element.
  • HTML hreflang 1/2 URL etc: the hreflang URL from the HTML link element.
  • HTTP hreflang 1/2 etc: the language and region code of the hreflang from the HTTP header.
  • HTTP hreflang 1/2 URL etc: the hreflang URL from the HTTP header.
  • Sitemap hreflang 1/2 etc: the language and region code of the hreflang discovered in the XML Sitemap. Please note: this data is only populated when you crawl the XML Sitemap in list mode.
  • Sitemap hreflang 1/2 URL etc: the hreflang URL from the XML Sitemap. Please note: this data is only populated when you crawl the XML Sitemap in list mode.

The tab provides the following filters:

  • Contains Hreflang: includes all URLs that have rel="alternate" hreflang annotations from any implementation, whether link elements, HTTP header, or XML Sitemap.
  • Non-200 Hreflang URLs: shows all URLs that are used in rel="alternate" hreflang annotations but do not respond with a 200 status code. This includes URLs blocked by robots.txt, URLs with no response, 3XX (redirects), 4XX (client errors) and 5XX (server errors). Hreflang URLs must be crawlable and indexable; URLs that are not should be fixed, otherwise search engines will ignore them. Non-200 hreflang URLs can also be examined in the lower pane of the “URL Info” window by checking the “Status Code”.
    They can be exported in bulk via:

Reports > Hreflang > Non-200 Hreflang URLs

  • Unlinked Hreflang URLs: identifies pages that contain one or more hreflang URLs discoverable only through rel="alternate" hreflang annotations and not through regular hyperlinks. Because hreflang annotations do not pass PageRank as a traditional anchor tag does, this condition can signal a problem with internal linking, or a typo or flaw in the hreflang annotations themselves.
    To find out exactly which hreflang URLs on these pages are unlinked, you can use the “Unlinked Hreflang URLs” report export.

Reports > Hreflang > Unlinked Hreflang URLs

  • Missing Return Links: includes all URLs discovered in hreflang annotations that are missing a return link (or “return tags” in Google Search Console) to them from their alternate pages. Hreflang is reciprocal, so all alternative versions must confirm the relationship: when page A uses an hreflang annotation to specify page B as an alternate, page B must provide a return link to page A.
    Without a return link, the search engine may ignore the annotation completely or misinterpret it. Missing return links can be seen in the lower “URL Info” window tab with a “missing” confirmation status, and you can export them in bulk via the “Missing Return Links” export.

Reports > Hreflang > Missing Return Links

  • Inconsistent Language & Region Return Links: includes URLs with inconsistent language and region return links. This occurs when a return link uses a different language or region value than the one the URL uses to reference itself. Inconsistent language return links can be seen in the lower “URL Info” window with an “Inconsistent” confirmation status.

Reports > Hreflang > Inconsistent Language Return Links

  • Non-Canonical Return Links: URLs with non-canonical hreflang return links. Hreflang should include only the canonical versions of URLs, so this filter collects return links that point to URLs that are not the canonical version.
    The filter is concerned specifically with return links from hreflang annotations. Imagine that URL A is canonicalized to URL B and has an hreflang annotation to URL C: the return hreflang on URL C should point to the canonical URL (URL B) rather than to URL A.
    Non-canonical return links can be seen in the lower “URL Details” window with a confirmation status of “Not canonical” and can be exported in bulk via the “Non Canonical Return Links” export.

Reports > Hreflang > Non Canonical Return Links

  • Noindex Return Links: includes hreflang return links that point to URLs with a “noindex” meta tag. Noindex return links can be seen in the lower pane of the “URL Info” window with a “noindex” confirmation status. They can be exported in bulk via the “Noindex Return Links” export.

Reports > Hreflang > Noindex Return Links

  • Incorrect Language & Region Codes: checks that the language value (in ISO 639-1 format) and the optional region code (in ISO 3166-1 alpha-2 format) are valid. Unsupported hreflang values are shown in the lower “URL Info” window with an “invalid” status.
  • Multiple Entries: identifies URLs with multiple entries for the same language or region code, for example when page X links to both page Y and page Z using the same hreflang value “en”.
  • Missing Self Reference: collects URLs that are missing a self-referencing rel="alternate" hreflang annotation.
  • Not Using Canonical: highlights the condition whereby URL A is canonicalized to URL B, but the hreflang annotation continues to self-reference URL A. Hreflang should include only the canonical versions of URLs.
  • Missing X-Default: collects URLs that are missing an x-default hreflang attribute.
  • Missing: the set of URLs that have no hreflang attributes at all.
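
To make a few of these conditions concrete, the sketch below checks a small set of annotations for missing return links, missing self-references, missing x-default values and multiple entries. It is a simplified illustration and not how the SEO Spider implements its filters; the ANNOTATIONS data, the example.com URLs and the audit function are hypothetical, and an alternate that was not crawled is simply treated as having no return link.

```python
# Hypothetical crawl data: page URL -> list of (hreflang value, target URL).
ANNOTATIONS = {
    "https://example.com/en/": [
        ("en", "https://example.com/en/"),
        ("de", "https://example.com/de/"),
        ("x-default", "https://example.com/en/"),
    ],
    "https://example.com/de/": [
        ("de", "https://example.com/de/"),
    ],
}


def audit(annotations):
    """Return a dict mapping each issue name to a sorted list of findings."""
    issues = {
        "missing_return_links": set(),    # (page, alternate not linking back)
        "missing_self_reference": set(),  # page URLs
        "missing_x_default": set(),       # page URLs
        "multiple_entries": set(),        # page URLs
    }
    for page, entries in annotations.items():
        targets = {url for _, url in entries}

        # Group target URLs by hreflang value to spot duplicated values.
        by_code = {}
        for code, url in entries:
            by_code.setdefault(code.lower(), set()).add(url)

        if page not in targets:
            issues["missing_self_reference"].add(page)
        if "x-default" not in by_code:
            issues["missing_x_default"].add(page)
        if any(len(urls) > 1 for urls in by_code.values()):
            issues["multiple_entries"].add(page)

        # Hreflang is reciprocal: every alternate must point back to the page.
        for target in targets - {page}:
            back_links = {url for _, url in annotations.get(target, [])}
            if page not in back_links:
                issues["missing_return_links"].add((page, target))

    return {name: sorted(found) for name, found in issues.items()}


for name, found in audit(ANNOTATIONS).items():
    print(name, found)
```

With this sample data, the German page does not link back to the English page and carries no x-default value, so both conditions are reported.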
