How to Debug Invalid HTML in the Head

Find out how to debug invalid HTML in the head.

Invalid HTML

This tutorial explains how to use the Screaming Frog SEO Spider to identify invalid HTML elements in the <head>, see which metadata might be affected, and fix the underlying problem.

Using valid HTML for a page’s metadata ensures that search engines, such as Google, can use it to index individual pages.
Although search engines try to understand markup even when it contains errors, some elements within the head can seriously disrupt how a crawler handles the information.
If invalid HTML elements are used within the head (“<head>”), Google assumes the head has ended and treats everything from that point on as part of the body.

This means that any metadata that appears after the invalid HTML element could be ignored by Google.

Valid HTML Elements

The <head> should contain only the following HTML elements:

  • title
  • meta
  • link
  • script
  • style
  • base
  • noscript
  • template

Other elements may cause serious crawling and indexing issues, for example:

  • iframe
  • img
  • svg
  • div
  • noscript that contains an “img”.
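The two lists above can be turned into a quick check on a single page's source. The following is a minimal sketch using Python's standard `html.parser`; the names `HeadChecker` and `find_invalid_head_elements` are hypothetical, and the logic is an approximation for illustration, not how the SEO Spider or Google detect the problem. It reads the literal tags in the source and does not skip the contents of <template> elements.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
ALLOWED_IN_HEAD = {"title", "meta", "link", "script", "style",
                   "base", "noscript", "template"}


class HeadChecker(HTMLParser):
    """Collects start tags inside <head> that are not in ALLOWED_IN_HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.noscript_depth = 0
        self.problems = []  # list of (label, line_number)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return
        elif tag == "noscript":
            self.noscript_depth += 1
        elif tag == "img" and self.noscript_depth:
            # <noscript> itself is allowed, but not when it wraps an <img>.
            self.problems.append(("noscript > img", self.getpos()[0]))
        elif tag not in ALLOWED_IN_HEAD:
            self.problems.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "noscript" and self.noscript_depth:
            self.noscript_depth -= 1


def find_invalid_head_elements(html):
    """Return (label, line) pairs for invalid elements found in <head>."""
    checker = HeadChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = """<html>
<head>
<title>Example</title>
<noscript><img src="pixel.gif"></noscript>
<link rel="canonical" href="https://example.com/">
</head>
<body><div>content</div></body>
</html>"""
    for label, line in find_invalid_head_elements(sample):
        print(f"line {line}: {label}")
```

The <div> in the body of the sample is not reported, because only tags between the literal <head> and </head> are inspected.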

In general, if elements that belong in the <body> appear before the <head> element, all metadata in the head can be affected.

For example, a stray <div> tag placed before the opening <html> tag causes Google to open and immediately close an empty <head> element, which means that all metadata ends up in the <body> and could be ignored.

According to Google’s guidance, if an “invalid” element must nevertheless be placed in the head, it should come after the elements the crawler needs to read.
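To see which metadata would be exposed by a badly placed element, the ordering can be checked directly. This is a rough sketch under stated assumptions: `metadata_at_risk`, `OrderChecker` and `describe` are hypothetical names, and only the title, the canonical link and meta robots are tracked, since those are the items this tutorial singles out.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
ALLOWED_IN_HEAD = {"title", "meta", "link", "script", "style",
                   "base", "noscript", "template"}


def describe(tag, attrs):
    """Return a label for the metadata tracked here, or None for anything else."""
    if tag == "title":
        return "title"
    if tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
        return "canonical"
    if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
        return "meta robots"
    return None


class OrderChecker(HTMLParser):
    """Finds the first invalid element in <head> and the metadata after it."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.first_invalid = None
        self.at_risk = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return
        elif self.first_invalid is None:
            # An <img> inside <noscript> is caught here as well,
            # because "img" is not in ALLOWED_IN_HEAD.
            if tag not in ALLOWED_IN_HEAD:
                self.first_invalid = tag
        else:
            label = describe(tag, dict(attrs))
            if label:
                self.at_risk.append(label)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def metadata_at_risk(html):
    """Return (first_invalid_tag, [metadata labels that appear after it])."""
    checker = OrderChecker()
    checker.feed(html)
    checker.close()
    return checker.first_invalid, checker.at_risk


if __name__ == "__main__":
    sample = """<head>
<title>Example</title>
<noscript><img src="pixel.gif"></noscript>
<link rel="canonical" href="https://example.com/">
<meta name="robots" content="index,follow">
</head>"""
    print(metadata_at_risk(sample))
```

If the second list is non-empty, moving the invalid element below those tags (or out of the head entirely) is the fix suggested by the guidance above.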

Identify “Invalid HTML”

Finding invalid HTML elements in the head, or before it, across a large website is difficult, and this is where Screaming Frog can do the bulk of the work.

The SEO Spider flags any pages with invalid HTML elements in the head that might be problematic, as well as any metadata, such as titles, canonicals or meta robots, that ends up outside the head.

Website Crawl

The first step is simple: enter the domain or subdomain to be analyzed. No configuration is needed in the SEO Spider other than enabling the “Rendering JS” crawling mode, so that elements inserted after JavaScript processing are analyzed too.

The “Validation” Tab

You can watch the data arrive in real time during the crawl, or wait until the crawl finishes for a complete picture of the affected URLs.
The relevant tab is called “Validation” and includes several filters:

  • Invalid HTML Elements in Head: pages with invalid HTML elements within the head. When an invalid element is used in the head, Google assumes the <head> element has ended and ignores any head elements that appear after it. This means that critical metadata placed after the invalid element will not be seen.
  • “<body>” Element Preceding “<html>”: URLs that have a body element before the opening <html> tag. Browsers and Googlebot automatically assume the body has started and generate an empty head element before it. This means that the intended head element and its metadata will be seen in the body and ignored.
  • “<head>” Not First in “<html>” Element: the <head> element should be the first element inside <html>. Browsers and Googlebot automatically generate a <head> element if they do not find one as the first element.
  • Missing “<head>” Tag: pages that are missing a <head> element in the HTML.
  • Multiple “<head>” Tags: pages with multiple <head> elements in the HTML.
  • Missing “<body>” Tag: pages that are missing a <body> element in the HTML.
  • Multiple “<body>” Tags: pages with multiple <body> elements in the HTML.
  • HTML Document Over 15MB: pages whose HTML document is larger than 15MB.
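For a single page, most of these filters can be approximated with a few counts over the start tags in the source. The sketch below is an assumption-laden illustration: `validation_flags` and `TagOrder` are hypothetical names, the heuristics are a guess at what each filter means rather than the SEO Spider's actual logic, and 15MB is taken to be 15 × 1024 × 1024 bytes of UTF-8.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
HEAD_ELEMENTS = {"title", "meta", "link", "script", "style",
                 "base", "noscript", "template"}
MAX_BYTES = 15 * 1024 * 1024  # assumption: "15MB" means 15 MiB


class TagOrder(HTMLParser):
    """Records every start tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def validation_flags(html):
    """Return the names of the filters this page would plausibly match."""
    parser = TagOrder()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    flags = []

    if "html" in tags:
        pos = tags.index("html")
        # Any element that belongs in the body and sits before <html>.
        if any(t not in HEAD_ELEMENTS and t != "head" for t in tags[:pos]):
            flags.append("<body> Element Preceding <html>")
        # <head> exists but is not the first tag after <html>.
        if "head" in tags and tags[pos + 1:pos + 2] != ["head"]:
            flags.append("<head> Not First in <html> Element")

    heads, bodies = tags.count("head"), tags.count("body")
    if heads == 0:
        flags.append("Missing <head> Tag")
    if heads > 1:
        flags.append("Multiple <head> Tags")
    if bodies == 0:
        flags.append("Missing <body> Tag")
    if bodies > 1:
        flags.append("Multiple <body> Tags")
    if len(html.encode("utf-8")) > MAX_BYTES:
        flags.append("HTML Document Over 15MB")
    return flags


if __name__ == "__main__":
    page = "<div></div><html><head><title>t</title></head><body></body></html>"
    print(validation_flags(page))
```

Because the parser reports only the tags literally present in the source, a page that omits <head> or <body> is flagged even though a browser would generate those elements implicitly.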

The dedicated tab provides the granular analysis just described. For a macro view, you can monitor the same data in the “Issues” tab in the sidebar, where it is labeled High Priority.

Debug <head>

Once all the affected URLs have been found, you can analyze them in depth in three ways:

  • Raw HTML: analyze the head in the raw HTML by right-clicking the page and choosing the “View Source” option. This analysis is only partial, because some elements may be inserted by JavaScript.
  • Rendered HTML: analyze the page after JavaScript rendering.
    For this, use Chrome’s “Inspect Element” and consult the “Elements” tab.
  • Search Console: you can also check for invalid elements in the head by inspecting URLs with the “URL Inspection” tool in Search Console.
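When the raw and rendered versions disagree, it helps to list what JavaScript added to the head. The sketch below assumes you have saved both versions as strings yourself (for example from “View Source” and from the “Elements” tab); `head_tags` and `added_by_rendering` are hypothetical helper names, and the comparison looks at tag names only, not attributes.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collects the names of start tags found between <head> and </head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def head_tags(html):
    """Return the tag names inside <head>, in document order."""
    collector = HeadTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def added_by_rendering(raw_html, rendered_html):
    """Return {tag: count} for head tags present only after rendering."""
    extra = Counter(head_tags(rendered_html)) - Counter(head_tags(raw_html))
    return dict(extra)


if __name__ == "__main__":
    raw = "<html><head><title>t</title></head><body></body></html>"
    rendered = ("<html><head><title>t</title>"
                "<iframe src='https://example.com/tag'></iframe>"
                "<script></script></head><body></body></html>")
    print(added_by_rendering(raw, rendered))
```

Any tag in the result that is not a valid head element is a likely candidate for the script that injects it.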

To show how much this matters for SEO, let’s look at an example.

The rendered HTML on the right-hand side looks correct, but the “Canonical declared by user” is ignored because the <noscript> tag above it contains an image, which closes the head.

SEO Spider Tab