What is Web Scraping?

Learn what web scraping is and how to use Screaming Frog to retrieve information from the SERP.

Web Scraping

The term web scraping comes from the English verb “to scrape” and refers to a methodology for collecting data and information directly from web pages or the SERP in order to catalog and store them in a database.

In some cases, which are still very frequent, web scraping is used to collect names, surnames, and sensitive data to build marketing databases; this activity is to be considered illegal.

In other cases, this activity may be used for phishing campaigns, identity theft or copyright violations.

Despite these abuses, web scraping is not in itself considered illegal, and it allows the retrieval of data that is very important for SEO analysis and for digital marketing in general.

Digital Marketing and Scraping

For years this activity required intermediate-to-advanced computer knowledge; today there are no-code tools that deliver remarkable results in terms of data mining.

At the heart of web scraping is a standard called XPath, which makes it possible to locate, and consequently manage, the different nodes of an HTML document quite easily.
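
To make the idea of locating nodes concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. The page fragment, the class names, and the product data are hypothetical and exist only for illustration; real web pages are rarely well-formed XML and usually need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <h1>Running shoes</h1>
    <div class="product">
      <span class="name">Trail Runner</span>
      <span class="price">89.90</span>
    </div>
    <div class="product">
      <span class="name">Road Racer</span>
      <span class="price">119.00</span>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# XPath: every <span> whose class attribute is exactly "price".
prices = [node.text for node in root.findall(".//span[@class='price']")]

# XPath: the first <h1> found anywhere below the root element.
title = root.find(".//h1").text

print(title)
print(prices)
```

Each expression describes a path through the document tree plus an optional condition in square brackets, which is the same idea the SEO Spider relies on for Custom Extraction.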

Web Scraping and SEO

Web scraping offers several benefits for Search Engine Optimization (SEO):

  • Keyword research: web scraping enables the collection of a wide variety of data, including the keywords used by competitors or found on the pages of the most relevant websites in a given industry. This information (heading analysis, meta tags, etc.) can be used to identify new keyword opportunities and improve SEO strategy.
  • Competitor monitoring: through web scraping, it is possible to constantly monitor competitors’ activities, including changes in their SEO strategies, newly published content, Google algorithm updates, and more. This information can be valuable in adapting and optimizing one’s own SEO strategy.
  • Updating data: careful use of web scraping keeps the data used for search engine optimization up to date. For example, you can constantly monitor search results and ranking changes for specific keywords, as well as collect data on new search trends and user behaviors.
  • Backlink analysis: web scraping can be used to extract information about backlinks from relevant or competing websites. This data can be analyzed to identify new link building opportunities and assess the quality of existing backlinks (as done by services such as Semrush or Seozoom).
  • Content scraping: when used in an ethical and compliant manner, web scraping lets you gather relevant content from other online sources to enrich your website with additional information or to create new, high-quality content by integrating different sources.

In summary, web scraping offers SEO specialists a set of useful tools and data to optimize and improve the performance of websites in search engines. However, it is important to use this information in a responsible and compliant manner to avoid legal disputes or penalties from search engines.

Screaming Frog and Web Scraping

Now that we understand the importance of web scraping, let’s see how to leverage Screaming Frog to extract valuable information and improve our digital marketing strategy.

With the SEO Spider you can collect data from web pages quickly and efficiently, whether to analyze your own website or to run even very advanced comparative analyses of competitors.

Through XPath and Custom Extraction, your analyses will never be the same again. What’s more, since version 19 of the SEO Spider, the “Custom Scraping” function has become even simpler and more intuitive, and no prior knowledge is needed to achieve the desired results.

Once in the function, simply click on “add” in the lower right corner and enter the scraping expression.

If you are already familiar with XPath expressions, you can enter them directly into the “Enter Xpath” cell, or you can opt for the “Visual” version of the SEO Spider, which lets you identify the information to extract visually, in just a few clicks, using Screaming Frog’s internal browser.

Next, when you select the element you wish to extract, the SEO Spider highlights the corresponding area on the page and generates a variety of suggested expressions (shown on the right), along with a preview of what will be extracted from the raw or rendered HTML.

In the example above, I selected the product prices (by clicking on a price) and, as you can see on the right, the SEO Spider entered the correct syntax for extraction.

There are essentially four extraction options available for the data:

  1. Extract HTML Element: the selected element and all its inner HTML content.
  2. Extract Inner HTML: the inner HTML content of the selected element. If the selected element contains other HTML elements, they will be included.
  3. Extract Text: the textual content of the selected element and the textual content of any child elements.
  4. Function Value: the result of the provided function, e.g. count(//h1) to find the number of h1 tags on a page.
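
The difference between the four options is easier to see on a small example. The sketch below is only an analogy written with Python's standard library, not how the SEO Spider works internally; the fragment and the helper names (`extract_html_element`, `extract_inner_html`, `extract_text`, `count_elements`) are hypothetical. Because `ElementTree` does not support XPath functions such as `count()`, the last helper counts the matched nodes instead.

```python
import copy
import html
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment used only for illustration.
PAGE = (
    '<body><h1>Shoes</h1>'
    '<div class="product"><span class="name">Trail Runner</span>'
    ' costs <b>89.90</b> EUR</div></body>'
)

root = ET.fromstring(PAGE)
node = root.find(".//div[@class='product']")


def extract_html_element(el):
    """The selected element itself plus everything inside it."""
    clone = copy.copy(el)  # shallow copy, so the original is left untouched
    clone.tail = None      # drop any text that follows the closing tag
    return ET.tostring(clone, encoding="unicode")


def extract_inner_html(el):
    """Only what sits between the opening and closing tags."""
    leading = html.escape(el.text or "", quote=False)
    children = "".join(ET.tostring(child, encoding="unicode") for child in el)
    return leading + children


def extract_text(el):
    """Text of the element and of all its child elements, without markup."""
    return "".join(el.itertext())


def count_elements(tree_root, tag):
    """Stand-in for a function value such as count(//h1)."""
    return len(tree_root.findall(f".//{tag}"))


print(extract_html_element(node))
print(extract_inner_html(node))
print(extract_text(node))
print(count_elements(root, "h1"))
```

On this fragment the first call keeps the wrapping `<div>`, the second drops it but keeps the `<span>` and `<b>` tags, and the third returns plain text only.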

Once the crawl has finished, you will find the details of the Custom Extraction in the dedicated tab; if there is more than one extraction, you can use the filters to switch between them.

Web Scraping the SERP

In addition to scraping your own site or specific competitor sites, it is also possible to run extractions on directories and on the search engine itself. This activity must be done with some care so that your IP is not banned, but it is very useful, for example, for collecting ranking data for certain strategic keywords.

Screaming Frog does not replace services like Semrush or Seozoom, but for the keywords you want to rank for it can be a good tool for checking competitors’ positioning, the characteristics of the most successful sites, and so on.

  • The first step will be to create a “Google Search query” URL to be crawled by the SEO Spider. This can be done simply by using the following Google Spreadsheet form.
  • The second step will be to configure the SEO Spider appropriately:
    • Use JS rendering.
    • Set to “Ignore Robots.txt.”
    • Use the user agent “Chrome.”
    • Decrease the scan speed (Max Threads = 1 | Max URI/s = 0.5)
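
If you prefer not to depend on a spreadsheet, the first step can also be sketched in a few lines of Python. This is a hypothetical helper, not the form mentioned above: the function name `google_search_url` and the default domain are assumptions, and the optional `hl` (interface language) and `gl` (country) query parameters are included only as an illustration of how extra parameters could be appended.

```python
from urllib.parse import urlencode


def google_search_url(keyword, domain="www.google.com", lang=None, country=None):
    """Build a Google search URL for one keyword (hypothetical helper)."""
    params = {"q": keyword}
    if lang:
        params["hl"] = lang      # assumed: interface language, e.g. "it"
    if country:
        params["gl"] = country   # assumed: country code, e.g. "it"
    # urlencode replaces spaces with "+" and percent-encodes special characters.
    return f"https://{domain}/search?{urlencode(params)}"


keywords = ["what is seo", "cose la seo"]
urls = [google_search_url(keyword, domain="www.google.it") for keyword in keywords]

for url in urls:
    print(url)
```

The resulting list can be pasted into the SEO Spider in List mode. Keep the throttling above in mind when sizing the list: at Max URI/s = 0.5 the crawler requests one URL every two seconds, so 100 query URLs take at least 200 seconds.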

I recommend saving this configuration as a new “profile” so you can reuse it whenever needed.

How to create a custom profile with Screaming Frog.

Screaming Frog Setup

With the technical configuration of Screaming Frog completed, it is time to define the syntax for extracting custom data from the SERP through a crawl in “List” mode.

By default, when you run a crawl in List mode with the generated URLs, you already get some interesting results in the lower “Outlinks” tab, where you will find the URLs ranking for that specific query.
As you will see, there will be many references to Google’s own links, so I recommend using the filter in the bottom tab to exclude them: “To” does not contain Google ([To] Not Contains ‘Google’).
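
The same filter can be reproduced outside the tool if you work on an exported list of outlinks. The sketch below uses made-up URLs and assumes a case-insensitive substring match, which may differ from how the SEO Spider evaluates the filter.

```python
# Hypothetical outlinks, standing in for the "To" column of an export.
outlinks = [
    "https://www.google.com/preferences",
    "https://support.google.com/websearch",
    "https://example.com/what-is-seo",
    "https://accounts.google.com/ServiceLogin",
    "https://example.org/seo-guide",
]

# Equivalent of [To] Not Contains 'Google': keep URLs without that substring.
ranking_urls = [url for url in outlinks if "google" not in url.lower()]

for position, url in enumerate(ranking_urls, start=1):
    print(position, url)
```

A plain substring match also discards any third-party URL that happens to contain the word "google", so it is worth checking the excluded rows once.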

In addition to this basic function, we can run more advanced SEO audits of the SERP using Custom Extraction, for example for “People Also Ask”, featured snippets, the presence of videos, or other features that characterize the SERP. You can find the XPath syntaxes using the SEO Spider’s internal browser.

Web Scraping in the Field

The goal of this test is to extract the data related to “People Also Ask” from the SERP. Our reference query will be “What is SEO?”

  • We enter the URL generated with our “URL generator”: https://www.google.it/search?q=cose+la+seo.
  • We open Config > Custom > Custom Extraction and use the internal browser.
  • We start the scan (list mode) and consult the data in the Custom Extraction tab.

In the same way, and very simply, it is possible to extract all the other Google features and obtain both a granular view and an overview for our SEO audits!

REMEMBER: The form you used previously includes some preconfigured XPath syntaxes, but since the search engine is constantly updated, they may not return reliable results. My advice is to always use the internal browser for these analyses!

Seo Spider Tab