Embeddings, SEO & Screaming Frog

Learn what embeddings are, what their applications are, and how to leverage them in SEO analysis through Screaming Frog.

What are Embeddings?

Embeddings are rapidly becoming a key element in the field of artificial intelligence (AI) and machine learning (ML). These powerful tools allow complex data to be represented in more manageable formats, significantly improving the effectiveness of machine learning algorithms.

Embeddings are vector representations of complex objects such as words, images or users. They transform nonnumeric data into a numeric form that can be easily processed by ML (Machine Learning) algorithms.

Transforming this data into vector representations makes it possible to discover relationships and patterns that are very useful for the strategic management of complex analyses.

The size of a vector is determined by the embedding model used, and it in turn determines how well the vector can represent the meaning of a word or document as a set of numbers that bots can interpret. In general, the vector size reflects the depth of analysis encoded within the embedding space.

But what data can be derived from these vectors? In the embedding of words, for example, semantic, syntactic, and contextual aspects can be derived; the larger the vector, the more detailed the information. Through these concepts we can understand how SEO analysis moves from the aseptic study of “keywords” to the discovery of “Search Intent,” based not on our personal assessment but on in-depth analysis increasingly similar to that performed by search engine spiders.
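To make this concrete, here is a toy sketch. The 3-dimensional vectors and words below are invented for illustration (real models use hundreds or thousands of dimensions), but the mechanism is the standard one: cosine similarity between embeddings surfaces semantic closeness.

```python
import numpy as np

# Toy 3-dimensional "embeddings"; the values are illustrative only.
embeddings = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.8, 0.9, 0.1]),
    "banana": np.array([0.1, 0.0, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_banana = cosine_similarity(embeddings["king"], embeddings["banana"])
# Semantically close words score higher than unrelated ones.
print(sim_king_queen > sim_king_banana)  # True
```

This single number per pair of words (or pages) is the building block behind most of the SEO applications discussed below.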

Advantages of Embeddings

  • Dimensionality Reduction: embeddings reduce the complexity of the original data while maintaining the relevant information. This allows for more efficient data manipulation and analysis, and in SEO they are a real gold mine for analyses increasingly akin to search engine spider behavior.
  • Improved Algorithm Performance: due to their ability to capture semantic relationships, embeddings improve the accuracy and precision of ML algorithms. This scenario is particularly evident in natural language processing (NLP) and image recognition applications.
  • Flexibility and Applicability: embeddings can be applied to different types of data and in various domains, from product recommendations to sentiment analysis, making them versatile tools for multiple applications. For example, by assigning numerical sets to words based on semantic similarity, “word embedding” allows neural network models to understand context in a far more meaningful way and then process it.
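As a quick illustration of dimensionality reduction, the sketch below projects high-dimensional vectors down to two dimensions with PCA, e.g. for plotting a cluster map. The random vectors are stand-ins (the 1536 size mirrors a common OpenAI embedding length, but any dimensionality works).

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for real embeddings: 100 random 1536-dimensional vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 1536))

# Project down to 2 dimensions for visualization or faster downstream analysis.
pca = PCA(n_components=2)
reduced = pca.fit_transform(vectors)
print(reduced.shape)  # (100, 2)
```

With real page embeddings, the 2-D projection is what you would feed to a scatter plot to eyeball topical clusters.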

Embedding applications in SEO

  • Natural Language Processing (NLP): in NLP, word embeddings such as Word2Vec and GloVe enable models to better understand the context and meaning of words in texts. This has revolutionized applications such as machine translation, sentiment analysis and intelligent chatbots. Word embeddings are useful for keyword analysis and targeting because they capture the meanings and relationships between words, and they reveal patterns and trends in text data, helping us identify the most important or relevant queries in a given text. For SEO purposes, this allows companies to identify the keywords users employ most when searching for a particular product or service. By capturing the relationships between words, word embeddings also help identify synonyms and related terms useful for keyword targeting, as well as potential search terms users might use when looking for the company’s products.
  • Product Recommendations: a very useful application is in e-commerce, where analyzing embeddings makes it possible to present related products promptly and strategically, based on product similarities and user preferences. This approach improves the customer experience and increases sales, and it is not limited to proposing “complementary” or same-category products, as popular CMSs commonly do automatically. Through embeddings it becomes possible to develop increasingly accurate and high-performing strategies that seek the balance point between supply and demand.
  • Image Recognition: embeddings are also used in image recognition, where they help identify and classify objects within images. Techniques such as convolutional neural networks (CNNs) generate embeddings that represent visual features of images, improving recognition performance.
  • SEO and Digital Marketing: embeddings are also used to improve SEO and digital marketing strategies, enabling better understanding of user behavior and more effective content creation.
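For instance, the product-recommendation idea above can be sketched in a few lines. The product names and 2-dimensional vectors are invented for illustration; in practice each vector would come from a real embedding model.

```python
import numpy as np

# Hypothetical product embeddings (toy 2-d vectors for illustration).
products = {
    "trail running shoes": np.array([0.9, 0.1]),
    "hiking boots":        np.array([0.85, 0.2]),
    "espresso machine":    np.array([0.1, 0.95]),
}

def recommend(name, catalog, k=1):
    """Return the k products whose embeddings are closest (cosine) to `name`."""
    query = catalog[name]
    scores = []
    for other, vec in catalog.items():
        if other == name:
            continue  # never recommend the product itself
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        scores.append((float(sim), other))
    return [other for _, other in sorted(scores, reverse=True)[:k]]

print(recommend("trail running shoes", products))  # ['hiking boots']
```

The same nearest-neighbor logic scales to a full catalog; only the vector store and lookup strategy change.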

Let us look more specifically at the SEO applications of embeddings:

  • Clustering: in SEO, “clusters” refer to groups of related content revolving around a main topic. Each cluster includes a central “pillar page,” which broadly covers the topic, and more specific supporting pages that delve into related subtopics. This approach helps to better organize site content, improving navigability and user experience, as well as facilitating better indexing by search engines. Content clusters also make it possible to increase the authority of the site on specific topics, thereby improving its ranking in organic searches.
  • Classification: assigning categories to words or documents based on trained patterns.
  • Recommendations: suggestion of relevant articles to users.
  • Measurement of Similarity and Diversity: assessment of similarities and differences between words and documents. This scenario allows you to avoid duplicate content and handle potential query “cannibalization” situations, through very specific analyses that intercept even the smallest details and characteristics of the crawled pages.
  • Detection of Anomalies: identification of items that deviate significantly from the rest of the data, which makes it possible to optimize the internal architecture or modify the content.
  • Collection of Profiled Information: obtaining relevant information from large datasets.
  • Language Translation: one way word embeddings help language translation is by providing a common representation of words across languages. To translate a word from one language to another, a machine learning model must understand the meaning of the word in both languages. Word embeddings can provide this understanding by representing words in a language-independent way: an embedding for a word in one language can be compared with the embedding for the same word in another language, allowing the model to understand the meaning of the word in both. This approach is therefore not limited to classic lexical and syntactic translation but proposes contextualized content in the different language versions, helping us improve our ranking on foreign SERPs.
  • Sentiment Analysis: to create a lexicon of feelings or emotions using word embeddings, one might begin by selecting a set of words or phrases known to be associated with particular feelings or emotions. These could be words or phrases annotated by humans for feeling or emotion, or words or phrases commonly used on social media or other online platforms to express particular feelings or emotions. Once a set of words or phrases has been selected, the next step is to use word embeddings to create vectors for each of them. These vectors can be created using a pre-trained word embedding model, or from scratch using machine learning algorithms and a large text dataset. Once vectors exist for the words or phrases in the lexicon, they can be used to identify the feeling or emotion associated with other words or phrases. For example, if a word or phrase turns out to be similar to a lexicon vector associated with a particular feeling or emotion, it can be inferred that the word or phrase is also associated with that feeling or emotion (source: https://marketbrew.ai/word-embeddings-a-comprehensive-guide).
  • Content Generation: information from embeddings can be used to generate new sentences related to the original text. For example, given a text on the outdoors, word embeddings allow us to identify relevant words and phrases on the topic, such as “camping” or “outdoor life,” which help us improve the page content and make it more relevant to the outdoor “Search Intent.” In addition to generating complete content, word embeddings let us calculate the semantic similarity of a “query” with respect to others in a vocabulary (cosine similarity), making our content more usable without the copy constraints tied to its mere SEO function. A high cosine similarity indicates that the vectors are similar, while a low cosine similarity indicates that they are different and cannot be considered synonyms at the semantic level for the spider.
  • Contextual Word Prediction: through embeddings we can improve the accuracy of autocomplete and predictive text functions. Embeddings capture the context in which words appear and can then be used to predict the next word based on the words that precede it, enabling autocompletion of internal search. This activity, together with the always valuable analysis via Google Analytics 4 of the queries used on one’s website, allows us to improve the UX of, for example, our e-commerce site and serve the best possible proposition to the potential buyer.
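The similarity/cannibalization use case from the list above reduces to a simple pattern: flag any pair of pages whose embeddings exceed a similarity threshold. The URLs, vectors and threshold below are all hypothetical.

```python
import numpy as np
from itertools import combinations

# Hypothetical page embeddings; in practice these come from a real model.
pages = {
    "/red-shoes":      np.array([0.9, 0.42, 0.1]),
    "/red-shoes-sale": np.array([0.88, 0.45, 0.12]),
    "/blue-jackets":   np.array([0.1, 0.2, 0.95]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pairs above the threshold are candidates for duplication or
# query cannibalization and deserve a manual review.
THRESHOLD = 0.95
near_duplicates = [
    (u1, u2) for u1, u2 in combinations(pages, 2)
    if cosine(pages[u1], pages[u2]) > THRESHOLD
]
print(near_duplicates)  # [('/red-shoes', '/red-shoes-sale')]
```

The right threshold is site-specific; 0.95 here is just a placeholder to make the toy example flag the near-identical pair.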

In previous guides we have focused on the importance of N-gram analysis, but with embeddings you will be able to achieve a remarkable upgrade in the information you obtain.

In terms of SEO effectiveness, word embeddings have several advantages. First, they allow a less approximate understanding of the meaning and context of words within a document, which is useful in determining the relevance and quality of a web page, compared with treating the analysis as a frequency count of individual entities (still valid for internal architecture, but not as precise as the scenario described here). Likewise, N-grams cannot identify related terms, so their analytical power is unequivocally reduced.
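The limitation of N-grams is easy to demonstrate: two texts with the same meaning but different wording share no N-grams at all, while an embedding-based comparison would score them as close. The two sentences below are made up for the demonstration.

```python
from collections import Counter

def ngrams(text, n=2):
    """Count n-grams by surface form only; there is no notion of meaning."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

a = ngrams("cheap running shoes for men")
b = ngrams("affordable sneakers for gentlemen")

# No shared bigrams, even though the texts mean nearly the same thing;
# an embedding comparison would instead capture the semantic overlap.
shared = sum((a & b).values())
print(shared)  # 0
```

This is exactly the gap embeddings close: "cheap"/"affordable" and "shoes"/"sneakers" sit close together in embedding space even though they never match as strings.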

Screaming Frog, embeddings and the ChatGPT API

Version 20 of the Screaming Frog SEO Spider introduced “embeddings,” generated with an OpenAI model through the API, directly via a built-in function of the tool.

To implement this feature, Screaming Frog has integrated the ability to run custom JavaScript functions, available via the Custom > Custom JavaScript section, where you can choose the dedicated script and enter the OpenAI API key to start generating embeddings during the crawl.

Using the Custom JavaScript function in the Screaming Frog SEO Spider

Once you have chosen Custom JavaScript, simply click on “+ Add from Library” and choose Screaming Frog’s first suggestion, “(ChatGPT) Extract embeddings from page..”

Embeddings analysis with the Screaming Frog SEO Spider

Click on the “JS” icon to edit the script and insert the API key generated with OpenAI.

Screaming Frog: editing the custom JS

Below is the screen where you can enter your API key. Once it is copied and pasted into the dedicated space, simply run the crawl with Screaming Frog to get the embeddings. Since using the API has a cost, the advice is to use the “JavaScript Tester” by entering a single URL to be crawled, to preview the kind of result you would get with a full crawl.

JavaScript snippet for extracting embeddings

The results of the “Custom JavaScript” can be read in the dedicated tab, choosing from the filters when several extractions or actions requested through the “Custom JS” are active.

Custom JavaScript tab in Screaming Frog for analyzing embeddings

If you need to share the Screaming Frog crawl or configuration, delete the API key first so as not to expose it. According to community discussions, the Screaming Frog developers have already been asked to mask the API key once entered, to protect this valuable and costly credential.

But now that we have obtained the embeddings from Screaming Frog, how can we process our new data? As we have seen, embeddings have numerous uses; let us walk through the process of turning the obtained vectors into a very accurate related-pages analysis.

Related Page Analysis

A first example of application would be to understand the interconnections between pages of a website or between the detail pages of an e-commerce to best provide an optimal experience for the visitor or potential “buyer.”

Let’s see how to do it:

  1. I get the embeddings with Screaming Frog.
  2. I export the SEO Spider data and format the resulting file by replacing “Address” with “URL.”
  3. I delete the “Status Code” and “Status” columns. In this case you will have only 2 headers:
    1. URL.
    2. Embeddings.
Example of embeddings from Screaming Frog with correct formatting
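Steps 2 and 3 above can also be done programmatically with pandas. The DataFrame below is a stand-in for the Screaming Frog export (in practice you would load it with `pd.read_csv(...)`; the example URLs and values are invented), while the column names match the article's steps.

```python
import pandas as pd

# Stand-in for the Screaming Frog "Custom JavaScript" export.
df = pd.DataFrame({
    "Address": ["https://example.com/a", "https://example.com/b"],
    "Status Code": [200, 200],
    "Status": ["OK", "OK"],
    "Embeddings": ["0.1,0.2,0.3", "0.2,0.1,0.4"],
})

# Rename "Address" to "URL" and drop the status columns,
# leaving only the two headers the processing script expects.
df = df.rename(columns={"Address": "URL"}).drop(columns=["Status Code", "Status"])
print(list(df.columns))  # ['URL', 'Embeddings']

df.to_csv("embeddings_input.csv", index=False)  # hypothetical output file name
```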

At this point you can use a Python script on Colab to process your data. The following uses a Gus Pelogia script for the processing.

The tool will ask you to upload your CSV file (the one from Screaming Frog).

Once the data is processed, another CSV will automatically download with the results organized into two columns (if you use Google Sheets you will have to split the data into columns):

  • Source of the page.
  • Related pages.
Running the Screaming Frog processing script with Colab and Python

Just click on the “Play” icon and Colab will allow you to upload the file obtained from Screaming Frog.

Uploading the file to Colab for embedding processing

At the end of processing, a new file named “related_pages.csv” will be automatically downloaded locally.

Related pages obtained from Screaming Frog embeddings

Now all you have to do is check the results and apply the appropriate changes to the site for optimal semantic “correlation” of content. Applied to products in an e-commerce business, it could be an excellent strategy for improving ROI and the average cart value.
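For readers who prefer to run the related-pages step locally, here is a minimal sketch of the same idea (this is not Gus Pelogia's actual script): parse the embedding strings, compute pairwise cosine similarity, and pick each page's nearest neighbor. The URLs and vectors are toy values.

```python
import numpy as np
import pandas as pd

# Toy input mirroring the formatted export: URL + comma-separated embedding.
df = pd.DataFrame({
    "URL": ["/a", "/b", "/c"],
    "Embeddings": ["0.9,0.1", "0.88,0.15", "0.1,0.9"],
})

# Parse the embedding strings into a matrix and L2-normalize the rows,
# so that a dot product equals cosine similarity.
vectors = np.array([[float(x) for x in e.split(",")] for e in df["Embeddings"]])
normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
sims = normed @ normed.T
np.fill_diagonal(sims, -1.0)  # never report a page as related to itself

related = pd.DataFrame({
    "Source of the page": df["URL"],
    "Related pages": df["URL"].to_numpy()[sims.argmax(axis=1)],
})
related.to_csv("related_pages.csv", index=False)
print(related["Related pages"].tolist())  # ['/b', '/a', '/b']
```

Extending this to the top-N related pages per URL only requires replacing `argmax` with an `argsort` over each row.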

Clustering

For clustering through embeddings, you could use different clustering algorithms, such as K-means or DBSCAN, which use different similarity metrics to create homogeneous groups. To create a script I used the libraries “pandas” for data manipulation, “sklearn” for clustering, and “matplotlib” + “seaborn” for visualization, with the output saved as an image and a CSV document. The file I used as the dataset was formatted like the previous one, with two columns, “URL” and “Embeddings.”
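A minimal sketch of that clustering step is below (the plotting part is omitted for brevity; with matplotlib/seaborn you would scatter-plot the points colored by cluster label). The four URLs and their 2-dimensional embeddings are invented, and the choice of k=2 is hard-coded for the toy data; in practice you would choose k via the elbow method or silhouette score, or use DBSCAN when k is unknown.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy dataset in the same two-column format ("URL", "Embeddings").
df = pd.DataFrame({
    "URL": ["/a", "/b", "/c", "/d"],
    "Embeddings": ["0.9,0.1", "0.85,0.15", "0.1,0.9", "0.15,0.85"],
})
X = np.array([[float(v) for v in e.split(",")] for e in df["Embeddings"]])

# K-means with 2 clusters; random_state fixes the initialization
# so the run is reproducible.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["Cluster"] = km.fit_predict(X)

df.to_csv("clusters.csv", index=False)  # hypothetical output file name
print(df[["URL", "Cluster"]].to_string(index=False))
```

On this toy data, /a groups with /b and /c with /d, mirroring the pillar-page clusters described earlier in the article.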
