{"id":3507,"date":"2024-03-25T07:58:42","date_gmt":"2024-03-25T07:58:42","guid":{"rendered":"http:\/\/screamingfrog.club\/web-scraping\/"},"modified":"2024-07-08T06:47:42","modified_gmt":"2024-07-08T06:47:42","slug":"web-scraping","status":"publish","type":"post","link":"https:\/\/screamingfrog.club\/en\/web-scraping\/","title":{"rendered":"Web Scraping"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-title-center fusion-title-text fusion-title-size-two\" style=\"--awb-text-color:#141617;--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-sep-color:var(--awb-color6);--awb-font-size:40px;\"><div class=\"title-sep-container title-sep-container-left\"><div class=\"title-sep sep-single sep-solid\" 
style=\"border-color:var(--awb-color6);\"><\/div><\/div><span class=\"awb-title-spacer\"><\/span><h2 class=\"fusion-title-heading title-heading-center awb-gradient-text fusion-responsive-typography-calculated\" style=\"font-family:&quot;Work Sans&quot;;font-style:normal;font-weight:800;margin:0;letter-spacing:-0.012em;text-transform:var(--awb-typography1-text-transform);background-color:#f79501;background-image:linear-gradient(150deg, #f79501 20%,#fe6996 100%);font-size:1em;--fontSize:40;line-height:1;\">Web Scraping &amp; Custom Extraction<\/h2><span class=\"awb-title-spacer\"><\/span><div class=\"title-sep-container title-sep-container-right\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><\/div><div class=\"fusion-text fusion-text-1\"><p>Let&#8217;s see how to use Screaming Frog for web scraping with the Custom Extraction (Advanced Search) feature.<\/p>\n<p>This feature lets you retrieve any data from the HTML of a web page using CSS Path, XPath, or regex.<\/p>\n<p>By default, extraction is performed on the static HTML of URLs crawled by the SEO Spider that respond with a 200 &#8216;OK&#8217; status code.<\/p>\n<p>To extract from the rendered HTML instead, enable the &#8220;JavaScript Rendered&#8221; mode.<\/p>\n<ul>\n<li>1. 
Configuration of Custom Extraction<\/li>\n<\/ul>\n<p>To set up a custom extraction, go to Configuration &gt; Custom &gt; Extraction.<\/p>\n<p>Here you can define up to 100 separate extractors.<\/p>\n<div class=\"wp-block-image\"><\/div>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><a href=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider.png\" class=\"fusion-lightbox\" data-rel=\"iLightbox[24ba755461b5f67d74b]\" data-title=\"custom extraction del seo spider\" title=\"custom extraction del seo spider\"><img decoding=\"async\" width=\"1302\" height=\"817\" src=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider.png\" alt class=\"img-responsive wp-image-3086\" srcset=\"https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider-200x125.png 200w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider-400x251.png 400w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider-600x376.png 600w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider-800x502.png 800w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider-1200x753.png 1200w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-del-seo-spider.png 
1302w\" sizes=\"(max-width: 640px) 100vw, 1200px\" \/><\/a><\/span><\/div><div class=\"fusion-title title fusion-title-2 fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:#141617;--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-sep-color:var(--awb-color6);--awb-font-size:40px;\"><div class=\"title-sep-container title-sep-container-left\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><span class=\"awb-title-spacer\"><\/span><h3 class=\"fusion-title-heading title-heading-center awb-gradient-text fusion-responsive-typography-calculated\" style=\"font-family:&quot;Work Sans&quot;;font-style:normal;font-weight:800;margin:0;letter-spacing:-0.012em;text-transform:var(--awb-typography1-text-transform);background-color:#f79501;background-image:linear-gradient(150deg, #f79501 20%,#fe6996 100%);font-size:1em;--fontSize:40;line-height:1;\">Css, XPath and Regex Instructions<\/h3><span class=\"awb-title-spacer\"><\/span><div class=\"title-sep-container title-sep-container-right\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><\/div><div class=\"fusion-text fusion-text-2\"><ul>\n<li>2. 
Select the CSS Path, XPath or regex method to be used for scraping<\/li>\n<\/ul>\n<p>The SEO Spider offers three methods for scraping data from websites:<\/p>\n<ul>\n<li><strong>XPath<\/strong>: selects nodes from the HTML document using XPath selectors, including attributes.<\/li>\n<li><strong>CSS Path<\/strong>: the fastest of the three methods; it selects elements using CSS Path selectors.<\/li>\n<li><strong>Regex<\/strong>: uses regular expressions and is recommended for advanced uses such as scraping HTML comments or inline JavaScript.<\/li>\n<\/ul>\n<p>When you query the HTML with XPath or CSS Path, you can choose from several SEO Spider filters:<\/p>\n<ul>\n<li><strong>Extract HTML Elements<\/strong>: collects the selected element and all of its inner HTML content.<\/li>\n<li><strong>Extract Inner HTML<\/strong>: collects the inner HTML content of the selected element. If the selected element contains other HTML elements, these are included as well.<\/li>\n<li><strong>Extract Text<\/strong>: collects the text content of the selected element and of any sub-elements.<\/li>\n<li><strong>Function Value<\/strong>: returns the result of the supplied XPath function, e.g. 
if you are looking for how many h3 are on a page you can use &#8220;count(\/\/h3)&#8221;.<\/li>\n<\/ul>\n<\/div><div class=\"fusion-title title fusion-title-3 fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:#141617;--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-sep-color:var(--awb-color6);--awb-font-size:40px;\"><div class=\"title-sep-container title-sep-container-left\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><span class=\"awb-title-spacer\"><\/span><h3 class=\"fusion-title-heading title-heading-center awb-gradient-text fusion-responsive-typography-calculated\" style=\"font-family:&quot;Work Sans&quot;;font-style:normal;font-weight:800;margin:0;letter-spacing:-0.012em;text-transform:var(--awb-typography1-text-transform);background-color:#f79501;background-image:linear-gradient(150deg, #f79501 20%,#fe6996 100%);font-size:1em;--fontSize:40;line-height:1;\">Syntax entry<\/h3><span class=\"awb-title-spacer\"><\/span><div class=\"title-sep-container title-sep-container-right\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><\/div><div class=\"fusion-text fusion-text-3\"><ul>\n<li>3. Enter your syntax<\/li>\n<\/ul>\n<p>Once you have chosen the scraping mode, all that remains is to define the extraction syntax. 
To find the relevant CSS Path or XPath, open the web page in Chrome, right-click the desired element and choose &#8216;Inspect&#8217;, then right-click the highlighted HTML line and copy the selector path provided.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-sizes-left:5px;--awb-border-color:#f79501;--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-bottom:0px;--awb-background-color:var(--awb-color6);--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-left:20px;\"><p><span class=\"td_text_highlight_marker\" style=\"color: #ffffff;\">Example:<br \/>\nLet us examine the Screaming Frog blog.<\/span><\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container has-pattern-background 
has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" style=\"--awb-margin-top:20px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"850\" height=\"465\" title=\"copy-css-path-custom-extraction\" src=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction.jpeg\" alt class=\"img-responsive wp-image-3087\" 
srcset=\"https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction-200x109.jpeg 200w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction-400x219.jpeg 400w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction-600x328.jpeg 600w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction-800x438.jpeg 800w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/copy-css-path-custom-extraction.jpeg 850w\" sizes=\"(max-width: 640px) 100vw, 850px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-5\" style=\"--awb-margin-top:10px;\"><p>Open any blog post in Chrome, right-click the author&#8217;s name and choose &#8220;Inspect&#8221;.<\/p>\n<p>Right-click the highlighted HTML line (the one containing the author&#8217;s name), copy its CSS selector or XPath, and paste it into the corresponding SEO Spider field.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"850\" height=\"116\" title=\"custom-extraction-css-path-author-comment\" src=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment.jpeg\" alt class=\"img-responsive wp-image-3088\" srcset=\"https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment-200x27.jpeg 200w, 
https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment-400x55.jpeg 400w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment-600x82.jpeg 600w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment-800x109.jpeg 800w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-css-path-author-comment.jpeg 850w\" sizes=\"(max-width: 640px) 100vw, 850px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-6\" style=\"--awb-margin-top:10px;\"><p>If the syntax entered is valid (for example, .author-details-social&gt;a), a green check mark appears next to your input; otherwise a red cross warns that the syntax is invalid.<\/p>\n<p>Then click the &#8220;OK&#8221; button and start the crawl.<\/p>\n<p>To learn more about CSS selectors and XPath, I recommend the w3schools tutorials.<\/p>\n<p>Crawl the website<\/p>\n<p>With the syntax entered and validated, all you have to do is crawl the website to start scraping.<\/p>\n<p>View scraping data in the &#8220;Custom Extraction&#8221; tab<\/p>\n<p>The scraped data is available in real time during the crawl, in both the &#8220;Custom Extraction&#8221; tab and the &#8220;Internal&#8221; tab.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><a 
href=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider.jpeg\" class=\"fusion-lightbox\" data-rel=\"iLightbox[60e28eada2f56f462ae]\" data-title=\"custom-extraction-seo-spider\" title=\"custom-extraction-seo-spider\"><img decoding=\"async\" width=\"850\" height=\"366\" src=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider.jpeg\" alt class=\"img-responsive wp-image-3089\" srcset=\"https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider-200x86.jpeg 200w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider-400x172.jpeg 400w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider-600x258.jpeg 600w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider-800x344.jpeg 800w, https:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/custom-extraction-seo-spider.jpeg 850w\" sizes=\"(max-width: 640px) 100vw, 850px\" \/><\/a><\/span><\/div><div class=\"fusion-text fusion-text-7\" style=\"--awb-margin-top:10px;\"><p>In our example we ran a full crawl of a website, but if you want to scrape a specific list of URLs you can use the &#8220;List&#8221; mode instead.<\/p>\n<p>The possible applications depend on the type of analysis being performed. For example, this feature is useful for collecting Analytics or GTM IDs, social meta tags, hreflang attribute values, or e-commerce product prices and discounts.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container has-pattern-background has-mask-background hundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--link_color: 
#f79501;--awb-border-sizes-top:0;--awb-border-sizes-bottom:0;--awb-border-sizes-left:5;--awb-border-sizes-right:0;--awb-border-color:var(--awb-color6);--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-bottom:8px;--awb-margin-bottom:43px;--awb-background-color:var(--awb-color3);--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"width:104% !important;max-width:104% !important;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-8 fusion-text-no-margin\" style=\"--awb-margin-top:5px;--awb-margin-right:10px;--awb-margin-bottom:-10px;--awb-margin-left:10px;\"><p><strong>Related Tab<\/strong>: <a href=\"https:\/\/screamingfrog.club\/en\/tab-custom-search-extraction\/\">Custom Extraction<\/a><\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container has-pattern-background has-mask-background hundred-percent-fullwidth non-hundred-percent-height-scrolling\" 
style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-bottom:0px;--awb-margin-top:12px;--awb-margin-bottom:12px;--awb-background-color:var(--awb-color6);--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"width:104% !important;max-width:104% !important;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-4 fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:#141617;--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-sep-color:var(--awb-color6);--awb-font-size:40px;\"><div class=\"title-sep-container title-sep-container-left\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><span class=\"awb-title-spacer\"><\/span><h3 class=\"fusion-title-heading title-heading-center awb-gradient-text fusion-responsive-typography-calculated\" style=\"font-family:&quot;Work Sans&quot;;font-style:normal;font-weight:800;margin:0;letter-spacing:-0.012em;text-transform:var(--awb-typography1-text-transform);background-color:#f79501;background-image:linear-gradient(150deg, #f79501 20%,#fe6996 
100%);font-size:1em;--fontSize:40;line-height:1;\">Scraping Search Intent<\/h3><span class=\"awb-title-spacer\"><\/span><div class=\"title-sep-container title-sep-container-right\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><\/div><div class=\"fusion-video fusion-youtube\" style=\"--awb-max-width:600px;--awb-max-height:360px;--awb-align-self:center;--awb-width:100%;\" itemscope itemtype=\"http:\/\/schema.org\/VideoObject\"><meta itemprop=\"duration\" content=\"PT5M3S\" \/><meta itemprop=\"name\" content=\"Scraping con Screaming Frog\" \/><meta itemprop=\"description\" content=\"Scopri come gestire Screaming Frog e trovare il corretto Search Intent migliorando il ranking delle pagine web. Tutorial in italiano.\" \/><meta itemprop=\"uploadDate\" content=\"2022-07-27\" \/><meta itemprop=\"thumbnailUrl\" content=\"https:\/\/i3.ytimg.com\/vi\/_gowe8oliCM\/hqdefault.jpg\" \/><meta itemprop=\"embedUrl\" content=\"https:\/\/www.youtube.com\/embed\/_gowe8oliCM\" \/><div class=\"video-shortcode\"><div class=\"fluid-width-video-wrapper\" style=\"padding-top:60%;\" ><iframe title=\"YouTube video player 1\" src=\"https:\/\/www.youtube.com\/embed\/_gowe8oliCM?wmode=transparent&autoplay=0&amp;rel=0\" width=\"600\" height=\"360\" allowfullscreen allow=\"autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture;\"><\/iframe><\/div><\/div><\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-6 fusion-flex-container has-pattern-background has-mask-background hundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-bottom:0px;--awb-margin-top:12px;--awb-margin-bottom:12px;--awb-background-color:var(--awb-color6);--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row 
fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"width:104% !important;max-width:104% !important;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-5 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-5 fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:#141617;--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-sep-color:var(--awb-color6);--awb-font-size:40px;\"><div class=\"title-sep-container title-sep-container-left\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><span class=\"awb-title-spacer\"><\/span><h3 class=\"fusion-title-heading title-heading-center awb-gradient-text fusion-responsive-typography-calculated\" style=\"font-family:&quot;Work Sans&quot;;font-style:normal;font-weight:800;margin:0;letter-spacing:-0.012em;text-transform:var(--awb-typography1-text-transform);background-color:#f79501;background-image:linear-gradient(150deg, #f79501 20%,#fe6996 100%);font-size:1em;--fontSize:40;line-height:1;\">Scraping &#8220;People Also Ask&#8221;<\/h3><span class=\"awb-title-spacer\"><\/span><div class=\"title-sep-container title-sep-container-right\"><div class=\"title-sep sep-single sep-solid\" style=\"border-color:var(--awb-color6);\"><\/div><\/div><\/div><div class=\"fusion-video 
fusion-youtube\" style=\"--awb-max-width:600px;--awb-max-height:360px;--awb-align-self:center;--awb-width:100%;\" itemscope itemtype=\"http:\/\/schema.org\/VideoObject\"><meta itemprop=\"duration\" content=\"PT2M57S\" \/><meta itemprop=\"uploadDate\" content=\"2022-07-25\" \/><meta itemprop=\"thumbnailUrl\" content=\"https:\/\/i3.ytimg.com\/vi\/v_RNQmngIh4\/hqdefault.jpg\" \/><meta itemprop=\"embedUrl\" content=\"https:\/\/www.youtube.com\/embed\/v_RNQmngIh4\" \/><div class=\"video-shortcode\"><div class=\"fluid-width-video-wrapper\" style=\"padding-top:60%;\" ><iframe title=\"YouTube video player 2\" src=\"https:\/\/www.youtube.com\/embed\/v_RNQmngIh4?wmode=transparent&autoplay=0&amp;rel=0\" width=\"600\" height=\"360\" allowfullscreen allow=\"autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture;\"><\/iframe><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[38],"tags":[52,60,53],"class_list":["post-3507","post","type-post","status-publish","format-standard","hentry","category-guide-screaming-frog","tag-beginner-en","tag-guide","tag-video-en"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Web Scraping and Custom Extraction | Screaming Frog<\/title>\n<meta name=\"description\" content=\"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx syntax.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/screamingfrog.club\/en\/web-scraping\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" 
\/>\n<meta property=\"og:title\" content=\"Web Scraping and Custom Extraction | Screaming Frog\" \/>\n<meta property=\"og:description\" content=\"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx syntax.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/screamingfrog.club\/en\/web-scraping\/\" \/>\n<meta property=\"og:site_name\" content=\"Screaming Frog Club\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-25T07:58:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-08T06:47:42+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/SF_Back_dark.png\" \/>\n<meta name=\"author\" content=\"raffaele.visintin@gmail.com\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"raffaele.visintin@gmail.com\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"27 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/\"},\"author\":{\"name\":\"raffaele.visintin@gmail.com\",\"@id\":\"https:\/\/screamingfrog.club\/en\/#\/schema\/person\/cd9ee509ae86128e5e339f9e3de1bc73\"},\"headline\":\"Web Scraping\",\"datePublished\":\"2024-03-25T07:58:42+00:00\",\"dateModified\":\"2024-07-08T06:47:42+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/\"},\"wordCount\":5358,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/#organization\"},\"keywords\":[\"Beginner\",\"Guide\",\"video\"],\"articleSection\":[\"Guide Screaming Frog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/screamingfrog.club\/en\/web-scraping\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/\",\"url\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/\",\"name\":\"Web Scraping and Custom Extraction | Screaming Frog\",\"isPartOf\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/#website\"},\"datePublished\":\"2024-03-25T07:58:42+00:00\",\"dateModified\":\"2024-07-08T06:47:42+00:00\",\"description\":\"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx 
syntax.\",\"breadcrumb\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/screamingfrog.club\/en\/web-scraping\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/screamingfrog.club\/en\/web-scraping\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/screamingfrog.club\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Web Scraping\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/screamingfrog.club\/en\/#website\",\"url\":\"https:\/\/screamingfrog.club\/en\/\",\"name\":\"Screaming Frog Club\",\"description\":\"Guide e Tutorial sul Seo SPider\",\"publisher\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/screamingfrog.club\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/screamingfrog.club\/en\/#organization\",\"name\":\"Screaming Frog Club\",\"url\":\"https:\/\/screamingfrog.club\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/screamingfrog.club\/en\/#\/schema\/logo\/image\/\",\"url\":\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/02\/SF_Club_Logo-2.png\",\"contentUrl\":\"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/02\/SF_Club_Logo-2.png\",\"width\":117,\"height\":67,\"caption\":\"Screaming Frog 
Club\"},\"image\":{\"@id\":\"https:\/\/screamingfrog.club\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/screamingfrog.club\/en\/#\/schema\/person\/cd9ee509ae86128e5e339f9e3de1bc73\",\"name\":\"raffaele.visintin@gmail.com\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410\",\"url\":\"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410\",\"contentUrl\":\"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410\",\"caption\":\"raffaele.visintin@gmail.com\"},\"sameAs\":[\"http:\/\/screamingfrog.club\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Web Scraping and Custom Extraction | Screaming Frog","description":"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx syntax.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/screamingfrog.club\/en\/web-scraping\/","og_locale":"en_US","og_type":"article","og_title":"Web Scraping and Custom Extraction | Screaming Frog","og_description":"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx syntax.","og_url":"https:\/\/screamingfrog.club\/en\/web-scraping\/","og_site_name":"Screaming Frog Club","article_published_time":"2024-03-25T07:58:42+00:00","article_modified_time":"2024-07-08T06:47:42+00:00","og_image":[{"url":"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/03\/SF_Back_dark.png","type":"","width":"","height":""}],"author":"raffaele.visintin@gmail.com","twitter_card":"summary_large_image","twitter_misc":{"Written by":"raffaele.visintin@gmail.com","Est. 
reading time":"27 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/#article","isPartOf":{"@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/"},"author":{"name":"raffaele.visintin@gmail.com","@id":"https:\/\/screamingfrog.club\/en\/#\/schema\/person\/cd9ee509ae86128e5e339f9e3de1bc73"},"headline":"Web Scraping","datePublished":"2024-03-25T07:58:42+00:00","dateModified":"2024-07-08T06:47:42+00:00","mainEntityOfPage":{"@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/"},"wordCount":5358,"commentCount":0,"publisher":{"@id":"https:\/\/screamingfrog.club\/en\/#organization"},"keywords":["Beginner","Guide","video"],"articleSection":["Guide Screaming Frog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/screamingfrog.club\/en\/web-scraping\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/","url":"https:\/\/screamingfrog.club\/en\/web-scraping\/","name":"Web Scraping and Custom Extraction | Screaming Frog","isPartOf":{"@id":"https:\/\/screamingfrog.club\/en\/#website"},"datePublished":"2024-03-25T07:58:42+00:00","dateModified":"2024-07-08T06:47:42+00:00","description":"Learn how to use Screaming Frog to do Web Scraping using Custom Extraction using XPath and RegEx syntax.","breadcrumb":{"@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/screamingfrog.club\/en\/web-scraping\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/screamingfrog.club\/en\/web-scraping\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/screamingfrog.club\/en\/"},{"@type":"ListItem","position":2,"name":"Web Scraping"}]},{"@type":"WebSite","@id":"https:\/\/screamingfrog.club\/en\/#website","url":"https:\/\/screamingfrog.club\/en\/","name":"Screaming Frog 
Club","description":"Guides and Tutorials on the SEO Spider","publisher":{"@id":"https:\/\/screamingfrog.club\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/screamingfrog.club\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/screamingfrog.club\/en\/#organization","name":"Screaming Frog Club","url":"https:\/\/screamingfrog.club\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/screamingfrog.club\/en\/#\/schema\/logo\/image\/","url":"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/02\/SF_Club_Logo-2.png","contentUrl":"http:\/\/screamingfrog.club\/wp-content\/uploads\/2024\/02\/SF_Club_Logo-2.png","width":117,"height":67,"caption":"Screaming Frog Club"},"image":{"@id":"https:\/\/screamingfrog.club\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/screamingfrog.club\/en\/#\/schema\/person\/cd9ee509ae86128e5e339f9e3de1bc73","name":"raffaele.visintin@gmail.com","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410","url":"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410","contentUrl":"https:\/\/screamingfrog.club\/wp-content\/litespeed\/avatar\/5b2dd661ff57d360794386547fd7a2dd.jpg?ver=1775343410","caption":"raffaele.visintin@gmail.com"},"sameAs":["http:\/\/screamingfrog.club"]}]}},"_links":{"self":[{"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/posts\/3507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/screamingfrog.
club\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/comments?post=3507"}],"version-history":[{"count":0,"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/posts\/3507\/revisions"}],"wp:attachment":[{"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/media?parent=3507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/categories?post=3507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/screamingfrog.club\/en\/wp-json\/wp\/v2\/tags?post=3507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}