November 2, 2023

How To Fix Internal Site Search Spam: A Case Study and Comprehensive Guide

How we caused our client's site to lose hundreds of thousands of impressions and hundreds of clicks, and why they loved us for it!

Skip to:

How to Detect Internal Site Search Spam

How To Fix Internal Site Search Spam

Did We Really Lose Hundreds of Thousands of Impressions?

If you came here via the clickbaity subtitle, or you’re wondering where the catch is, I assure you there isn’t one. This is a real case where technical SEO caused a client to lose tons of impressions and clicks, and they absolutely loved us for it. Of course, the main title lets you know we’re dealing with site search spam, a nefarious tactic on the rise that exploits vulnerabilities in highly popular CMS frameworks. So let's dig in first by defining what internal site search spam actually is.

What is Internal Site Search Spam?

Internal site search spam is a tactic by which a malicious actor injects junk pages onto your website, typically blank search results pages, by abusing your internal site search. Sounds basic, right? Well, the reasoning typically is. It's a cheap way for these spammers to reach a larger audience by getting their terms to show up in search results under other websites’ domains. The additional hope is often to improve their own sites’ relevance in search engines, using the URLs and related terms, by doing this thousands of times at the expense of the target brand.

For the afflicted, your site can often be flooded with new pages that Google may be crawling and assessing as part of your website. Not only does this look awful, but enough of it can make your website appear low-quality and spammy. While that is rare, the consequences can be a loss in search rankings or even a site-level low-quality designation like we see impacting sites under Google’s Helpful Content System.

There are a couple of CMSs known to have still-unpatched exploits these spammers can take advantage of, and it's likely that as these get patched, more will appear. Both WordPress and Shopify are huge platforms with openings spammers can use for internal site search spam. This article and example focus primarily on Shopify’s internal site search spam issue, but the solution will ultimately work for either platform and potentially for future issues too.
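
To make this concrete, the junk URLs these attacks generate typically look something like the illustrative patterns below (the exact paths and parameters vary by platform and attack, and the gambling-style terms are just placeholders of the kind discussed later in this article):

https://yourstore.com/collections/vendors?q=prada188     (Shopify vendor collection search)
https://yourstore.com/search?q=luxury777                 (Shopify storefront search)
https://yourwordpresssite.com/?s=airbet88                (WordPress internal search)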

How To Detect Internal Site Search Spam

Finding out you’ve been targeted by an internal site search spam attack typically happens as a result of either digging into your Pages report in GSC or, more rarely, investigating a sudden spike in impressions or sometimes even clicks. Rarest of all, Google Search Console may even provide an alert, though to date I’ve only seen this warning appear once.

A Google Search Console Warning for an internal site search spam attack

We’ll cover three specific ways you can investigate:

  1. Using the Pages Report in GSC
  2. Using the Performance Report in GSC to detect spikes in impressions and clicks
  3. Using advanced search operators to detect indexed spam pages

Using the Pages Report To Detect an Issue

The way I’ve most often spotted issues with internal site search spam is during a routine check for crawlability and indexability. For those not familiar, auditing crawling and indexation is a key technical SEO task for maintaining site health; it looks at how well Google’s bots can discover, crawl, and then be encouraged to index your pages.

One key part of that evaluation is the Pages Report (formerly the Page Indexing Report) in Google Search Console (GSC), which is a snapshot of how Google crawls and classifies your pages. Here you can find information on broken pages and broken internal links as well as lots of other technical SEO issues. A key advantage of this method is that, with regular monitoring, you can potentially pick up the issue BEFORE junk pages are getting indexed by Google.

Google’s pages report is a snapshot of your site’s crawling and indexation from the source.

What you’re looking for when detecting internal site search spam, or any kind of potential crawling or indexing issue, are surges in either crawled-but-not-indexed pages or indexed pages that don’t align with what’s real for your site. The example below is an extreme one where pages were being created at a phenomenal rate and, luckily, weren’t getting indexed.

This example from the pages report shows millions of non-indexed pages created maliciously over a very short period and a clear crawling issue.

The Pages report is unfortunately limited in how far back you can view data (only the last 3 months), so what we can’t see here is the sudden spike in non-indexed pages. However, for a site with only about 7,000 indexed pages, over 2 million not indexed was a clear issue. Normally the spikes won’t be this large, but look for disparities between what should be crawled and what is actually being crawled or indexed as a leading indicator that you have a problem.

Curious about Advanced Technical SEO? Check out our advanced guide for more and see how you can control your site’s indexation! 

Using the Performance Report To Detect an Issue

The performance report in GSC is where most SEOs spend their time. It’s the most complete (yet still very incomplete) data from the source of how your site performs on Google SERPs. Here you can view data over 16 months, make comparisons, and filter down to the page and query level to see how your site is performing organically.

The signals to be aware of are the same: a spike in one of the key metrics, usually impressions but sometimes clicks too, indicates you need to take a look. The image below shows how this case looked in the early days of detection: a massive spike in impressions that was at first a cause for joy!

This spike in impressions went from 13k to 100k overnight, which was our first alarm.

Big spikes in impressions or clicks can often be signs of algorithmic changes or winning keyword strategies, and one like this is typically a reason for an SEO to get excited.

Digging deeper into the pages and queries that were seeing the greatest increases during this period, however, initially brought only confusion.

A look at the impressions and clicks over this period showed only two pages in the top 10 that were actually supposed to be on the site and indexed.

Looking at the length and structure of these URLs led the GR0 team to immediately understand something fishy was going on. A closer look at the individual queries also revealed that the spammy tactics were working and even earning clicks on some of these junk terms.

How could we tell they were junk, spammy queries? The terms getting the most impressions during the spam attack were phrases like:

  • prada188
  • luxury777
  • airbet88
  • liga365
  • ligaciputra

These appeared along with dozens of other terms, all based around non-US gambling and spam sites. If you were to do a search for any of these, you’d likely still see sites affected by these tactics that haven’t figured out they’re being used.

Sites are still showing up for these spam terms, likely without their knowledge, due to various internal site search spam methods.

Note: The reason this tactic is suggested second is simply because you’ll only become aware that there’s a problem once pages are getting indexed. The Pages Report and monitoring crawlability and indexability can give you an early warning.

Using Advanced Search Operators To Detect Internal Site Search Spam

This final method is one you can use proactively to see whether any website you work with or on is potentially having an internal site search spam issue. There are a couple of different CMS platforms that have these issues, and this article focuses on Shopify’s internal site search spam exploit.

Generally, though, this method should work for any platform, so long as you know the types of URL patterns the spammers generate.

Firstly, if you’re not acquainted with advanced search operators (Josh Hardwick does a great job here), I suggest you take a look, as they can be massively helpful to any Google user. 

The specific advanced search operator we’ll be using here is the ‘site:’ search operator, which is probably the most common and popular of these. By searching ‘site:https://yourwebsite.com/[malicious-site-search-URL-pattern]’ you can see whether Google is actively indexing any pages that might be from a spam attack.
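
For example, a couple of illustrative queries, assuming your store lives at yourstore.com and the spam pages use the /collections/vendors path covered later in this article, would be:

site:yourstore.com/collections/vendors
site:yourstore.com "prada188"

The first restricts results to anything indexed under that path; the second checks whether one of the spam terms is showing up anywhere on your domain.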

Interestingly enough, it also appears these spammers were trying to advertise the tool they use to create the spammy backlinks that get these pages indexed, which, to me, is a malicious double whammy.

This site: search showed our website had a pair of site search spam pages already getting indexed by Google.

This search operator method is great because it’s super quick and easy, but it only works if Google is already indexing these pages, and it doesn’t paint as clear a picture of the scale of the issue as the other two detection methods.

How To Fix Internal Site Search Spam

The fix here is mostly simple and uses one of the strongest ways we can impact indexation, the <meta> noindex tag. There are arguments for other common index control methods that we'll cover a bit later.

One of the key unfortunate parts of this spam attack is that you can’t fully prevent these pages from being created. Until these platforms remove these specific vulnerabilities, it can and will continue to happen. Luckily, these created pages won’t have internal linking you’ll have to worry about fixing in addition to getting them deindexed.

As SEOs (and people, sometimes), it is important to focus on what we can control, and in this case, making sure each page created with these exploits receives a noindex and nofollow tag sends the clearest message to Google to keep these pages out of the index. Our client was on Shopify, which allows you to use a central theme file, theme.liquid, to inject code across the site.

Fixing Shopify Internal Site Search Spam Attacks

Disclaimer:

Like many SEOs, I am not a developer, nor a Shopify dev specifically. This fix was found via the Shopify forums; we vetted it internally and presented it to the client, who implemented this code snippet to fix their issue and systematically add noindex directives to any pages created in that directory.

In the <head> section of theme.liquid, you can add a code snippet like this:

{%- comment -%} Noindex any page served from the spammed internal search path {%- endcomment -%}
{%- if request.path == '/collections/vendors' -%}
  <meta name="robots" content="noindex">
{%- endif -%}

What this does is ensure that each page the spammers create via this exploit under the /collections/vendors path gets assigned a meta robots noindex tag. This solves the biggest issue: these extremely unhelpful pages getting added to Google’s index and becoming part of your site’s online presence.
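
As mentioned above, the clearest signal pairs noindex with nofollow. If you want the snippet to send both, a small variation (my own sketch, not the exact code from the Shopify forum thread) would be:

{%- if request.path == '/collections/vendors' -%}
  <meta name="robots" content="noindex, nofollow">
{%- endif -%}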

Can I Use Robots.txt To Block Internal Site Search Spam?

The next question is, “Should I also add a robots.txt block for this path?” which is actually a tricky question. One of the most commonly misunderstood parts of technical SEO and the robots.txt file is its role in indexation. Some will mistakenly think that no crawling = no indexing, which can be true but isn’t universal. 

The robots.txt file will prevent most crawlers from crawling pages on your site when properly directed and is generally respected. However, this only applies to your site rather than the likely dozens of spammer sites that are backlinking to these pages and getting them indexed in the first place. 

Learn more about Google and Robots.txt in our guide How to Fix Indexed Though Blocked by Robots.txt

Robots.txt only impacts the crawling of your site and is not a directive to prevent indexation, which is the critical issue we want to combat.

Additionally, I should point out that whenever setting a directive like noindex or nofollow, you do have to allow Google to crawl the page in order to read and pick up that shiny new directive you’ve placed there. So I always recommend waiting until a page is actually removed from the index via the noindex directive before you go about adding any robots.txt directives (and of course, don’t mix up your meta robots directives with your robots.txt directives).

While Shopify has in recent years allowed users to edit their robots.txt files (it was previously impossible), this is an area where the juice likely isn’t worth the squeeze. With spammer backlinks acting as Google’s primary path to these pages, and with Google needing to crawl them to pick up your directives, changing the robots.txt won’t truly solve or help fix this issue.
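
For reference only: if, once the junk pages have fully dropped out of the index, you still wanted to block crawling of this path, the resulting rule (shown here as a hypothetical sketch; in Shopify this is managed through the robots.txt.liquid template) would look something like this:

# Hypothetical rule - only add this AFTER the spam pages are deindexed,
# since blocking crawling also stops Google from reading the noindex tag
User-agent: *
Disallow: /collections/vendors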

The Results

This is the part that, for once, turned reporting on drastically lower clicks and impressions into a real win for the GR0 team and our client. After implementing this change, our client gradually saw the spammy pages leave the index and stop receiving clicks and impressions.

Normally, a GSC report like this would make an SEO want to cry, but we were proud to report we had solved this issue. 

The client was, of course, thrilled to learn that the GR0 team was able to quickly and effectively put a stop to this unsightly and potentially brand-damaging act by these spammers — which is how I landed on our very clickbaity subtitle.  

Showing technical SEO wins is usually one of the more difficult things you can do. Technical SEO is more often about prevention, which this case clearly demonstrates the value of, but prevention doesn’t always come with dramatic, neat charts or tables showing how your efforts made a difference like this:

You can see here where a secondary round of URLs got back into the index after the codebase was updated and inadvertently removed the “noindex” directive for that subdirectory, but luckily the directive is now back in place.

Conclusion

I hope you enjoyed this case study, one of our most fun to date, with a now very appreciative client, and perhaps even learned something you can use. I hope it also underscores the importance of regular site maintenance and having technical SEO expertise behind your website. If you want to add GR0’s general and technical SEO expertise to your digital marketing plan, drop us a message and let us be your next digital marketing agency!
