© Copyright Acquisition International 2025 - All Rights Reserved.

Posted 15th April 2024

Should You Block AI Bots from Crawling Your Website?

Did you know that AI tools like ChatGPT could be crawling your site for data? The rise of AI large language models (LLMs) such as ChatGPT and Bard (now called Gemini) has raised a question for businesses: should you block or allow AI bots, such as OpenAI’s GPTBot, to crawl your site? Because AI is developing so rapidly, it’s a question many businesses might not have expected to face in 2024, but it should be high on the agenda. With sites like Amazon choosing to block ChatGPT, it’s clear that we should all be considering whether blocking is the right move.

According to SEO and PPC agency MRS Digital, businesses should think carefully about whether to block AI crawlers. On the one hand, blocking AI could help prevent risks, such as your content being unintentionally misrepresented. On the other, could you be missing out on a world of opportunity presented by this seemingly unstoppable technology shift?

Quick recap: What are crawlers?  

A crawler is a tool, typically operated by search engines like Google or Bing, that reviews your website and indexes data from it, such as the content you’ve written and information about your company, so that your website can appear in search results. Crawlers are how search engines like Google discover and understand your site, so the concept is nothing new. As a website owner, you can decide which parts of your website crawlers may view and index by using a robots.txt file.
 
AI crawlers use the same technology, but instead of simply indexing your website data, they review the information on your site and can use it to train their own technology (large language models).

What AI chatbots could crawl your site?  

While most people will have heard of ChatGPT and Bard (now called Gemini), there are other lesser-known AI crawlers out there.  

Here are the AI crawlers worth knowing about:

  • ChatGPT-User. ChatGPT uses this when a user on GPT-4 directs the bot to your site in a prompt such as “tell me how many times [SITE URL] mentions AI”.
  • GPTBot. This crawler simply collects data from your site to use as training data for OpenAI’s AI models.
  • Google-Extended. This robots.txt token controls whether Google can use your content for its AI products, including Gemini (previously called Bard), its AI chatbot.
  • Anthropic-AI. Anthropic has a range of AI tools, including its AI chatbot Claude, and this crawler collects the data for them.
  • CCBot. This is the Common Crawl bot, and Common Crawl’s data was part of what GPT-3 was trained on. Common Crawl is designed to make web data accessible to everyone, free of charge.
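robots.txt rules target the user-agent tokens these crawlers answer to (written with hyphens and no spaces, for example Google-Extended and CCBot). As a minimal sketch, assuming the tokens below, which are the names the vendors have commonly documented but which are worth re-checking because the list changes, this Python snippet generates a robots.txt section that asks each crawler to skip the whole site. The helper name `build_robots_txt` is hypothetical.

```python
# Hypothetical helper for illustration. Keys are the robots.txt user-agent
# tokens for the crawlers listed above; values are just notes for humans.
# Re-check each vendor's documentation, as names and crawlers change.
AI_CRAWLERS = {
    "ChatGPT-User": "OpenAI: fetches pages when a ChatGPT user asks it to",
    "GPTBot": "OpenAI: collects training data",
    "Google-Extended": "Google: controls use of your content for Gemini",
    "anthropic-ai": "Anthropic: collects data for its AI tools",
    "CCBot": "Common Crawl: builds the public Common Crawl dataset",
}


def build_robots_txt(agents, path="/"):
    """Return robots.txt text with one group per agent disallowing `path`.

    Disallowing "/" asks a compliant crawler to skip the whole site.
    """
    groups = [f"User-agent: {agent}\nDisallow: {path}" for agent in agents]
    # robots.txt groups are conventionally separated by a blank line.
    return "\n\n".join(groups) + "\n"
```

Calling `print(build_robots_txt(AI_CRAWLERS))` prints the text to add to the robots.txt file at the root of your site; passing a narrower `path` such as `"/admin/"` blocks only that area.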

Why would you block AI crawlers? 

Blocking AI crawlers might be the right decision for you, especially if you’re concerned about your content being misrepresented, or your site is in development.  

Misrepresented content 

When humans write content, we write with nuance, and there may be cultural or business context that makes what you write meaningful to a specific audience. When your content is taken out of that context and used as part of an AI chatbot’s answer, it will most likely lose that nuance, and the point your content made may be lost or misrepresented entirely. Some companies don’t want to run that risk, so they block AI crawlers to prevent it. For example, if you were a medical company with specific advice about one of your products, you wouldn’t want an AI to take that advice out of context and apply it to an unrelated product or medical query.

Unwanted association 

Because AI crawlers tend to take sections of information from various websites without always understanding their context, there is a risk that your information may be presented alongside sources your business doesn’t want to be associated with. If this concerns you, you may choose to block AI crawlers. This can help stop your company from being mixed in with competitors, or with others in your industry who may not uphold best practice. For companies where reputation management is an issue, or has been historically, this could be a very strong argument.

Data Scraping 

It’s best practice to block crawlers from viewing parts of your site you don’t want them to see. For example, you might have a staff wellbeing portal on your intranet or customer logins on your website. You don’t want these crawled, as they contain personally identifiable information, something your customers or employees definitely don’t want an AI company to have! OpenAI says that GPTBot is “filtered to remove sources that require paywall access, are known to primarily aggregate personally identifiable information (PII), or have text that violates our policies.” Most websites will already block these areas from search engine crawlers, so it’s worth asking your hosting provider or SEO team whether they can extend those rules to AI crawlers.

Spam Generation 

As technology evolves, so do cybercriminals. We’re now seeing increasingly sophisticated phishing emails and malicious links, thanks to AI-generated content. By combining AI-powered chatbots like ChatGPT with data harvested from your site, malicious actors could create spam emails that more closely imitate your employees or the company itself. Ultimately, this could lead to more successful phishing attempts, causing financial and reputational loss for your business.

How to block the ChatGPT crawler: 

1. robots.txt 

Most sites already have a robots.txt file, so it’s usually a matter of updating it to exclude the pages you want to keep AI crawlers away from. Doing this helps protect sensitive data and content that should not be public knowledge, although robots.txt is a request that well-behaved crawlers choose to honour rather than a lock, so genuinely private areas should also sit behind a login. A badly configured robots.txt file can stop Google and other search engines from seeing your site at all, so proceed with caution here. If you have an SEO agency, check in with them before you make changes, as they may be able to help.

You can also ask crawlers to limit how fast they crawl. And if you want them to crawl some, but not all, of your site, for example keeping them out of admin areas, that is possible too. Different businesses will have varying reasons to block crawlers, or not to block them at all.
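To see how a partial block and a crawl-speed request look in practice, here is a minimal sketch that parses an example robots.txt with Python’s standard-library `urllib.robotparser` and checks what each crawler may fetch. The `/admin/` path, the 10-second delay and example.com are assumptions for illustration. Python’s parser applies simple first-match, prefix-based rules, which real crawlers may not interpret identically, and some crawlers, Googlebot among them, ignore `Crawl-delay` altogether.

```python
from urllib.robotparser import RobotFileParser

# Example policy (assumed for illustration):
#   GPTBot may crawl, but not /admin/, and is asked to wait 10 s between requests;
#   CCBot (Common Crawl) is blocked entirely; all other crawlers are unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /admin/
Crawl-delay: 10

User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("GPTBot", "https://example.com/blog/post"),
    ("GPTBot", "https://example.com/admin/settings"),
    ("CCBot", "https://example.com/blog/post"),
    ("Googlebot", "https://example.com/admin/settings"),
]:
    # can_fetch() uses the first group whose User-agent matches,
    # falling back to the "*" group when none does.
    print(agent, url, "allowed" if parser.can_fetch(agent, url) else "blocked")

print("GPTBot crawl delay:", parser.crawl_delay("GPTBot"))
```

Testing a draft file like this before uploading it is a cheap way to catch the kind of mistake that could accidentally hide your site from search engines.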

2. Web Application Firewall (WAF) 

You can also use a WAF to block AI crawlers, along with any other unwanted traffic to your site, while keeping the site up and running for your customers without hindering their experience.
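Commercial WAFs are configured through their own dashboards or rule languages, so the following is not any product’s API. It is a minimal sketch of the kind of user-agent rule a WAF applies, written as Python WSGI middleware that answers listed crawlers with 403 Forbidden and passes everyone else through. The class name `BlockAICrawlers` and the blocklist are hypothetical. Google-Extended is left out because it is a robots.txt token rather than a separate user agent, and because the User-Agent header is easy to fake, real WAF rules often also check the IP ranges that crawler operators publish.

```python
# Hypothetical blocklist for illustration; tokens are matched
# case-insensitively anywhere in the User-Agent header.
BLOCKED_AGENTS = ("GPTBot", "ChatGPT-User", "anthropic-ai", "CCBot")


class BlockAICrawlers:
    """WSGI middleware that answers listed crawlers with 403 Forbidden."""

    def __init__(self, app, blocked=BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(agent.lower() for agent in blocked)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(agent in user_agent for agent in self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Ordinary visitors (and unlisted bots) reach the real site untouched.
        return self.app(environ, start_response)


def site(environ, start_response):
    """Stand-in for your real web application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockAICrawlers(site)
```

To try it locally, you could serve `app` with the standard library’s `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and send requests with different User-Agent headers.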

So, is it really worth blocking AI crawlers?  

When considering whether to block ChatGPT and similar crawlers, there’s more to ponder over than the downsides alone. 

In November 2023, ChatGPT hit 100 million users per week. With this figure likely to grow, that’s a great deal of brand visibility you’re missing out on if you refuse to embrace this technology.  

LLMs are the future of search; did nobody tell you yet? Bing has already embraced AI in the form of Microsoft’s Copilot, and Google is hot on its heels, having recently moved testing of its own AI-powered search, Google Search Generative Experience (SGE), into the main Google search results. This means that if you’ve ever relied on organic search or SEO for a portion of your new business, blocking AI could seriously hamper your efforts, if not now, then in the near future.

There’s even a branch of SEO forming known as Generative Engine Optimisation (GEO) that focuses on improving visibility on popular LLMs like ChatGPT. Again, this may be an emerging acquisition channel that you’re missing out on if you block AI crawlers. 

You should also consider how effective trying to block LLMs from your site really is. First, you must look beyond the big-name AIs: blocking ChatGPT alone won’t cut it. Large language models like this are trained on a range of datasets, such as Wikipedia and Reddit.

One of the datasets most commonly used by LLMs (including ChatGPT) is Common Crawl, which is created by a non-profit organisation that crawls a vast portion of the public web. So, if you’re genuinely determined to exclude your site from LLMs, you need to block bots like Common Crawl’s CCBot as well as the better-known AI crawlers.
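A quick way to act on this is to audit an existing robots.txt for AI crawlers it still lets through. The sketch below, again using Python’s `urllib.robotparser`, reports which of the AI user agents named in this article may still fetch a page. The function name `unblocked_ai_agents` and the example file are hypothetical, and the agent list is an assumption to keep up to date.

```python
from urllib.robotparser import RobotFileParser

# robots.txt user-agent tokens for the AI crawlers discussed in this article.
# Assumed list: check vendors' current documentation, as it changes.
AI_AGENTS = ["GPTBot", "ChatGPT-User", "Google-Extended", "anthropic-ai", "CCBot"]


def unblocked_ai_agents(robots_lines, url="https://example.com/"):
    """Return the AI agents that the given robots.txt still allows to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [agent for agent in AI_AGENTS if parser.can_fetch(agent, url)]


# A hypothetical file that blocks OpenAI's training crawler but nothing else.
EXAMPLE = ["User-agent: GPTBot", "Disallow: /"]

print(unblocked_ai_agents(EXAMPLE))
```

For this example, CCBot is among the agents reported, which is exactly the gap described above. To check a live site instead, the parser can load the file itself via its `set_url()` and `read()` methods rather than `parse()`.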

Granting access to your website content can help ensure that your brand is portrayed accurately and favourably to ChatGPT users. Blocking it may actually have the opposite effect if you’re trying to avoid being misrepresented online.

All said and done, let’s say you bend over backwards to block every known crawler belonging to or contributing to AI LLMs. You’re safe, right? Wrong. Your website has almost certainly already been crawled and incorporated into existing datasets like Common Crawl’s, and at present there’s no way of removing your website content from them. It all feels rather futile.

A final word

The rapidly evolving world of AI is intimidating, and whether or not you decide to try to block LLMs from your site, we’d recommend genning up on the subject. Whatever decision you make should be active and informed by up-to-date knowledge.

A not insignificant 32.9% of the top 1,000 websites on the internet have elected to block GPTBot. However, for many, the growing opportunity presented by AI, combined with the futility of trying to resist the tide, has led to the decision that blocking AI is not the right move. At least for now.

Categories: News, Strategy

