Prevent search engines from indexing pages, folders, your entire site, or just your webflow.io subdomain.
You can control which pages search engines crawl on your site by writing a robots.txt file, and prevent pages from being indexed by adding a noindex tag to them. This lets you stop search engines from crawling and indexing specific pages, folders, your entire site, or your webflow.io subdomain. This is useful for hiding pages (like your site’s 404 page) from search results.
Important
Content from your site may still be indexed, even if it hasn’t been crawled. That happens when a search engine finds your content either because it was published previously, or because there’s a link to that content from other content online. To ensure that a previously indexed page is no longer indexed, don’t add it to the robots.txt file. Instead, use the Sitemap indexing toggle to remove that content from Google’s index. You can also use Google’s removals tool.
How to disable indexing of the Webflow subdomain
You can prevent Google and other search engines from indexing your site’s webflow.io subdomain by disabling indexing in your Site settings.
- Go to Site settings > SEO > Indexing section
- Set Staging indexing to off
- Click Save and publish your site
This publishes a unique robots.txt file on the subdomain only, telling search engines to ignore that domain.
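For reference, a robots.txt file that blocks all crawlers from an entire domain looks like the following (the exact file Webflow publishes on the subdomain may differ):
User-agent: *
Disallow: /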
How to enable or disable indexing of site pages
There are two ways to disable indexing of site pages:
- By using the Sitemap indexing toggle in Page settings
- By generating a robots.txt file
Note that if you disable indexing of a site page via a robots.txt file, the page will still be included in your site’s auto-generated sitemap (if you’ve enabled the sitemap). Additionally, if you’ve previously added a noindex tag to a site page via custom code, the page will still be included in your site’s auto-generated sitemap (unless you toggle Sitemap indexing to off).
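For context, a noindex tag added via custom code is typically a single meta tag placed in the page’s head code, for example:
<meta name="robots" content="noindex">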
How to disable indexing of site pages with the Sitemap indexing toggle
If you disable indexing of a static site page with the Sitemap indexing toggle, that page will no longer be indexed by search engines and will no longer be included in your site’s sitemap. You can only disable indexing with the toggle if you’ve enabled your site’s auto-generated sitemap.
Note
The Sitemap indexing toggle adds <meta content="noindex" name="robots"> to your site page. This tells search engines not to index the page or list it in search results.
To prevent search engines from indexing certain site pages:
- Go to the page you want to prevent search engines from indexing
- Go to Page settings > SEO settings
- Toggle Sitemap indexing to off
- Publish your site
How to re-enable indexing of site pages with the Sitemap indexing toggle
To allow search engines to index certain site pages:
- Go to the page you want to allow search engines to index
- Go to Page settings > SEO settings
- Toggle Sitemap indexing to on
- Publish your site
How to generate a robots.txt file
A robots.txt file instructs bots (also known as robots, spiders, or web crawlers) on how they should interact with your site. You can add rules to manage bot access to specific pages, folders, or your entire site. It’s typically used to list pages or folders on your site that you don’t want search engines to crawl.
Just like a sitemap, the robots.txt file lives in the top-level directory of your domain, e.g., yourdomain.com/robots.txt.
To generate a robots.txt file:
- Go to Site settings > SEO > Indexing
- Add the robots.txt rule(s) you want
- Click Save and publish your site
robots.txt rules
Each rule in your robots.txt file generally consists of two main parts:
- User-agent: Identifies which bots the rule applies to (e.g., * for all bots, or Googlebot for Google’s web crawler).
- Disallow / Allow: Defines which of your site’s pages bots can or can’t access. Disallow blocks access, while Allow explicitly permits access. Using / applies the rule to all pages on your site.
Note
Your site’s robots.txt file is publicly visible. Avoid relying on it to protect sensitive or private content, as anyone can view which pages or folders you’re trying to restrict.
Example robots.txt rules
Block all bots from your entire site:
User-agent: *
Disallow: /
Block all bots from a specific page:
User-agent: *
Disallow: /specific-page
Block all bots from a specific folder (e.g., a Collection or set of related pages):
User-agent: *
Disallow: /folder-name/
Allow a specific bot and block all others from your entire site:
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
Allow multiple specific bots and block all others from your entire site:
User-agent: Googlebot
Allow: /
User-agent: UptimeRobot
Allow: /
User-agent: *
Disallow: /
Allow all bots but disallow a specific bot from your entire site:
User-agent: BadBot
Disallow: /
User-agent: *
Allow: /
Block a specific bot from a particular page but allow it to access all other pages:
User-agent: Googlebot
Disallow: /specific-page
Allow: /
Important
Not all bots follow the rules specified in your robots.txt file, especially malicious or poorly configured ones. As a result, these bots may still access your site, including restricted folders and pages.
Include your sitemap:
Sitemap: https://your-site.com/sitemap.xml
Note
Webflow adds a link to your sitemap in your robots.txt file by default.
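Putting these rules together, a complete published robots.txt might look like this (the page path and domain are placeholders):
User-agent: *
Disallow: /specific-page
Sitemap: https://your-site.com/sitemap.xml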
Learn more about robots.txt.
Best practices for privacy
If you’d like to prevent the discovery of a particular page or URL on your site, don’t use the robots.txt file to disallow the URL from being crawled. Instead, use either of the following options:
- Disable indexing of the page with the Sitemap indexing toggle, which adds a noindex tag
- Remove already-indexed content with Google’s removals tool
FAQ and troubleshooting tips
Can I use a robots.txt file to prevent my Webflow site assets from being indexed?
It’s not possible to use a robots.txt file to prevent Webflow site assets from being indexed, because a robots.txt file must live on the same domain as the content it applies to (in this case, where the assets are served). Webflow serves assets from our global CDN rather than from the custom domain where the robots.txt file lives. Learn more about asset and file privacy in Webflow.
I removed the robots.txt file from my Site settings, but it still shows up on my published site. How can I fix this?
Once the robots.txt file has been created, it can’t be completely removed. However, you can replace it with new rules that allow the site to be crawled, e.g.:
User-agent: *
Disallow:
Make sure to save your changes and republish your site. If the issue persists and you still see the old robots.txt rules on your published site, please contact customer support.
How do I remove previously indexed pages from Google search?
You can use Google’s removals tool to remove previously indexed pages from Google search. Note that unless you also take steps to disable search engine indexing of these pages (e.g., using the Sitemap indexing toggle), they may be indexed by Google again.