Prevent search engines from indexing pages, folders, your entire site, or just your webflow.io subdomain.
You can control which pages search engines crawl and index on your site by writing a robots.txt file or by adding a noindex tag to certain pages. With these tools, you can keep search engines away from specific pages, folders, your entire site, or your webflow.io subdomain. This is useful for keeping pages, such as your site’s 404 page, out of search results.
Important
Content from your site may still be indexed, even if it hasn’t been crawled. That happens when a search engine finds your content either because it was published previously, or because other content online links to it. To make sure a previously indexed page is removed from the index, don’t add it to the robots.txt file: a crawler that is blocked from a page never sees that page’s noindex tag. Instead, use the Sitemap indexing toggle to remove that content from Google’s index. You can also use Google’s removals tool.
How to disable indexing of the Webflow subdomain
You can prevent Google and other search engines from indexing your site’s webflow.io subdomain by disabling indexing from your Site settings.
- Go to Site settings > SEO > Indexing section
- Set Staging indexing to off
- Click Save and publish your site
This publishes a unique robots.txt file on the subdomain only, telling search engines to ignore that domain.
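To confirm that a robots.txt file really tells every crawler to stay away, you can parse it with Python’s standard urllib.robotparser module. This is a minimal sketch: the exact contents of the subdomain’s robots.txt are an assumption here (a blanket Disallow: / rule), and blocks_everything and the sample paths are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt, sample_paths=("/", "/about", "/blog/post")):
    """Return True if no sample path may be fetched by a generic crawler."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch("*", path) for path in sample_paths)


# Assumed contents of the staging robots.txt (not confirmed by the article).
ASSUMED_STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

print(blocks_everything(ASSUMED_STAGING_ROBOTS))
```

You could paste the text served at your own subdomain’s /robots.txt into the function to check it the same way.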
How to enable or disable indexing of site pages
There are two ways to disable indexing of site pages:
- By using the Sitemap indexing toggle in Page settings
- By generating a robots.txt file
Note that if you disable indexing of a site page via a robots.txt file, the page will still be included in your site’s auto-generated sitemap (if you’ve enabled the sitemap). Additionally, if you’ve previously added a noindex tag to a site page via custom code, the page will still be included in your site’s auto-generated sitemap (unless you toggle Sitemap indexing to off).
How to disable indexing of site pages with the Sitemap indexing toggle
If you disable indexing of a static site page with the Sitemap indexing toggle, that page will no longer be indexed by search engines and will no longer be included in your site’s sitemap. You can only disable indexing with the toggle if you’ve enabled your site’s auto-generated sitemap.
Note
The Sitemap indexing toggle adds <meta content="noindex" name="robots"> to your site page. This tells search engines not to index the page.
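If you want to verify that a published page carries this tag, a short script can look for it in the page’s HTML. The sketch below uses only Python’s standard html.parser module; RobotsMetaParser and has_noindex are hypothetical helper names, and the HTML strings are stand-ins for your real page source.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        content = attributes.get("content") or ""
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta content="noindex" name="robots"></head>'))
```

The check looks only at the robots meta tag, so the word noindex appearing in other tags or in page text does not count.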
To prevent search engines from indexing certain site pages:
- Go to the page you want to prevent Google from indexing
- Go to Page settings > SEO settings
- Toggle Sitemap indexing to off
- Publish your site
How to re-enable indexing of site pages with the Sitemap indexing toggle
To allow search engines to index certain site pages:
- Go to the page you want to allow search engines to index
- Go to Page settings > SEO settings
- Toggle Sitemap indexing to on
- Publish your site
How to generate a robots.txt file
The robots.txt file is usually used to list the URLs on a site that you want search engines to ignore. You can also include your site’s sitemap in the robots.txt file to tell search engine crawlers which content they should crawl.
Just like a sitemap, the robots.txt file lives in the top-level directory of your domain.
To create a robots.txt file:
- Go to Site settings > SEO > Indexing section
- Add the robots.txt rule(s) you want
- Click Save changes and publish your site
robots.txt rules
You can use any of these rules to populate the robots.txt file.
- User-agent: * means this section applies to all robots.
- Disallow: tells the robot not to visit the site, page, or folder.
Hide your entire site
User-agent: *
Disallow: /
Hide individual pages
User-agent: *
Disallow: /page-name
Hide an entire folder of pages
User-agent: *
Disallow: /folder-name/
Include a sitemap
Sitemap: https://your-site.com/sitemap.xml
Note
Webflow adds a link to your sitemap in your robots.txt by default.
Check out more useful robots.txt rules.
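Before publishing, it can help to test which URLs a set of rules actually blocks. The sketch below combines the example rules above into one file and checks a few paths with Python’s standard urllib.robotparser module. The sample paths are made up, and the standard-library parser matches rules by simple path prefix, so treat the result as an approximation of how a given search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, combined into a single file.
RULES = """\
User-agent: *
Disallow: /page-name
Disallow: /folder-name/

Sitemap: https://your-site.com/sitemap.xml
"""


def check_paths(robots_txt, paths, user_agent="*"):
    """Map each path to True (crawlable) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {path: parser.can_fetch(user_agent, path) for path in paths}
    # site_maps() (Python 3.8+) lists Sitemap: entries, or None if absent.
    return allowed, parser.site_maps()


allowed, sitemaps = check_paths(
    RULES,
    ["/page-name", "/page-name-2", "/folder-name/child", "/folder-name", "/about"],
)
for path, ok in allowed.items():
    print(path, "crawlable" if ok else "blocked")
print(sitemaps)
```

Because matching is by prefix, Disallow: /page-name also blocks /page-name-2, while Disallow: /folder-name/ leaves /folder-name itself (without the trailing slash) crawlable.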
Note
Anyone can read your site’s robots.txt file, so the URLs you list there may help others identify and access your private content.
Best practices for privacy
If you’d like to prevent the discovery of a particular page or URL on your site, don’t use the robots.txt to disallow the URL from being crawled. Instead, use either of the following options:
FAQ and troubleshooting tips
Can I use a robots.txt file to prevent my Webflow site assets from being indexed?
It’s not possible to use a robots.txt file to prevent Webflow site assets from being indexed because a robots.txt file must live on the same domain as the content it applies to (in this case, where the assets are served). Webflow serves assets from our global CDN, rather than from the custom domain where the robots.txt file lives. Learn more about asset and file privacy in Webflow.
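The same-domain rule can be expressed as a small check: a robots.txt file only governs URLs that share its scheme and host. In this sketch, robots_txt_applies is a hypothetical helper, and cdn.example.com is a placeholder rather than the actual host Webflow serves assets from.

```python
from urllib.parse import urlsplit


def robots_txt_applies(robots_url, content_url):
    """Return True if content_url is on the same scheme and host as robots_url."""
    robots, content = urlsplit(robots_url), urlsplit(content_url)
    # Host names are case-insensitive, so compare them in lowercase.
    return (robots.scheme, robots.netloc.lower()) == (
        content.scheme,
        content.netloc.lower(),
    )


ROBOTS_URL = "https://your-site.com/robots.txt"

print(robots_txt_applies(ROBOTS_URL, "https://your-site.com/folder-name/page"))
print(robots_txt_applies(ROBOTS_URL, "https://cdn.example.com/image.png"))
```

The first check passes because the page lives on the same domain as the robots.txt file; the second fails because the asset is served from a different host.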
I removed the robots.txt file from my Site settings, but it still shows up on my published site. How can I fix this?
Once the robots.txt file has been created, it can’t be completely removed. However, you can replace it with new rules that allow the site to be crawled, e.g.:
User-agent: *
Disallow:
Make sure to save your changes and republish your site. If the issue persists and you still see the old robots.txt rules on your published site, please contact customer support.
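One way to tell whether old rules are still being served is to compare the published file against the rules you removed. This is a rough sketch using only the Python standard library: normalize_rules, leftover_rules, and fetch_robots_txt are hypothetical helpers, the comparison is a plain text match after trimming whitespace and comments, and fetch_robots_txt makes a real network request when called.

```python
from urllib.request import urlopen


def normalize_rules(robots_txt):
    """Return the non-empty rule lines with comments and extra spaces removed."""
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rules.append(" ".join(line.split()))
    return rules


def leftover_rules(published, removed):
    """Return the removed rules that still appear in the published file."""
    live = set(normalize_rules(published))
    return [rule for rule in normalize_rules(removed) if rule in live]


def fetch_robots_txt(site_url):
    """Download /robots.txt from a site (performs a network request)."""
    with urlopen(site_url.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Example with inline text instead of a live request.
published = "User-agent: *\nDisallow: /page-name\n\nSitemap: https://your-site.com/sitemap.xml\n"
old_rules = "Disallow: /page-name\nDisallow: /folder-name/\n"
print(leftover_rules(published, old_rules))
```

To check a live site, you could pass fetch_robots_txt("https://your-site.com") as the first argument. An empty result means none of the removed Disallow lines are still published. Pass only the lines you removed, since a line such as User-agent: * appears in both the old and the new file.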
How do I remove previously indexed pages from Google search?
You can use Google’s removals tool to remove previously indexed pages from Google search. Note that unless you also take steps to disable search engine indexing of these pages (e.g., using the Sitemap indexing toggle), they may be indexed by Google again.