Prevent search engines from indexing pages, folders, your entire site, or just your webflow.io subdomain.
You can control how search engines crawl and index your site in 2 ways: by writing a robots.txt file or by adding a noindex tag to certain pages. With these, you can prevent search engines from crawling and indexing specific pages, folders, your entire site, or your webflow.io subdomain. This is useful for keeping pages, like your site’s 404 page, out of search results.
Important: Content from your site may still be indexed, even if it hasn’t been crawled. That happens when a search engine knows about your content either because it was published previously, or because there’s a link to that content from other content online. To make sure a previously indexed page is dropped from the index, don’t disallow it in the robots.txt, because a page that can’t be crawled never shows its noindex tag to the search engine. Instead, use the Sitemap indexing toggle to remove that content from Google’s index.
How to disable indexing of the Webflow subdomain
You can prevent Google and other search engines from indexing your site’s webflow.io subdomain by disabling indexing from your Site settings.
- Go to Site settings > SEO tab > Indexing section
- Set Staging indexing to “Off”
- Click Save and publish your site
This will publish a unique robots.txt only on the subdomain, telling search engines to ignore this domain.
How to enable or disable indexing of site pages
There are 2 ways to disable indexing of site pages:
- By using the Sitemap indexing toggle in Page settings
- By generating a robots.txt file
Note that if you disable indexing of a site page via a robots.txt file, the page will still be included in your site’s auto-generated sitemap (if you’ve enabled the sitemap). Similarly, if you’ve previously added a noindex tag to a site page via custom code, the page will still be included in your site’s auto-generated sitemap (unless you toggle Sitemap indexing “off”).
How to disable indexing of site pages with the Sitemap indexing toggle
If you disable indexing of a static site page with the Sitemap indexing toggle, that page will no longer be indexed by search engines and will no longer be included in your site’s sitemap. You can only disable indexing with the toggle if you’ve enabled your site’s auto-generated sitemap.
Note: The Sitemap indexing toggle adds <meta content="noindex" name="robots"> to your site page. This tells search engines not to index the page. Search engines still need to crawl the page to see the tag.
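After publishing, you may want to confirm that the tag is actually present in a page's HTML. The following is a minimal sketch in Python, using only the standard library. The helper name has_noindex is hypothetical, and the sketch assumes you already have the page's HTML as a string (for example, copied from your browser's "View source").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute may hold several comma-separated directives.
            parts = attr_map.get("content", "").split(",")
            self.directives.extend(p.strip().lower() for p in parts if p.strip())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Example: the tag added by the Sitemap indexing toggle
sample = '<html><head><meta content="noindex" name="robots"></head></html>'
print(has_noindex(sample))
```

This only inspects the HTML you pass in. It does not tell you whether a search engine has already processed the tag.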
To prevent search engines from indexing certain site pages:
- Go to the page you want to prevent Google from indexing
- Go to Page settings > SEO settings
- Toggle Sitemap indexing “off”
- Publish your site
How to re-enable indexing of site pages with the Sitemap indexing toggle
To allow search engines to index certain site pages:
- Go to the page you want to allow Google to index
- Go to Page settings > SEO settings
- Toggle Sitemap indexing “on”
- Publish your site
How to generate a robots.txt file
The robots.txt is usually used to list the URLs on a site that you don't want search engines to crawl. You can also include the sitemap of your site in your robots.txt file to tell search engine crawlers which content they should crawl.
Just like a sitemap, the robots.txt file lives in the top-level directory of your domain. Webflow will generate the /robots.txt file for your site once you create it in your Site settings.
To create a robots.txt file:
- Go to Site settings > SEO tab > Indexing section
- Add the robots.txt rule(s) you want
- Click Save changes and publish your site
Important: Content from your site may still be indexed, even if it hasn’t been crawled. That happens when a search engine knows about your content either because it was published previously, or because there’s a link to that content from other content online. To make sure a previously indexed page is dropped from the index, don’t disallow it in the robots.txt. Instead, use the Sitemap indexing toggle to remove that content from Google’s index.
Robots.txt rules
You can use any of these rules to populate the robots.txt file.
- User-agent: * means this section applies to all robots.
- Disallow: tells the robot to not visit the site, page, or folder.
To hide your entire site
User-agent: *
Disallow: /
To hide individual pages
User-agent: *
Disallow: /page-name
To hide an entire folder of pages
User-agent: *
Disallow: /folder-name/
To include a sitemap
Sitemap: https://your-site.com/sitemap.xml
Note: Webflow adds a link to your sitemap to your robots.txt by default.
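Before publishing, it can help to check how a set of rules treats specific URLs. The sketch below uses Python's standard urllib.robotparser module to evaluate the rules above. The helper name is_crawlable and the your-site.com URLs are placeholders for illustration. Python's parser uses simple prefix matching, so treat the result as an approximation of how a given search engine interprets the same rules.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rules taken from the examples above
HIDE_SITE = "User-agent: *\nDisallow: /\n"
HIDE_PAGE = "User-agent: *\nDisallow: /page-name\n"
HIDE_FOLDER = "User-agent: *\nDisallow: /folder-name/\n"

# Pages inside the folder are blocked; pages outside it are not.
print(is_crawlable(HIDE_FOLDER, "https://your-site.com/folder-name/page"))
print(is_crawlable(HIDE_FOLDER, "https://your-site.com/about"))
```

Because matching is by prefix, a rule like Disallow: /page-name also blocks URLs that merely start with /page-name, such as /page-name-2 in this parser.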
Helpful resources
Check out more useful robots.txt rules.
Note: Anyone can access your site’s robots.txt file, so the URLs you list in it are publicly visible and may help people identify and access your private content.
Best practices for privacy
If you’d like to prevent the discovery of a particular page or URL on your site, don’t use the robots.txt to disallow the URL from being crawled. Instead, use either of the following options:
FAQ and troubleshooting tips
Can I use a robots.txt file to prevent my Webflow site assets from being indexed?
It’s not possible to use a robots.txt file to prevent Webflow site assets from being indexed because a robots.txt file must live on the same domain as the content it applies to (in this case, where the assets are served). Webflow serves assets from our global CDN, rather than from the custom domain where the robots.txt file lives.
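To illustrate why the domain matters, here is a small Python sketch that derives the robots.txt location that applies to any given URL. The function name robots_txt_url is hypothetical, and cdn.example.com is a placeholder hostname, not Webflow's actual CDN domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(resource_url: str) -> str:
    """Return the robots.txt URL on the same scheme and host as resource_url."""
    parts = urlsplit(resource_url)
    # A robots.txt file always sits at the root of the host it applies to.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page on your custom domain is governed by your site's robots.txt
print(robots_txt_url("https://your-site.com/folder-name/page"))

# An asset on a different host (placeholder) is governed by that host's file
print(robots_txt_url("https://cdn.example.com/assets/logo.png"))
```

The two calls resolve to different files, which is why rules on your custom domain don't reach assets served from another host.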
I removed the robots.txt file from my Site settings, but it still shows up on my published site. How can I fix this?
Once a robots.txt has been created, it can’t be completely removed. However, you can replace it with new rules that allow the whole site to be crawled, e.g.:
User-agent: *
Disallow:
Make sure to save your changes and republish your site. If the issue persists and you still see the old robots.txt rules on your published site, please contact customer support.