    • 37437
    • 147 Posts
    Evolution versions of MODX come with a handy, sample robots.txt file that is written to protect just the right directories from search engine crawlers. Does anyone have a pre-configured robots.txt file that is appropriate for the latest release of Revolution? (Just curious -- why isn’t a sample robots.txt file included as a standard Revolution component?)
      • 10457
      • 5 Posts
      I would love to see this too, if someone can post one.
        • 33968
        • 863 Posts
        I think that’s because with Revolution it’s up to you where you place your assets, etc., so a standard robots.txt file wouldn’t really be appropriate.

        But you could try something like this and customise it as you need to:
        # Robots.txt
        User-agent: *
        Disallow: /assets/
        Disallow: /connectors/
        Disallow: /core/
        Disallow: /manager/
        Disallow: /setup/
        
          • 37437
          • 147 Posts
          If you disallow all of those directories, what else is there to crawl? I guess I am somewhat confused as to exactly how robots.txt works with a database-driven CMS such as MODx. Basically, I want the actual pages of my site to be crawled, but nothing else. Is there a standard way to achieve this?
            • 33968
            • 863 Posts
            If you disallow all of those directories, what else is there to crawl?
            Everything not in those directories, i.e. your site content :)

            The original robots.txt standard does not provide a way to say 'only crawl this folder'; you have to disallow access to individual folders. So make sure you block access to the folders you don't want robots to crawl, and they will find their way around to the rest.

            Note also that it only serves as an instruction to robots; some (such as spambots) may ignore it completely.
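
To see which URLs the rules posted above leave open, one option is to run them through Python's standard `urllib.robotparser`. This is only a rough sketch: the test paths (`/about.html`, `/index.php?id=5`, and so on) are made-up examples, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules posted earlier in this thread, unchanged.
RULES = """\
# Robots.txt
User-agent: *
Disallow: /assets/
Disallow: /connectors/
Disallow: /core/
Disallow: /manager/
Disallow: /setup/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Return True if the given agent may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(RULES)
    # Hypothetical paths, chosen only for illustration.
    for path in (
        "/",
        "/about.html",
        "/index.php?id=5",
        "/manager/",
        "/core/config/config.inc.php",
        "/assets/images/logo.png",
    ):
        verdict = "allowed" if is_crawlable(parser, path) else "blocked"
        print(f"{path}: {verdict}")
```

The page URLs stay allowed because none of them start with a disallowed prefix; only requests under the listed directories are blocked.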
            • Your site's pages are all generated, so there's nothing in the file system for the crawlers to see except the main index.php file, which serves as the trigger for generating the page being requested. Don't forget that your page URLs are really index.php?id=xxx, where xxx is the ID of the resource used to build the requested page. The friendly URLs like about.html are all done by trickery involving server rewriting and MODx internally juggling IDs and aliases.
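
As a toy illustration of the idea described here (not MODX's actual code), a friendly URL can be thought of as a lookup from an alias to a resource ID, which is then served through index.php. The alias table, the IDs, and the `resolve` function below are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical alias table: friendly alias -> resource ID.
ALIASES = {"about.html": 2, "contact.html": 3}

# Assumed ID of the home page resource, used for the bare "/" request.
SITE_START_ID = 1


def resolve(request_path: str) -> str:
    """Map a friendly request path to the internal index.php?id=N form."""
    alias = request_path.lstrip("/")
    if not alias:
        resource_id = SITE_START_ID
    elif alias in ALIASES:
        resource_id = ALIASES[alias]
    else:
        raise LookupError(f"no resource with alias {alias!r}")
    return "index.php?" + urlencode({"id": resource_id})


if __name__ == "__main__":
    print(resolve("/about.html"))  # index.php?id=2
    print(resolve("/"))            # index.php?id=1
```

Either way, the crawler only ever sees URLs under the site root, never the files inside the core or manager directories.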
                • 37437
                • 147 Posts
                Thanks for your help; I think I got it. Now, if either of you have any insights on the security issues I also posted last night, that would be great too:

                http://forums.modx.com/thread/72628/2-security-issues-chmod-register-globals#dis-post-403977
                  • 2396
                  • 101 Posts
                  Quote from: okyanet at Aug 17, 2011, 01:39 PM
                  Disallow: /setup/

                  I know this is a very old thread but it is very important to note that this folder should be removed immediately after actually setting up the site.

                  If you need to re-setup the site then copy it back in there - and remove again once you're finished.
                    • 52355
                    • 2 Posts
                    Where do I put this "# Robots.txt" file?
                      • 3749
                      • 24,544 Posts
                      If you disallow the assets directory, Google may not show your images in the search results.

                      @honey -- the robots.txt file goes in the MODX root directory. You may already have one there. If it causes trouble, post the contents of the file here.