    • 45278
    • 1 Posts
    Newbie question, Evo 1.0.5:

    I need to prevent one page from being indexed by search engines. It appears that the document variable "Searchable" only refers to internal searching.
    Will unchecking "show in menu" and avoiding any internal links to this page be enough, or is there a more graceful way to do this? I'd actually prefer to place a link to this page on another page that is indexed.
      • 38723
      • 165 Posts
      Quote from: tgraw1234 at Oct 01, 2013, 11:41 AM
      Newbie question, Evo 1.0.5:

      I need to prevent one page from being indexed by search engines. It appears that the document variable "Searchable" only refers to internal searching.
      Will unchecking "show in menu" and avoiding any internal links to this page be enough, or is there a more graceful way to do this? I'd actually prefer to place a link to this page on another page that is indexed.

      I'd suggest the best way is to use the robots meta tag to tell search engines not to index that specific page. This is Evo, right? In Revo I would set up a TV as a checkbox and, when it is ticked (i.e. its value is yes or 1), get the template to output this in the head:

      <meta name="robots" content="noindex, follow">


      Or use nofollow in place of follow if you don't want bots to follow links from that page either.
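
A quick way to confirm the tag really ends up in the rendered page is to parse the HTML and list the robots directives. The following is only a minimal sketch using Python's standard library; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are made up for illustration and are not part of MODX.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical rendered page, as the template above would output it.
page = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body></body></html>"
)
print(sorted(robots_directives(page)))
```

If "noindex" is missing from the result for the page you want hidden, the TV or the template condition is probably not firing.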
        • 27106
        • 147 Posts
        Support for meta name="robots" is spotty. Your most reliable method of exclusion is a robots.txt file in your web root.

        The general web has detailed information on this. See http://www.robotstxt.org/. Google has a useful page too: https://support.google.com/webmasters/answer/156449?rd=1

        You can also put the robots.txt document inside MODX as a resource, setting the content type to "text". If you have complex robot management needs, this may suit you. Use Google's checker to test your setup.
          David Walker
          Principal, Shorewalker DMS
          Phone: 03 8899 7790
          Mobile: 0407 133 020
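
Besides an online checker, a rule set can be tried out locally before it goes live. This is a rough sketch using Python's standard `urllib.robotparser`; the path `/private-page.html`, the `example.com` URLs, and the helper name `is_allowed` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides a single page from all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private-page.html
"""


def is_allowed(robots_text, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(robots_txt, "http://example.com/private-page.html"))
print(is_allowed(robots_txt, "http://example.com/index.html"))
```

This only shows how a well-behaved parser reads the rules; individual crawlers may interpret edge cases differently, so an official checker is still worth running.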
          • 38723
          • 165 Posts
          Good shout. I wasn't even thinking that and I set up a robots.txt for every site I make! I use this basic one in my MODX-Revo-Boilerplate extra:

          User-agent: *
          Disallow: /assets/components/
          Disallow: /manager/
          Disallow: /build/
          Disallow: /connectors/
          Disallow: /core/
          Sitemap: [[++site_url]]sitemap.xml
          


          So I guess you could just add to the disallow ones like David suggests above! (Only changing out Revo tags for Evo ones!)
            • 27106
            • 147 Posts
            Just be aware that robots.txt is a very public document. One effect of disallowing robots from assets, manager, connectors and core folders will be to clearly and publicly mark your site as a MODX site. And robots aren't going to be a problem in those folders. So I wouldn't bother disallowing them.
              • 19339
              • 6 Posts
              Quote from: pdincubus at Oct 04, 2013, 10:25 AM
              Good shout. I wasn't even thinking that and I set up a robots.txt for every site I make! I use this basic one in my MODX-Revo-Boilerplate extra:

              User-agent: *
              Disallow: /assets/components/
              Disallow: /manager/
              Disallow: /build/
              Disallow: /connectors/
              Disallow: /core/
              Sitemap: [[++site_url]]sitemap.xml
              


              So I guess you could just add to the disallow ones like David suggests above! (Only changing out Revo tags for Evo ones!)

              I believe there is a mistake here. Robots read the robots.txt file directly and will not understand the MODX tag. I actually tested this (on an Evo site) with a checker and the response was:

              The "Sitemap" command requires an absolute URL, that is an URL starting with "http://" (Example: http://www.domain.com/sitemap.xml).
              • Unless the robots.txt file is a resource.
                  Studying MODX in the desert - http://sottwell.com
                  Tips and Tricks from the MODX Forums and Slack Channels - http://modxcookbook.com
                  Join the Slack Community - http://modx.org
                  • 38723
                  • 165 Posts
                  Quote from: sottwell at Dec 30, 2013, 10:09 PM
                  Unless the robots.txt file is a resource.

                  I've never had any problems using the
                  Sitemap: [[++site_url]]sitemap.xml


                  It's a MODX resource marked as a text document so should be processed by the CMS before output. If you visit yoursite.com/robots.txt when set up like the above you get the correct output as expected. There is no robots.txt static file on the root of the site, so it's not a mistake.
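
To check which of the two cases applies on a given site, look at what is actually served at /robots.txt: a processed resource shows a full URL on the Sitemap line, while a static file still shows the raw tag. Below is a small sketch of that check, assuming the served text has already been fetched into a string; the function name `sitemap_problems` and the sample inputs are hypothetical.

```python
from urllib.parse import urlparse


def sitemap_problems(robots_text):
    """Return a list of problems found on the Sitemap lines of robots.txt."""
    problems = []
    for line in robots_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "sitemap":
            continue
        url = value.strip()
        # A leftover Revo ([[...]]) or Evo ([(...)]) tag means the file
        # was served without being processed by the CMS.
        if "[[" in url or "[(" in url:
            problems.append(f"unprocessed tag: {url}")
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute URL: {url}")
    return problems


raw = "User-agent: *\nSitemap: [[++site_url]]sitemap.xml\n"
processed = "User-agent: *\nSitemap: http://www.example.com/sitemap.xml\n"
print(sitemap_problems(raw))
print(sitemap_problems(processed))
```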
                    • 19339
                    • 6 Posts
                    Happy New Year all, sorry for my late response, we were away the last week.

                    Quote from: pdincubus at Jan 02, 2014, 05:05 PM
                    Quote from: sottwell at Dec 30, 2013, 10:09 PM
                    Unless the robots.txt file is a resource.

                    I've never had any problems using the
                    Sitemap: [[++site_url]]sitemap.xml


                    It's a MODX resource marked as a text document so should be processed by the CMS before output. If you visit yoursite.com/robots.txt when set up like the above you get the correct output as expected. There is no robots.txt static file on the root of the site, so it's not a mistake.

                    sottwell and pdincubus, I was not aware that this could be done (a robots.txt file set up as a MODX resource and processed as such).
                    Thanks for your explanation.