Quote from: netProphET at Mar 28, 2013, 02:35 AM
I'm late to the party here, but wanted to inform you all that we did implement a new feature last week, where new Developer Clouds automatically have a robots.txt (virtual) file that directs robots not to index any of the site. If you want to override this default behavior, all you have to do is put an actual file in place.
Nice to see different ideas for solving the problem of excluding just parts of a site.
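For anyone unfamiliar with the directive itself, the "deny all" virtual robots.txt described in the quoted post is presumably equivalent to the standard block-everything rules, something like:

```
User-agent: *
Disallow: /
```

Dropping your own robots.txt file into the webroot replaces that virtual default, per the quoted post.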
I'm even later to the party, but thought it worthwhile to clarify something in case other people stumble across this thread as I did.
So to clarify: the robots.txt returned for your Cloud URL (e.g. cxxxx.paas1.tx.modxcloud.com) is always the default "deny all", and that doesn't change even if you enable the "Allow Search Engines to Index this site" option on the cloud. However, that setting does change the robots.txt returned for the "custom" URL assigned to that cloud, e.g. dev.yourcompany.modxcloud.com.
This caught me out because I had implemented a robots.txt and was testing it using the Cloud URL, which kept serving the default "deny all" instead of my custom robots.txt file. Hope this helps someone else, and thanks to Mike Schell at MODX Cloud for clarifying this for me!
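If it's useful to anyone, here's a quick sketch of how you could check which robots.txt each hostname is actually serving, rather than guessing. The two hostnames are just the placeholder examples from this thread; swap in your own Cloud URL and custom URL.

```python
# Fetch /robots.txt from both hostnames and print what each one returns,
# so you can see the deny-all default vs. your custom file side by side.
import urllib.request

URLS = [
    "http://cxxxx.paas1.tx.modxcloud.com/robots.txt",   # internal Cloud URL (always deny-all)
    "http://dev.yourcompany.modxcloud.com/robots.txt",  # custom URL (reflects the setting / your file)
]

for url in URLS:
    print(f"--- {url} ---")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(resp.read().decode("utf-8", errors="replace"))
    except Exception as exc:
        print(f"request failed: {exc}")
    print()
```

A plain `curl http://.../robots.txt` against each hostname does the same job if you'd rather stay on the command line.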