-
- 147 Posts
Evolution versions of MODX come with a handy, sample robots.txt file that is written to protect just the right directories from search engine crawlers. Does anyone have a pre-configured robots.txt file that is appropriate for the latest release of Revolution? (Just curious -- why isn’t a sample robots.txt file included as a standard Revolution component?)
-
- 5 Posts
I would love to see this too, if someone can post one.
-
- 147 Posts
If you disallow all of those directories, what else is there to crawl? I guess I am somewhat confused about exactly how robots.txt works with a database-driven CMS such as MODx. Basically, I want the actual pages of my site to be crawled, but nothing else. Is there a standard way to achieve this?
Your site's pages are all generated, so there is nothing in the file system for crawlers to see except the main index.php file, which triggers generation of the page being requested. Don't forget that your page URLs are really index.php?id=xxx, where xxx is the ID of the resource used to build the requested page. Friendly URLs like about.html are produced by server rewriting plus MODx internally mapping aliases to resource IDs.
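To make that concrete, here is a minimal sketch using Python's standard urllib.robotparser. It shows that disallowing the system directories still leaves the generated pages crawlable. The directory names (core, manager, connectors) are assumed to be the default Revolution ones, and example.com, the paths, and the helper names are purely illustrative; adjust them if your install renames or moves those directories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the directory names assume a default Revolution layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /core/
Disallow: /manager/
Disallow: /connectors/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def can_crawl(parser: RobotFileParser, path: str,
              site: str = "https://example.com") -> bool:
    """Return True if a generic crawler ("*") may fetch site + path."""
    return parser.can_fetch("*", site + path)


parser = build_parser(ROBOTS_TXT)
for path in ("/index.php?id=5", "/about.html",
             "/manager/index.php", "/core/config/config.inc.php"):
    print(path, "->", "crawlable" if can_crawl(parser, path) else "blocked")
```

The two page URLs (the raw index.php?id=5 form and the friendly about.html form) should come back as crawlable, while anything under /manager/ or /core/ is blocked, because the rules only match paths that begin with a disallowed directory.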
[ed. note: sottwell last edited this post 12 years, 3 months ago.]
-
- 2 Posts
Where do I put this "# Robots.txt"?
-
- 24,544 Posts
If you disallow the assets directory, Google may not show your images in the search results.
@honey -- the robots.txt file goes in the MODX root directory. You may already have one there. If it causes trouble, post the contents of the file here.
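As a rough illustration of both points, the sketch below (Python, standard library only) works out where crawlers will look for the file and checks whether an image under assets is still fetchable. It assumes MODX is installed at the web root, so the MODX root and the site root are the same place. The image path, example.com, and the helper names are made up for the example, and Googlebot-Image is used only as a sample user-agent string.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(any_page_url: str) -> str:
    """Crawlers request robots.txt from the root of the host, nowhere else."""
    return urljoin(any_page_url, "/robots.txt")


def image_allowed(robots_txt: str, image_url: str,
                  agent: str = "Googlebot-Image") -> bool:
    """Check whether the given crawler may fetch image_url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, image_url)


# Hypothetical rule sets, with and without the assets directory disallowed.
WITH_ASSETS_BLOCKED = """\
User-agent: *
Disallow: /manager/
Disallow: /assets/
"""

WITHOUT_ASSETS_BLOCKED = """\
User-agent: *
Disallow: /manager/
"""

IMAGE = "https://example.com/assets/images/logo.png"  # made-up image path

print(robots_url("https://example.com/some/page.html"))
print("assets blocked:    ", image_allowed(WITH_ASSETS_BLOCKED, IMAGE))
print("assets not blocked:", image_allowed(WITHOUT_ASSETS_BLOCKED, IMAGE))
```

With the Disallow: /assets/ line present, the image URL is reported as not fetchable, which is why images stored there can drop out of image search results.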