• MODX index page duplicates indexed by Google (bad for SEO)

  • messyp Reply #1, 1 year, 9 months ago

    Hi guys

    On all my MODX sites (which have friendly URLs turned on), the original www.mysite.com is indexed in Google, but www.mysite.com/mysite gets indexed too.
    This second page is unwanted and will cause duplicate-content problems in Google, devaluing my SEO efforts. Is this a major problem with MODX?

    I tried a 301 redirect, but all I get then is:

    http://www.mysite.com/?q=mysite

    Any ideas?


  • MichielM Reply #2, 1 year, 8 months ago

    Maybe I can help you out.
    This incorrect indexing of 'duplicate content' stems from being able to reach the same content in
    different ways:
    - via www.yourdomain.com/index.php
    - via www.yourdomain.com/index.html (!)
    - via http://yourdomain.com
    - via yourdomain.com

    Because there are 4 ways to get to the same content, the content is indexed as duplicate.
    I understand you know this, but others might read this too.
    Adding rewrite conditions and rules in .htaccess prevents this from happening, but the correct
    syntax is crucial to get it to work.
    I do not know how you wrote your redirects, but what I show in this post works for me.
    I tested the extra code below; it is in my own .htaccess file, and in my case it does not
    conflict with the other active rules.

    At the end of the post I'll give my full .htaccess file so you can compare it to your own.
    [I haven't tested what happens when other settings are active, for instance the SEO Strict URLs plugin.]

    So, having said all that, on to the work ahead. First back up your root .htaccess file;
    in case of calamities you have a spare to save the day.

    Now open your root .htaccess and find this line:
    # Exclude /assets and /manager directories and images from rewrite rules
    Before that line put this code:

    
    # Redirect http://www.domain.com/index.html to http://www.domain.com/
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
    RewriteRule ^index\.html$ http://www.domain.com/ [R=301,L]

    # Redirect http://www.domain.com/index.php to http://www.domain.com/
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
    RewriteRule ^index\.php$ http://www.domain.com/ [R=301,L]
    
    

    Leave an empty line before and after this inserted code.
    The code blocks strip index.html and index.php from URLs
    (there is probably a way to cover both files in one code block, but I am no expert at this, so
    I wrote a block for each file).
    Also: I found that things did not work when I placed the code at the end of the file,
    so I experimented with its position until I found a place where things worked.
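
    On covering both files in one block: a minimal sketch, assuming the same
    placeholder host www.domain.com and the same position in the file as the two
    blocks above. It merges the two patterns with an alternation group and has
    not been tested against the setups in this thread.

    ```apache
    # Redirect /index.html and /index.php to the site root in a single rule.
    # THE_REQUEST holds the request line as the browser originally sent it, so
    # the internal friendly-URL rewrite to index.php?q=... does not match here.
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(html|php)\ HTTP/
    RewriteRule ^index\.(html|php)$ http://www.domain.com/ [R=301,L]
    ```

    Because the condition requires a space right after the file name, a request
    such as /index.php?q=page is left alone and only the bare file names are
    redirected.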

    Last step:
    at the bottom of your file place this code (remember to leave an empty line above the code block):

    # Redirect http://domain.com/ to http://www.domain.com/
    RewriteCond %{HTTP_HOST} ^domain\.com [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
    
    


    Final notes:
    - The rules are for .htaccess on Apache.
    - Either leave the comment lines as they are (to serve as reminders of what the rules do)
    or remove them completely, but do not uncomment them.
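
    A note on rule order, as a hedged sketch rather than a tested fix: in
    .htaccess context Apache runs the rule set again after an internal rewrite,
    so a host redirect placed below the friendly-URL rule can end up seeing
    index.php?q=... instead of the path the visitor typed, which may explain
    redirects landing on ?q= URLs. One possible ordering, assuming
    www.domain.com is the preferred host:

    ```apache
    RewriteEngine On
    RewriteBase /

    # Assumption: www.domain.com is the preferred host. Redirect the bare
    # domain first, while the path is still the one the visitor requested.
    RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

    # For Friendly URLs (unchanged; reached only when the host check above
    # did not redirect)
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
    ```

    With this order, a request for domain.com/page should be redirected to
    www.domain.com/page before any ?q= parameter is added.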

    Hope this helps. As promised, here is my complete .htaccess so you can compare settings;
    my code blocks contain the words domain.com so you can see where they are:
    # For full documentation and other suggested options, please see
    # http://svn.modxcms.com/docs/display/MODx096/Friendly+URL+Solutions
    
    # including for unexpected logouts in multi-server/cloud environments
    # and especially for the first three commented out rules
    
    #php_flag register_globals Off
    #AddDefaultCharset utf-8
    #php_value date.timezone Europe/Moscow
    
    Options +FollowSymlinks
    RewriteEngine On
    RewriteBase /
    
    # Fix Apache internal dummy connections from breaking [(site_url)] cache
    RewriteCond %{HTTP_USER_AGENT} ^.*internal\ dummy\ connection.*$ [NC]
    RewriteRule .* - [F,L]
    
    # Rewrite domain.com -> www.domain.com -- used with SEO Strict URLs plugin
    #RewriteCond %{HTTP_HOST} .
    #RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
    #RewriteRule (.*) http://www.example.com/$1 [R=301,L]
    
    # Redirect http://www.domain.com/index.html to http://www.domain.com/
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
    RewriteRule ^index\.html$ http://www.domain.com/ [R=301,L]

    # Redirect http://www.domain.com/index.php to http://www.domain.com/
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
    RewriteRule ^index\.php$ http://www.domain.com/ [R=301,L]
    
    # Exclude /assets and /manager directories and images from rewrite rules
    RewriteRule ^(manager|assets)/*$ - [L]
    RewriteRule \.(jpg|jpeg|png|gif|ico)$ - [L]
    
    # For Friendly URLs
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
    
    # Reduce server overhead by enabling output compression if supported.
    #php_flag zlib.output_compression On
    #php_value zlib.output_compression_level 5
    
    # Redirect http://domain.com/ to http://www.domain.com/
    RewriteCond %{HTTP_HOST} ^domain\.com [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
    


    Thanks for reading and good luck!


  • outre99 Reply #3, 1 year, 7 months ago

    Hi Michiel,

    Thanks a lot for the detailed explanation.
    Unfortunately, even though I copied your .htaccess (with the domain changes, of course),
    I'm getting an infinite loop. Firefox says the redirect will never complete.

    Have you seen this kind of issue?
    Thanks


  • freejung Reply #4, 11 months, 2 weeks ago

    I would guess that the infinite loop comes from trying to redirect friendly URL requests to index.php?q=xx and then trying to 301 redirect these back to root. It has to be possible to request index.php because that's what is used to generate all of your pages.

    There are plugins you can use to attempt to tackle this problem, but my advice would be to simply use a canonical link. It works well and has the advantage that it also resolves duplicate content issues that you haven't thought of.

    There's a canonical URL snippet for Revo.


  • outre99 Reply #5, 11 months, 2 weeks ago

    Thanks for the help.
    After doing a site that required multiple languages and using YAMS for that purpose, I found that it handles SEO-friendly URLs very well. So now I use YAMS even for single-language websites.