    • 34109
    • 119 Posts
    I've inherited a site where the client has created all internal links in the form www.xxx.com/99 rather than [[~99]]. This is causing duplicate content issues with Google. I've written a quick plugin to 301 redirect these to the right page, but I really need to clean up the database. Can anyone suggest a way to search through the modx_site_content table, identify these links and wrap the IDs in the tag? I'm no regex expert, but I'd imagine it could be done...

    Cheers,

    Chris
      Studio Republic
      http://www.studiorepublic.com
      0845 226 3205
      @christodhunter
      • 6038
      • 228 Posts
      I think you would have to create a UDF (user-defined function) to do this in MySQL, because of the regex.
      It's probably easier to dump the table, run the regex replace on the dump, and then replace the table contents with the result.

      Regex would be something like
      find: www\.xxx\.com/(\d+)
      replace: [[~$1]]
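
To make the find/replace above concrete, here is a minimal sketch of it as plain PHP, run on a sample string rather than on a real dump. The function name `wrapResourceIds` is hypothetical, and the optional `http://` / `https://` prefix is an assumption: the original post only mentions links of the form www.xxx.com/99, but if the stored links include a scheme it would otherwise be left in front of the tag.

```php
<?php
/**
 * Hypothetical helper (not part of MODX): rewrite hard-coded links such as
 * www.xxx.com/99 into MODX link tags such as [[~99]].
 */
function wrapResourceIds($content, $host = 'www.xxx.com')
{
    // preg_quote() escapes the dots in the host name. '#' is used as the
    // pattern delimiter so the slash in the URL needs no escaping.
    // Assumption: an optional http:// or https:// prefix is swallowed too.
    $pattern = '#(?:https?://)?' . preg_quote($host, '#') . '/(\d+)#';

    return preg_replace($pattern, '[[~$1]]', $content);
}

$sample = '<a href="http://www.xxx.com/99">About</a> and www.xxx.com/12';
echo wrapResourceIds($sample), "\n";
// prints: <a href="[[~99]]">About</a> and [[~12]]
```

It is probably worth running something like this on a copy of the dump first and diffing the result, since a link with a longer path after the ID would only have its leading part replaced.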
        • 34109
        • 119 Posts
        Cool - thanks for that Christian - will give it a go
          • 3749
          • 24,544 Posts
          Another way to go is a utility snippet. Something like this:


          <?php
          /* Utility snippet: rewrite hard-coded links (www.xxx.com/99) as [[~99]] */
          $docs = $modx->getCollection('modDocument');
          
          /* '#' is the delimiter so the slash in the URL needs no escaping */
          $pattern = '#www\.xxx\.com/(\d+)#';
          $replacement = '[[~$1]]';
          $count = 0;
          
          foreach ($docs as $doc)  {
             $content = $doc->getContent();
             $hash1 = sha1($content);
             $content = preg_replace($pattern, $replacement, $content);
             $hash2 = sha1($content);
          
             if ($hash1 === $hash2) { /* no change */
                 continue;
             }
             $doc->setContent($content);
             $doc->save();
             $count++;
          }
          
          return 'Modified ' . $count . ' Resources';


          Back up your site_content table first!

          I would test it first on a single resource, with this in place of the getCollection() call:

          $res = $modx->getObject('modResource', array('pagetitle' => 'SomeActualPage'));
          $docs = array($res);


          Note that this will only do Resources. You may also need to do chunks -- just repeat the code with 'modChunk' instead of 'modDocument'.
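
Rather than pasting the loop twice, the "repeat it for chunks" step could be folded into one helper, roughly as sketched below. The function name `rewriteLinks` is hypothetical. It assumes, as the snippet above does, that every object passed in offers getContent(), setContent() and save(), and it compares the old and new strings directly instead of hashing them.

```php
<?php
/**
 * Hypothetical helper: apply a regex replacement to the content of each
 * object and save only the ones that actually changed.
 * Assumes each object has getContent(), setContent() and save().
 * Returns the number of objects that were modified.
 */
function rewriteLinks($objects, $pattern, $replacement)
{
    $count = 0;

    foreach ($objects as $object) {
        $old = $object->getContent();
        $new = preg_replace($pattern, $replacement, $old);

        // preg_replace() returns null on error; skip those and unchanged content
        if ($new === null || $new === $old) {
            continue;
        }

        $object->setContent($new);
        $object->save();
        $count++;
    }

    return $count;
}

/*
 * Possible use inside a MODX snippet, where $modx is available:
 *
 * $pattern = '#www\.xxx\.com/(\d+)#';
 * $total   = rewriteLinks($modx->getCollection('modDocument'), $pattern, '[[~$1]]');
 * $total  += rewriteLinks($modx->getCollection('modChunk'), $pattern, '[[~$1]]');
 * return 'Modified ' . $total . ' objects';
 */
```

The same advice applies here: back up the tables first and try it on a single resource or chunk before running it over everything.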