I've inherited a site where the client has created all internal links in the form of www.xxx.com/99 rather than [[~99]]. This is throwing up duplicate content issues with Google. I've written a quick plugin to 301 redirect these to the right page, but I really need to clean up the database. Can anyone suggest a way to search through the modx_site_content table, find these links and wrap the IDs in the [[~...]] link tag? I'm no regex expert, but I'd imagine it could be done...
Cheers,
Chris
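The redirect plugin itself isn't shown in the thread. As a sketch only, the part that recognises the old ID-style paths could look like this. legacyResourceId() is a hypothetical helper, and the sketch assumes the site lives at the web root. The MODX wiring is also assumed and left out: which system event the plugin listens on, building the target URL (e.g. with $modx->makeUrl($id)), and sending the 301.

```php
<?php
// Hypothetical helper for the 301 plugin: return the resource ID if the request
// path is an old-style link such as "/99" (or "/99/"), otherwise null.
// Assumes the site is served from the web root, not a subdirectory.
function legacyResourceId(string $requestUri): ?int
{
    // Drop any query string, e.g. "/99?ref=x" -> "/99".
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (is_string($path) && preg_match('#^/(\d+)/?$#', $path, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}
```

Anything that isn't a bare number, such as "/about-us.html" or "/99.html", returns null and is left to MODX's normal request handling.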
-
I think you have to create a UDF (user-defined function) to do this in MySQL, because of the regex.
It's probably easier to do a dump of the table and perform the regex replace on that, then replace the table contents.
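Regex would be something like this (dots escaped, and the optional scheme matched too, so "http://" isn't left in front of the tag):
find: (?:https?://)?www\.xxx\.com/(\d+)\b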
replace: [[~$1]]
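For the dump-and-replace route, here is a minimal PHP sketch of the replace step, assuming the dump of modx_site_content has already been read into a string. fixInternalLinks() is a hypothetical name and 'www.xxx.com' is the thread's placeholder domain. The pattern is a stricter variant of the find expression in the post. Its lookahead, which is my addition, skips IDs followed by a word character, a hyphen or ".something", so paths like "/99.html" are not half-rewritten.

```php
<?php
// Sketch of the dump-and-replace step; 'www.xxx.com' is a placeholder domain.
function fixInternalLinks(string $sql): string
{
    // (?:https?://)? also consumes the scheme, so "http://www.xxx.com/99"
    // becomes "[[~99]]" rather than "http://[[~99]]".
    // Dots are escaped so they only match a literal '.'.
    // The lookahead rejects IDs followed by a word character, a hyphen or
    // ".something", so "/99.html" and "/2010-report" are left alone.
    $pattern = '#(?:https?://)?www\.xxx\.com/(\d+)(?![\w-]|\.\w)#';
    return preg_replace($pattern, '[[~$1]]', $sql);
}
```

Usage could be as simple as `file_put_contents('site_content_fixed.sql', fixInternalLinks(file_get_contents('site_content.sql')));` (hypothetical filenames). Dump only that table, so the replace can't touch anything else, and diff the two files before re-importing.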
[ed. note: christianhanvey last edited this post 10 years, 11 months ago.]
-
Cool - thanks for that Christian - will give it a go
-
Another way to go is a utility snippet. Something like this:
<?php
// '#' is used as the regex delimiter because the pattern itself contains '/'.
// The optional scheme is matched so "http://" isn't left in front of the tag.
$docs = $modx->getCollection('modDocument');
$pattern = '#(?:https?://)?www\.xxx\.com/(\d+)\b#';
$replacement = '[[~$1]]';
$count = 0;
foreach ($docs as $doc) {
    $content = $doc->getContent();
    $hash1 = sha1($content);
    $content = preg_replace($pattern, $replacement, $content);
    $hash2 = sha1($content);
    if ($hash1 === $hash2) { /* no change */
        continue;
    }
    $doc->setContent($content);
    $doc->save();
    $count++;
}
return 'Modified ' . $count . ' Resources';
Back up your site_content table first!
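The snippet detects a change by hashing the content before and after the replace. An alternative, not from the thread: preg_replace() can report how many substitutions it made through its fifth (by-reference) argument. That makes the hashes unnecessary and also tells you how many links were rewritten. rewriteContent() and its return shape are hypothetical.

```php
<?php
// Hypothetical helper: apply the replacement and report how many links changed,
// using preg_replace()'s count argument instead of comparing sha1() hashes.
function rewriteContent(string $content, string $pattern, string $replacement): array
{
    $replaced = 0;
    // -1 means "no limit"; $replaced receives the number of substitutions made.
    $newContent = preg_replace($pattern, $replacement, $content, -1, $replaced);
    return array($newContent, $replaced);
}
```

In the snippet loop this would stand in for the two sha1() calls, roughly: `list($content, $n) = rewriteContent($doc->getContent(), $pattern, $replacement);` followed by `if ($n === 0) { continue; }`.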
I would test it first on a single resource, with this in place of the getCollection() call:
$res = $modx->getObject('modResource', array('pagetitle' => 'SomeActualPage'));
$docs = $res ? array($res) : array(); /* avoid calling getContent() on null if the page isn't found */
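An even more cautious first pass, not from the thread, is a dry run that only reports which IDs each Resource links to, without saving anything. findLinkedIds() is a hypothetical helper. Inside the snippet loop you might collect $doc->get('id') together with its result instead of calling setContent() and save().

```php
<?php
// Hypothetical dry-run helper: list the resource IDs the pattern would turn
// into [[~id]] tags, without modifying the content.
function findLinkedIds(string $content, string $pattern): array
{
    preg_match_all($pattern, $content, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[1] holds every capture of
    // the first group, i.e. the numeric IDs.
    return array_map('intval', $matches[1]);
}
```

Checking that every reported ID actually exists before running the real snippet avoids turning a stale link into a [[~id]] tag that points nowhere.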
Note that this will only do Resources. You may also need to do chunks -- just repeat the code with 'modChunk' instead of 'modDocument'.