Update 09/20/11:
Google has now introduced the rel next and rel prev convention for pagination.
http://www.google.com/support/webmasters/bin/answer.py?answer=1663744
Given this, I would now recommend that you
do use the technique described here, along with the next and prev links as specified by Google to achieve optimal SEO pagination.
Here is a bit of a hack I set up to do the next and prev links. I'm sure there's a better way to do it, but this works.
The idea is to check the page.nav placeholder for the anchor text of the next and previous links, and set placeholders for the appropriate next and prev link elements accordingly. This assumes you're using the SEO-friendly pagination technique outlined below. If not, you'll have to modify this code to point the next and prev links to the URLs of the appropriately paginated pages. In this code, my next and prev nav links have the anchor text » and « respectively. You should modify the code by replacing those with the actual anchor text of your next and previous nav links. Also, replace "html" with whatever you are using for a suffix.
Put this code in a snippet that you call right after the getPage call (which should be at the very beginning of the template, with getPage set to output to a placeholder). Make sure you attach a property set to this snippet that contains the property pageVarKey, which should have the same value as the pageVarKey property you are using for getPage. The way I did this was to just add this code to the paginationSEO snippet described below.
// Resolve the current page number from the request (defaults to 1).
$properties =& $scriptProperties;
$properties['page'] = (isset($_REQUEST[$properties['pageVarKey']]) && ($page = intval($_REQUEST[$properties['pageVarKey']]))) ? $page : 1;
$alias = $modx->resource->get('alias');
// Both links stay empty unless the nav shows the matching arrow.
$st_prevLink = '';
$st_nextLink = '';
$st_nav = $modx->getPlaceholder('page.nav');
if ($st_nav != '') {
    // A "previous" arrow in the nav means there is a previous page.
    if (strpos($st_nav, '«') !== false) {
        $prevPageNo = $properties['page'] - 1;
        // Page 1 lives at the plain alias; later pages at alias-page-N.
        // If your RewriteRule leaves out "-page-", use '-' here instead.
        $st_prev_alias = ($properties['page'] > 2) ? $alias . '-page-' . $prevPageNo : $alias;
        $st_prevLink = '<link rel="prev" href="/' . $st_prev_alias . '.html" />';
    }
    // A "next" arrow in the nav means there is a next page.
    if (strpos($st_nav, '»') !== false) {
        $nextPageNo = $properties['page'] + 1;
        $st_next_alias = $alias . '-page-' . $nextPageNo;
        $st_nextLink = '<link rel="next" href="/' . $st_next_alias . '.html" />';
    }
}
$modx->setPlaceholder('paginated-prev-link', $st_prevLink);
$modx->setPlaceholder('paginated-next-link', $st_nextLink);
Update 06/21/11:
After extensive analysis of the Google Panda update, my recommendation for SEO purposes is that
in most cases you should not use the technique described here. Post-Panda, you may not want Google to index your paginated pages as separate URLs, as this may be seen as "shallow" content. If your paginated pages contain lots of unique, valuable content you might still use this technique, but in that case you might simply be better off putting the unique content on another URL altogether.
My current recommendation in most cases would be to
leave GetPage completely alone and let search engines decide how to handle your paginated pages.
Disclaimer: take this advice with a grain of salt and use at your own risk.
Nobody really knows how Panda works yet, so it's all guesswork at this point. Here's a great article about it on Search Engine Land:
http://searchengineland.com/why-google-panda-is-more-a-ranking-factor-than-algorithm-update-82564
/Update
GetPage rocks! It's a very powerful and easy way to do pagination. However, unfortunately, the default implementation is not completely ideal from an SEO perspective. I've set up more search-optimized pagination with getPage, and I wanted to share my method in case anyone finds it useful or wants to improve upon it.
Disclaimer: I am very experienced with SEO, so I'm confident that the SEO practices in my implementation are sound, but I'm a total n00b at MODx development, so I'm sure there's a better way to do this. This method seems to be working for me, but YMMV. Also, it involves modifying your .htaccess file, and I can't guarantee that my rewrite rule won't conflict with other rules you already have set up, so if you've customized your .htaccess, exercise caution and check for possible conflicts.
There are three main problems with pagination using getPage, from an SEO perspective:
1: non-SEO-friendly URLs
The URLs of getPage pages look like /widgets.html?page=2. For ideal SEO, each page should have a unique plain URL with no query string.
Solution:
First, we need a RewriteRule in .htaccess to rewrite a friendly URL to the page URLs. Add a rule like the following to your .htaccess, _above_ the rule for normal MODx friendly URLs (replace html with whatever extension you use for text/html if different):
RewriteRule ^(.*)-page-([0-9]+)\.html$ $1.html?page=$2 [L,QSA]
This is similar to the friendly URL rule for normal MODx friendly URLs. You can replace the word "page" in the first part with whatever you want, or, if you are certain that you will never use an alias ending in a number, you can just leave out the -page- part altogether. The variable "page" on the right side should be replaced with whatever you are using for the pageVarKey property in getPage, if different.
Now the page at /widgets.html?page=2 can be accessed at /widgets-page-2.html.
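If you want to sanity-check the pattern before touching .htaccess, here is a rough PHP sketch that applies the same regular expression to a few paths. The function name rewriteFriendlyPage is made up for this example, and it only imitates the pattern matching, not everything mod_rewrite does (it ignores the QSA flag, for instance).

```php
<?php
// Imitates: RewriteRule ^(.*)-page-([0-9]+)\.html$ $1.html?page=$2 [L,QSA]
// In .htaccess context the path is matched without its leading slash.
function rewriteFriendlyPage($path)
{
    if (preg_match('/^(.*)-page-([0-9]+)\.html$/', $path, $m)) {
        // $m[1] is the alias, $m[2] is the page number.
        return $m[1] . '.html?page=' . $m[2];
    }
    // No match: the request falls through to the normal MODx rule.
    return $path;
}

echo rewriteFriendlyPage('widgets-page-2.html'), "\n"; // widgets.html?page=2
echo rewriteFriendlyPage('widgets.html'), "\n";        // widgets.html
```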
Now we have to modify the page navigation templates so that they link to the friendly URLs. To do this, make a new property set for getPage and replace
with
[[*alias]]-page-[[+pageNo]].html
in all of the navigation template properties.
2: Duplicate page 1 content
For good SEO, each page should only be accessed through a single URL. If more than one URL accesses the same content, this creates a duplicate page and can dilute your link juice, as well as leading to the wrong URL version being indexed and ranked.
The setup above will link back to page 1 from the subsequent pages by linking to /widgets-page-1.html. That's a problem, because this will display content identical to /widgets.html. This is also a problem with the default implementation of getPage, and
it would be ideal if getPage were modified so that the subsequent pages link back to page 1 with just the plain alias. However, I don't want to modify getPage, because I want my setup to be compatible with subsequent versions.
Solution:
The only solution I can think of at the moment is to 301 redirect widgets-page-1.html to widgets.html. This is not an ideal solution, because a small amount of link juice is lost when you link through a 301, but it's much better than linking to duplicate pages. Simply add another line to .htaccess
above the other rule like this:
RewriteRule ^(.*)-page-1\.html$ $1.html [R=301,L]
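The order matters because the general -page-N pattern also matches -page-1, so the redirect has to be tested first. Here is a small PHP sketch of that first-match-wins behaviour. The resolve function and its return format are invented for illustration, and it is only a simplified model of how the two rules interact.

```php
<?php
// Simplified model of the two rules, checked in .htaccess order.
function resolve($path)
{
    // Rule 1 (must come first): 301 redirect page 1 to the plain alias.
    if (preg_match('/^(.*)-page-1\.html$/', $path, $m)) {
        return array('redirect', $m[1] . '.html');
    }
    // Rule 2: internal rewrite for every other page number.
    if (preg_match('/^(.*)-page-([0-9]+)\.html$/', $path, $m)) {
        return array('rewrite', $m[1] . '.html?page=' . $m[2]);
    }
    // Anything else is left for the normal MODx friendly URL rule.
    return array('pass', $path);
}

print_r(resolve('widgets-page-1.html'));  // redirect to widgets.html
print_r(resolve('widgets-page-11.html')); // rewrite to widgets.html?page=11
```

Note that page 11 is not caught by the redirect, because the pattern requires ".html" immediately after the "1".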
Update 09/20/11: It's occurred to me that you could also accomplish this by writing a snippet to pull the page.nav placeholder and remove the "page-1" from it. Maybe I'll give that a try and post the code if it works.
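For what it's worth, a first stab at that idea might look something like the sketch below. I haven't run this inside MODx; stripPageOne is a hypothetical helper, and it assumes your nav links contain the literal text alias-page-1.html.

```php
<?php
// Hypothetical helper: point links for "<alias>-page-1.<suffix>" at the
// plain "<alias>.<suffix>" instead, leaving all other pages alone.
function stripPageOne($navHtml, $alias, $suffix = 'html')
{
    $from = $alias . '-page-1.' . $suffix;
    $to   = $alias . '.' . $suffix;
    // "-page-10.html" etc. are untouched: the "1" must be followed by the dot.
    return str_replace($from, $to, $navHtml);
}

$nav = '<a href="widgets-page-1.html">1</a> <a href="widgets-page-10.html">10</a>';
echo stripPageOne($nav, 'widgets'), "\n";
// <a href="widgets.html">1</a> <a href="widgets-page-10.html">10</a>
```

Inside a snippet you would presumably feed it $modx->getPlaceholder('page.nav') and $modx->resource->get('alias'), then write the result back with $modx->setPlaceholder('page.nav', ...), provided the snippet runs after getPage.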
3: Duplicate page titles, descriptions, and canonical links
For ideal SEO, each page of the site should have a unique title, meta description, and canonical link. However, all pages created with getPage will have the same head section, thus creating duplication. I wrote a simple snippet to solve this problem by using the page number from $_REQUEST to set placeholders for the title, description, and canonical link, using template variables to provide an array of titles and descriptions.
First you should set up template variables called "pagination-titles" and "pagination-descriptions" to hold the titles and descriptions. The values of these should be a list of titles and descriptions you want to use for your pages, separated by double-commas, like this: "page 1 title,,page 2 title,,page 3 title,,page 4 title" etc.
If you have a lot of pages, you probably want to set up some sort of sensible default values based on the pagetitle or some other variable. I use a TV called "primary-keyword" to store the primary keyword for each page, and I base my default titles and descriptions off of that.
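As a rough illustration of what such a default could look like, this sketch builds a double-comma list from a keyword. The function name defaultPaginationTitles and the " - Page N" wording are just assumptions for the example, not what I actually use on my sites.

```php
<?php
// Hypothetical default: "<keyword>" for page 1, "<keyword> - Page N" after.
function defaultPaginationTitles($keyword, $pageCount)
{
    $titles = array();
    for ($page = 1; $page <= $pageCount; $page++) {
        $titles[] = ($page === 1) ? $keyword : $keyword . ' - Page ' . $page;
    }
    // Join with the double-comma separator the TV value is expected to use.
    return implode(',,', $titles);
}

echo defaultPaginationTitles('Blue Widgets', 3), "\n";
// Blue Widgets,,Blue Widgets - Page 2,,Blue Widgets - Page 3
```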
Snippet: "paginationSEO"
It has only two properties: "pageVarKey", which should be set to the same value as your getPage pageVarKey (its default should be "page", which is the getPage default); and "page", which should be set to 0 by default, as in getPage.
// Resolve the current page number from the request (defaults to 1).
$properties =& $scriptProperties;
$properties['page'] = (isset($_REQUEST[$properties['pageVarKey']]) && ($page = intval($_REQUEST[$properties['pageVarKey']]))) ? $page : 1;
// Titles: prepend an empty value so that index 1 is page 1.
$st_titles = ',,' . $modx->resource->getTVValue('pagination-titles');
$ar_titles = explode(',,', $st_titles);
$st_paginated_title = (isset($ar_titles[$properties['page']])) ? $ar_titles[$properties['page']] : $ar_titles[1];
// Descriptions use the same double-comma format.
$st_descriptions = ',,' . $modx->resource->getTVValue('pagination-descriptions');
$ar_descriptions = explode(',,', $st_descriptions);
$st_paginated_description = (isset($ar_descriptions[$properties['page']])) ? $ar_descriptions[$properties['page']] : $ar_descriptions[1];
// Page 1 keeps the plain alias; later pages use alias-page-N to match the rewrite rule.
// If your RewriteRule leaves out "-page-", use '-' here instead.
$st_paginated_alias = ($properties['page'] != 1) ? $modx->resource->get('alias') . '-page-' . $properties['page'] : $modx->resource->get('alias');
// Expose the values to the template.
$modx->setPlaceholder('paginated-title', $st_paginated_title);
$modx->setPlaceholder('paginated-description', $st_paginated_description);
$modx->setPlaceholder('paginated-alias', $st_paginated_alias);
This creates placeholders [[+paginated-title]], [[+paginated-description]] and [[+paginated-alias]] that can be used to construct the appropriate head tags. If the number of pages exceeds the provided number of titles/descriptions it just re-uses the values for the first page.
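To make the lookup and the fallback concrete, here is the same explode logic pulled out into a standalone function you can run outside MODx. The name pickForPage is just for this example.

```php
<?php
// Standalone version of the title/description lookup used in the snippet.
function pickForPage($tvValue, $page)
{
    // Prepend an empty element so that index 1 is the first page.
    $items = explode(',,', ',,' . $tvValue);
    // Fall back to the page 1 value when no entry exists for this page.
    return isset($items[$page]) ? $items[$page] : $items[1];
}

$titles = 'page 1 title,,page 2 title,,page 3 title';
echo pickForPage($titles, 2), "\n"; // page 2 title
echo pickForPage($titles, 7), "\n"; // page 1 title
```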
Replace the title, description and canonical link in the head section of your template with this:
<meta name="description" content="[[+paginated-description]]"/>
<title>[[+paginated-title]]</title>
<link rel="canonical" href="[[++site_url]][[+paginated-alias]].html" />
This framework could be extended to provide other unique content for each page, such as headers and so forth, as needed.
[ed. note: esnyder last edited this post 12 years, 7 months ago.]