This 4th of July weekend, I spent probably close to 60+ hours trying to figure out an issue with a new WordPress 2.5.1 blog template. I had run into a duplicate content issue after moving the blog from an old-style theme: the new template did not have next/previous links to step through the blog.
This new WordPress template pulls blog posts from various categories into different sections; it is really a sweet-looking template. What sent me into a panic was looking at how Google had indexed the site and discovering the duplicate content issue.
What had happened was that xxx.xxxx.com/page/2/ was a duplicate of the index page in everything except the title, which indicated that you had in fact switched to /page/2/, /page/3/, /page/4/, and so on. All of them were duplicates of the index page.
Spending hours reading through the numerous reports of duplicate content issues within WordPress did not point me toward any fix. In theory the pages are not there, but they are!
They will not go to a 404 page, because as far as the database is concerned they exist. For example, if the site has 100 posts and shows 5 posts per page, everything from /page/2/ up to /page/20/ will resolve; only at /page/21/ will you get the 404 page.
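The arithmetic behind that cutoff can be sketched as follows; the variable names are mine, and the numbers are just the example above, but this mirrors how WordPress derives the last valid page from the post count and the posts-per-page setting:

```php
<?php
// Why /page/21/ is the first URL to 404 (illustrative sketch,
// using the example numbers from the paragraph above).
$total_posts    = 100;
$posts_per_page = 5;

// Last page that actually has posts behind it.
$max_page = (int) ceil( $total_posts / $posts_per_page ); // 20

// /page/1/ through /page/20/ resolve; anything past $max_page
// has no posts to query, so WordPress finally serves the 404.
```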
So to fix the problem, I modified home.php and added a section with the code from the blog template that outputs the previous and next links.
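For anyone wanting to do the same, the section I added looked roughly like this. next_posts_link() and previous_posts_link() are standard WordPress template tags; the surrounding markup and class names are just illustrative, so adapt them to your theme:

```php
<?php
// Minimal pagination block added to home.php so the paged
// archives can actually be stepped through.
?>
<div class="navigation">
	<div class="alignleft"><?php next_posts_link( '&laquo; Older Entries' ); ?></div>
	<div class="alignright"><?php previous_posts_link( 'Newer Entries &raquo;' ); ?></div>
</div>
```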
This at least gave me access to the pages, but it was not really what I was after. Finally, after all the coding changes, it dawned on me: since the template does not really have a blog page, and everything is accessed through the various categories and so on, why not just disallow /page/ via the robots.txt file?
Testing with Google's tools verified that this was a good solution: it blocks xxx.xxx.com/page/ yet still allows access to xxx.xxx.com/category/page/2/ and so on.
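In case it helps anyone, the rule itself is tiny. Disallow rules in robots.txt match by URL path prefix, which is why /page/2/ is blocked while /category/page/2/ (which does not start with /page/) is still crawlable:

```
User-agent: *
Disallow: /page/
```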
Again, to point out: this issue only became visible because /page/2/ had previously been indexed by the search engines. A new site, of course, would not show this WordPress duplicate content issue, as there would never have been any links pointing to those pages.
With so much controversy about duplicate content, ranging from "it's a major problem" to "who cares", I'd prefer to take the more cautious approach.