26 Aug Index Pages and Root Directory | Duplicate Content?
If you are reading this, chances are you’ve come across this same dilemma. While checking your sitemap, you realize that Google has indexed both your root directory, www.your-site.com/, and its index page, www.your-site.com/index.(extension: htm, html, php, aspx, etc.). Then you realize that both of these pages are essentially the same. Oh no. Duplicate content, one of Google’s big no-nos for SEO.
So now what?
There have been many arguments about this. Some SEOs say to edit the .htaccess file and use mod_rewrite to rewrite all index.(extension) urls to directory urls. Some redirect, and some do not believe that it is still an issue. Google has spoken about this on a few occasions, and the term Google uses for it is canonicalization. This is not a new term for most SEOs, but the following may help to alleviate some confusion on the matter. From MattCutts.com:
Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:
But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.
Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to https://example.com/ and the other half go to https://www.example.com/ . Instead, pick the url you prefer and always use that format for your internal links.
Q: Is there anything else I can do?
A: Yes. Suppose you want your default url to be https://www.example.com/ . You can make your webserver so that if someone requests https://example.com/, it does a 301 (permanent) redirect to https://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
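The host redirect Matt describes is normally set up in the web server configuration, but the logic is easy to sketch in Python. This is only an illustration under stated assumptions: `host_redirect`, `PREFERRED_HOST` and `ALTERNATE_HOSTS` are names made up for this example, and the hosts are the placeholder ones from the quote.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: the www host is the one we want to be canonical.
PREFERRED_HOST = "www.example.com"
# Hosts whose requests should be permanently redirected to the preferred host.
ALTERNATE_HOSTS = {"example.com"}


def host_redirect(url):
    """Return (301, target_url) if url is on an alternate host, else None.

    Ports are ignored in this sketch; path, query and fragment are kept.
    """
    parts = urlsplit(url)
    if parts.hostname in ALTERNATE_HOSTS:
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/",
             parts.query, parts.fragment)
        )
        return 301, target
    # Already on the preferred host (or an unrelated one): no redirect.
    return None
```

Called with `https://example.com/`, the function returns a 301 pointing at `https://www.example.com/`; called with the www url, it returns `None`, so the page is served normally.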
So, Matt “Google” Cutts is saying that Google will try to pick the best url on its own, and that linking consistently to the url you prefer helps it choose. So if most of your links go to the root / instead of /index.(ext), then the root / is more likely to be the one indexed.
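If you want to see which version your own links favor, a quick tally helps. The sketch below assumes you have already collected your internal link targets into a list of absolute urls; `count_homepage_variants` is a hypothetical helper written for this post, not part of any SEO tool.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# A home page spelled as an index file, e.g. /index.html or /index.php.
INDEX_PATH = re.compile(r"/index\.\w+", re.IGNORECASE)


def count_homepage_variants(hrefs):
    """Count links per (host, path) spelling of the home page."""
    counts = Counter()
    for href in hrefs:
        parts = urlsplit(href)
        path = parts.path or "/"
        # Only tally links that point at the root or at its index file.
        if path == "/" or INDEX_PATH.fullmatch(path):
            counts[(parts.hostname or "", path)] += 1
    return counts
```

If the tally shows a mix of `/` and `/index.html`, those are the links to clean up so that everything points at one version.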
Personally, I’d recommend doing a 301 redirect from the index page to the root directory, just to make sure. You can never be too careful!
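For the curious, here is roughly what that index-to-root redirect decides, sketched in Python rather than in .htaccess. The name `index_redirect` and the list of extensions are assumptions for illustration; adjust the pattern to whatever index files your server actually serves.

```python
import re

# Matches a trailing index file: index.html, index.htm, index.php, index.aspx.
# The extension list is an assumption for this example.
INDEX_FILE = re.compile(r"(^|/)index\.(?:html?|php|aspx)$", re.IGNORECASE)


def index_redirect(path):
    """Return (301, directory_path) if path ends in an index file, else None."""
    if INDEX_FILE.search(path):
        # Drop the file name but keep the directory's trailing slash.
        return 301, INDEX_FILE.sub(r"\1", path) or "/"
    return None
```

A request for `/index.html` would be sent to `/`, and `/blog/index.php` to `/blog/`, while `/` itself is left alone. One caution: apply the check to the url the visitor actually requested, not to the file the server resolves internally, otherwise the server’s own mapping of / to the index file can cause a redirect loop.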