Tuesday, April 6, 2010

On Search Engine Optimization: Why canonical URLs matter

With the explosion in the amount of information organizations make accessible over the Internet, searchability and navigability rank increasingly high on CMS wish lists. How do you structure site navigation when you have a bit more content than the average company?


Gov.nl

Consider the case of the Dutch national government: to give citizens convenient access to all the government information they may need on a daily basis, it decided to bring the information from 16 different ministries together in one CMS.

Centralization is only effective when it is paired with smart navigation. Larger websites typically offer more than one path to the same content. Technically, this is achieved by tagging content instead of dropping it into fixed folders. Hippo CMS helps authors create these tags quickly, and as of April 2010 citizens of the Netherlands have a number of centralized starting points into a large share of central government information (in Dutch).




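To make the tagging idea concrete, here is a minimal sketch of how one stored article can surface under several URLs. The `Article` model, the `navigation_urls` helper and the URL patterns are hypothetical, loosely modelled on the rijksoverheid.nl URLs shown later in this post; they are not Hippo CMS's actual routing rules, and the second tag ("verkeer") is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """A content item stored once in the repository (hypothetical model)."""
    slug: str
    date: str                                   # "YYYY/MM/DD"
    tags: list[str] = field(default_factory=list)


def navigation_urls(article: Article, base: str) -> list[str]:
    """Derive a tag-independent news URL plus one URL per topic tag.

    The URL patterns are assumptions for illustration only.
    """
    urls = [f"{base}/nieuws/{article.date}/{article.slug}.html"]
    for tag in article.tags:
        urls.append(
            f"{base}/onderwerpen/{tag}/nieuws/{article.date}/{article.slug}.html"
        )
    return urls


article = Article("rijden-onder-invloed-harder-aangepakt", "2010/04/01",
                  ["alcohol", "verkeer"])
for url in navigation_urls(article, "http://www.rijksoverheid.nl"):
    print(url)
```

The article exists once, but every tag adds another road to it. That is great for visitors and, as the next section explains, a problem for search engines.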
Canonical URLs
Flexible navigation may be great for users, but it is not ideal for search engines. To keep their results compact, search engines try to condense the URLs they gathered by crawling different paths to the same piece of content. Using a variety of algorithms, the search engine then decides which URL is the leading route to that content.
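The actual algorithms search engines use for this are not public. The toy sketch below only illustrates the general idea: group crawled URLs that return identical content, then let a heuristic pick one representative per group. Exact-match hashing and the "shortest URL wins" rule are assumptions chosen for simplicity; the point is that the site owner has no say in which URL the heuristic picks.

```python
import hashlib
from collections import defaultdict


def condense(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group crawled URLs serving identical content and pick one per group.

    `pages` maps URL -> page body. Returns representative URL -> all URLs
    in its group. Illustrative only; real engines use fuzzier signals.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[fingerprint].append(url)

    result = {}
    for urls in groups.values():
        # Assumed heuristic: shortest URL wins, ties broken alphabetically.
        representative = min(urls, key=lambda u: (len(u), u))
        result[representative] = sorted(urls)
    return result
```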

Leaving this decision to the search engine is risky. Different URLs for the same content compete with each other in the rankings, and the site owner loses control over which URL incoming links point to. With faceted search, the number of navigation paths to a single piece of content is practically infinite. If not managed carefully, offering faceted paths could even lead to accidental blacklisting, as the search engine might conclude you are trying to game its ranking algorithm.
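A quick back-of-the-envelope calculation shows how fast faceted paths multiply. Assume, purely for illustration, that a facet URL encodes an ordered selection of one or more of an article's tags, so that "alcohol, then traffic" and "traffic, then alcohol" are different URLs. The number of paths to one article with n tags is then the sum over k = 1..n of n!/(n-k)!:

```python
from math import perm


def facet_path_count(num_tags: int) -> int:
    """Distinct facet paths to one article, assuming every ordered
    selection of one or more of its tags yields its own URL."""
    return sum(perm(num_tags, k) for k in range(1, num_tags + 1))


for n in (2, 5, 10):
    print(n, facet_path_count(n))
# 2 4
# 5 325
# 10 9864100
```

Even if a site normalizes the tag order, n tags still give 2^n - 1 combinations (1023 for ten tags), all pointing at the same article.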



This is where canonical URLs come in. Tagging a page with a canonical URL tells the search engine that multiple URLs serve the same content and should be listed only once. Hippo builds canonicalization into its CMS design, making sure that the dominant URL is provided to the search engine for each unique piece of content. This approach combines flexible navigation paths with optimized search results and consistent incoming links.
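The principle can be sketched in a few lines: no matter which navigation path a visitor or crawler used, the page head always carries the same canonical URL, because that URL is stored with the content item rather than derived from the request. The routing table, the content id "article-42" and the function name are hypothetical; this is not Hippo's implementation, just the idea under those assumptions.

```python
from html import escape

BASE = "http://www.rijksoverheid.nl"
PATH = "2010/04/01/rijden-onder-invloed-harder-aangepakt"

# Hypothetical routing table: every navigation path maps to one content id.
ROUTES = {
    f"/onderwerpen/alcohol/nieuws/{PATH}.html": "article-42",
    f"/nieuws/{PATH}.html": "article-42",
}

# Hypothetical canonical path, stored once per content item.
CANONICAL_PATHS = {"article-42": f"/nieuws/{PATH}.html"}


def canonical_link_tag(request_path: str) -> str:
    """Return the <link rel="canonical"> element for the requested path."""
    content_id = ROUTES[request_path]
    url = BASE + CANONICAL_PATHS[content_id]
    # Escape the URL so it is safe inside a quoted HTML attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


for path in ROUTES:
    print(canonical_link_tag(path))  # same tag for both paths
```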


Example
Let's take a more detailed look at the government example. The following are just two sample URLs leading to an article about tougher laws for repeat DUI offenders:

http://www.rijksoverheid.nl/onderwerpen/alcohol/nieuws/2010/04/01/rijden-onder-invloed-harder-aangepakt.html


In the source of both pages, invisible to the casual visitor, a tag tells the search engine which URL to use for this piece of content in its results:

<link rel="canonical" href="http://www.rijksoverheid.nl/nieuws/2010/04/01/rijden-onder-invloed-harder-aangepakt.html" />
And this is what Google returns when searching for a few of the article's keywords. Mission accomplished!
  1. Rijden onder invloed harder aangepakt | Nieuwsbericht ...

    1 april 2010 ... Rijden onder invloed harder aangepakt ... Ministerraad. Bel 0800-8051 voor vragen aan de Rijksoverheid ... Zoek binnen rijksoverheid.nl ...
    www.rijksoverheid.nl/.../rijden-onder-invloed-harder-aangepakt.html
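On the receiving end, a crawler has to find that canonical tag in the page source. The sketch below shows one way to do it with Python's standard `html.parser`; it is an illustration of reading the tag, not a description of how Google processes it. `CanonicalFinder` and `find_canonical` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```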



For a more in-depth description of how to create canonical URLs with Hippo CMS, check out our public wiki.