Tuesday, April 6, 2010

On Search Engine Optimization: Why canonical URLs matter

With the explosion in the amount of information organizations make accessible over the Internet, searchability and navigability are increasingly prominent on CMS wish lists. How do you structure site navigation when you have a bit more content than the average company?


Consider the case of the Dutch national government: To provide convenient access to all governmental information citizens may need on a daily basis, it decided to bring information from 16 different ministries into one CMS.

To make centralization effective it needs to be paired with smart navigation. In larger contexts websites typically choose to provide more than one path to the same content. Technically this is achieved by tagging content instead of dropping it into fixed folders. Hippo CMS helps authors create these tags expediently and as of April 2010 citizens of the Netherlands have a number of centralized starting points to a large chunk of central government information (in Dutch):

Canonical URLs
Flexible navigation may be great for users, but it is not ideal for search engines. To offer compact results, search engines try to condense the list of URLs gathered from crawling through different paths to the same piece of content. Using a variety of algorithms, a search engine then decides which URL is the leading road to the content.

Leaving this decision to the search engine is a risky choice. Various URLs for the same content compete with each other in the rankings and the site owner loses control over incoming links. For faceted search paths the number of navigation routes to a single piece of content is practically infinite. If not managed carefully, offering faceted paths could even lead to accidental blacklisting, as the search engine might conclude you're trying to game its ranking algorithm.

This is where canonical URLs come in. Tagging pages with a canonical URL helps a search engine understand that multiple URLs should be listed only once. Hippo builds canonicalization into its CMS design, making sure that the dominant URL is provided to the search engine for each unique piece of content. This approach provides a combination of multiple flexible navigation paths, optimized search results and consistency for incoming links.
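As a rough illustration of the idea (not of how Hippo CMS implements it), the Python sketch below maps several navigation paths to one piece of content and emits the same canonical tag for each of them. The two lookup tables, the example.org URLs and the function name are all assumptions made up for this example.

```python
from html import escape

# Hypothetical lookup: each piece of content has exactly one canonical URL.
CANONICAL_URLS = {
    "dui-article": "http://www.example.org/news/2010/04/01/tougher-laws-for-repeat-duis.html",
}

# Hypothetical navigation paths; both lead to the same piece of content.
PATH_TO_CONTENT = {
    "/news/2010/04/01/tougher-laws-for-repeat-duis.html": "dui-article",
    "/topics/traffic/news/tougher-laws-for-repeat-duis.html": "dui-article",
}


def canonical_link_tag(request_path: str) -> str:
    """Return the canonical <link> tag for whichever path was requested."""
    content_id = PATH_TO_CONTENT[request_path]
    href = CANONICAL_URLS[content_id]
    # Escape the URL so it is safe to place inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


if __name__ == "__main__":
    for path in PATH_TO_CONTENT:
        print(path, "->", canonical_link_tag(path))
```

Whichever path the visitor or crawler used, the tag in the page head points at the same URL, so the search engine does not have to pick one itself.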

Let's have a more detailed look at the government example. The following are just two sample URLs leading to an article about tougher laws for repeat DUIs:


In the source of both pages, invisible to the casual browser, a tag tells the search engine where to place this piece of content in its results.

<link rel="canonical" href="http://www.rijksoverheid.nl/nieuws/2010/04/01/rijden-onder-invloed-harder-aangepakt.html" />
And this is what Google returns when searching for a few of the article's keywords - Mission accomplished!
  1. Rijden onder invloed harder aangepakt | Nieuwsbericht ...

    1 april 2010 ... Rijden onder invloed harder aangepakt ... Ministerraad. Bel 0800-8051 voor vragen aan de Rijksoverheid ... Zoek binnen rijksoverheid.nl ...
    www.rijksoverheid.nl/.../rijden-onder-invloed-harder-aangepakt.html - Cached
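To check that a page really carries the canonical tag described above, you can read it back out of the page source. The following is a minimal sketch using only Python's standard library; the class and function names are invented for this example, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before matching.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(page_source: str) -> Optional[str]:
    """Return the canonical URL declared in the page source, or None."""
    finder = CanonicalFinder()
    finder.feed(page_source)
    return finder.canonical
```

Running `find_canonical` on the source of two different navigation paths to the same article should return the same URL if canonicalization is set up as intended.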

For a more in-depth description of how you can create canonical URLs with Hippo CMS, check out our public wiki.

Tuesday, January 26, 2010

Why WCM/Portal convergence is a good thing

Stephen Powers' blogpost on convergence between WCM and Portal sparked a nice little controversy about the alleged trade-off between integration and separation of concerns. It ain't necessarily so: If integration is done right, the trade-off does not need to be there at all.

Convergence is a two way street. From the WCM perspective we like to think of Portals as a way to offer 'self service', personalization, security and integration with other applications / widgets / iframes and the like. From the Portal perspective we need WCM to provide tools to work with our portal content that does not reside in other applications.

Both are, of course, nothing new. Neither is the fact that vendors (Hippo with Hippo CMS and Apache Jetspeed Portal is the open source example) have been offering integrated portal solutions for a number of years. Given the challenges and costs involved in true integration, it clearly makes sense to offer integrated solutions so as to keep such projects manageable. And who can better ensure integration is done right than the vendor itself?

New in Stephen's post is the fact that he, or rather IBM, sees the package of WCM & Portal becoming such a common combination that distinguishing the two markets would no longer be meaningful. This does not mean that WCM and Portals will become an indistinguishable mesh (or mess?) with the risk of losing all we gained from separating content from the presentation layer in the first place.

It does mean, just like before, that buyers should be careful not to select a package that restricts their choice in where and how to manage and publish their content. It also means that buyers should be critical when making decisions about such things as collaboration platforms. Collaboration systems put in place today may be with us for many years to come. Where does the content reside? Can it be accessed, altered, integrated in other applications? What about the source code? Remember Lotus Notes?

For maximum flexibility and future readiness in purchasing an integrated WCM & Portal, there are four basic questions to ask:
1. Can the WCM system stand on its own? Would you buy it for its WCM functionality?
2. Can the Portal stand on its own? Would you buy it for its Portal functionality?
3. Are the two really integrated? Think of integrated user management, security, administration, URL mapping, ease of development, etc.
4. Is it Open Source (i.e., are you free to use and change the software as you wish)?

Monday, September 21, 2009

Do it yourself software

Recently I refreshed my memory on a company my friend Alef Arendsen, then at SpringSource, pointed out to me. Charles Simonyi of Intentional writes that “all too often the knowledge and insights gained during the development disappear into the details of the code”.

Although Intentional as a company has been around since 2002, and its ideas date back to the '90s, this is still very much where business software is at today. The need for specialized engineers to get involved in building applications or websites comes at a loss of direct influence and knowledge about the things that you need to build. Companies that are able to successfully bridge this gap can produce stunningly usable software that others just don’t seem to get right: Google and Apple are the obvious examples.

In this light Content Management Systems have always been about putting direct control back into the hands of business. When trying to create web pages, back in the early/mid ‘90s, I had to teach myself a bit of HTML for even the simplest piece of green text on a black background. With business software this is no different. While years back for every 1 or 0 you needed a team of programmers, now many of us with simple development skills can create very complex and powerful applications.

Most of these developments stop short of one final step: new technology has empowered the developer, but you still need programming skills to get sophisticated enterprise applications up and running. The business users, or ‘lay programmers’ as Martin Fowler calls them, remain dependent on the IT community to realize their ideas. Hence business users stretch the boundaries of the tools they control: multi-billion-dollar projects with thousands of people are entirely managed with cleverly designed spreadsheets.

The first step in CMS development was to take the developers out of the loop in changing content on the website. With this 'Do-it-yourself' software a company not only dramatically reduced the cost of maintenance, it also enabled business users to control the content directly. This in turn solved many of the translation problems on early websites. With a CMS it was immediately clear what a text would look like online. Text could be correctly fitted and the author could place pictures exactly where he wanted.

Today even the simplest CMS has solved this problem and it is time to raise the bar a bit higher: giving business the power to change and build complex websites that integrate existing applications without IT support, so that it does not require a project proposal and 4 months of queuing time to get a new site live. And yes, it would be nice if it could all run on an invisible internal or external cloud infrastructure, so you would not have to worry about plugging in an extra server or getting the backup hooked up.

The benefits are clear: No more ridiculously detailed requirement sheets, no more blame game when the results don’t meet expectations in spite of this effort, and much, much shorter cycles – removing an entire layer of ‘friction’ between customer and producer.

None of this is new – but in practice many challenges remain. While those in the know seem to be able to develop great looking, complex integrated sites in a matter of days or weeks, the average business user still fumbles with his Excel spreadsheet. And while the developer community has largely embraced the open source development model, participation of business users is notably low. With business users largely sidelined, there is a permanent danger for open source projects to lose track of the ‘business intent’ of a project. In projects where the engineers and business end users are one and the same, this danger is relatively low, but with any software that moves up the enterprise ladder, the dangers of a disconnect grow. On the bright side: if this is successfully tackled, there seems to be a terrific opportunity for open source companies to involve more business users in their projects.

Ideally the end result of such a project is not only a CMS driven website or a content driven portal, but a system that allows users to use this as a starting point, with easy to use interfaces for creating additional websites and/or portals.

Here are a few things we do at Hippo to make it easier for our partners and customers to build such environments:

  • Provide direct and user friendly access to administrative interfaces; eg, drag and drop Portlet selector for Jetspeed Portal 2
  • Give access to functionality previously only available to programmers through a user friendly UI: eg, Nested Template Picker, Site control functions in CMS 7 or space creation in Jetspeed Portal 2.
  • Involve end-users in ongoing road map meetings, not just during the requirement / delivery phase
  • And of course: Develop products with a keen eye on keeping things clean and nimble for the daily user

So we're not quite at the full realization of Simonyi’s lofty goals yet, but rapid development environments and the rise of cloud infrastructures provide an opportunity to involve business users in different ways than before. Companies definitely seem eager to embrace it.

Tuesday, June 30, 2009

Back to blogging

Starting a blog.... third attempt

If I could still find my first posts ever, I would have started my blog here with them. But they seem long buried under hundreds of thousands of more recent and more relevant articles, or perhaps they really have disappeared from all mirrors and backlogs.

Let me look back for a second at two earlier attempts of creating a meaningful web presence for myself:

In the early nineties I fooled around a bit with the Digitale Stad (=digital city), a web analogy that has largely disappeared (and rightly so). Although popular enough, browsing for information in an artificial subway system with no search functionality was painful for all but the most hard-core fans.

In 1998, in my second attempt at blogging, I created a travel journal online and maintained it at Crosswinds. With an online HTML editor and easy FTP uploading, Crosswinds was easy enough. However, after a few months, my web counter was still stuck in the teens. My pen pals seemed to lack the passion for the emerging web that I had. For lack of audience I gave up and went back to the one-to-many weekly email, the persisting standard communication for folks abroad until Facebook.

Back to the blog at hand: unlike the strictly professional or personal blogs out there, this one intends to be unfocused from the outset, for the simple reason that I am afraid I won't be able to keep it very fresh if I do not make use of every possible urge I feel to write. I managed without writing a blog for the last 10 years after all. So here it goes...