Is the recent decision by Google to support the RDFa semantic web standard an opportunity or a threat to publishers? And what does this mean for end users and purchasers of information products?
Google have recently added support for a very limited subset of the RDFa vocabulary to their search engine [beginners should start with RDFa Basics]. This will allow them to present much more relevant information directly on the search engine results page (SERP), which in turn will allow users to discover the information they are looking for much more easily. It will also allow Google to build on their “show options” feature, which allows users to narrow down search results by selecting from a relatively limited number of different categories.
Tech-heads will be less than happy with the part-baked nature of Google’s support for the RDFa standard, and rightly so in my view. What it means in practice is that the exact tagging needed to make the new Google features work is currently determined by Google alone, and does not build on semantic web community standards already established, such as FOAF or Dublin Core.
Those publishers who currently enable Google to crawl their sites in order to drive user traffic should take note of this development. Since the main benefit will be to allow the end user more options in navigating search results, and the ability to find relevant results more quickly, those who adopt the new standards should stand to gain user traffic over those who don’t. However, one corollary of allowing Google to provide more relevant information directly in the SERP is that, for some searches at least, the relevant information will be embedded directly into the SERP and the user need go no further. Clearly a win for the user (and Google), but a loss for the publisher, whose site will inevitably get less traffic.
Directory, data and search publishers have most to lose from this new state of affairs. Because Google stores all of the information it indexes, the key issue is that those who choose to expose RDFa data also choose to let Google (and the world at large) mine their information for later use. This will amount, for many publishers, to a huge giveaway of factual content. If you code your factual information using RDFa then the whole world can take it away and do with it what it chooses. At the basic level, this will include telephone numbers, contact names, organisational relationships, etc. but will also extend to areas of specialist data such as protein structures, DNA … the implications are potentially huge.
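To make the point concrete, here is a minimal sketch of how little effort it takes to lift facts out of RDFa-tagged markup. The page fragment, the person and the `v:` property names (loosely modelled on the vocabulary Google documents for its snippets) are all assumptions made up for illustration, and the harvester is a simplification using only Python's standard library, not a full RDFa parser: it ignores nested elements, `content` attributes and namespace resolution.

```python
from html.parser import HTMLParser

# Hypothetical directory entry; the names and "v:" properties are illustrative assumptions.
SAMPLE = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jane Example</span>,
  <span property="v:title">Editor</span> at
  <span property="v:affiliation">Example Press</span>.
  Tel: <span property="v:tel">+44 1234 567890</span>
</div>
"""


class PropertyHarvester(HTMLParser):
    """Collects the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._current = None  # property name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.facts[prop] = ""

    def handle_data(self, data):
        if self._current:
            self.facts[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


def harvest(html):
    """Return a dict mapping each tagged property to its text content."""
    parser = PropertyHarvester()
    parser.feed(html)
    return {name: text.strip() for name, text in parser.facts.items()}


if __name__ == "__main__":
    for name, value in harvest(SAMPLE).items():
        print(name, "=", value)
```

Anyone who can fetch the page can run something like this across a whole directory and walk away with a structured copy of its contact data.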
The rub is that if you don’t publish at this level of granularity, users may choose the resources that do over yours, so staying out of the game is not a serious option. On the other side, the threat if you do jump in with both feet is that someone else may take away your data and use it in ways that will be completely out of your control. You’re liable to feel that the tight, tactical game of Premier League football you’ve been playing up to now has suddenly regressed to the state of a village knockabout circa 1300, with anyone out of the crowd able to rush in and steal away with the ball.
Neither are there just two teams anymore. This particular match looks to have had a three-way result: Users 1, Google 1, Publishers 0.

Richard Padley
Managing Director,
Semantico
Hi Richard,
Interesting post – and definitely another important challenge for the publishing industry in the battle between paid for/free disclosure of content.
Is there a tech solution that will allow the content providers to play the RDFa game to their advantage? Couldn’t publishers use this opportunity to add brand context to their search results, ensure the users are aware of the quality of the resource, the author’s status, the timing of the publication etc?
The range of filters in Google’s ‘Search Options’ is too limited to fulfil the needs of specialist audiences (see this Google post introducing the feature: http://googleblog.blogspot.com/2009/05/more-search-options-and-other-updates.html), but gives us a powerful user-interface suite of pre-learned interface behaviours to include in our own sites: continuity in the search and discovery experience will make our resources easier to use, and therefore more desirable.
I’m not up to speed yet with the development requirement and costs in this area, but it seems that this is an inevitable new ‘doorway’ into content, and publishers have an opportunity to use this to their advantage by creating a supportive, quality, branded experience here.
If you can’t beat ‘em, join ‘em.
Louise
Hi Louise,
Technology solutions which support microformats should already play well with this new development. And, more generally, those publishers and content providers who have been taking care over maintaining their metadata will be in pole position to unlock the new Google search options and snippets features which can be enabled by using RDFa.
Similarly, Wolfram Alpha and the forthcoming Google Squared both depend heavily on metadata – and the most effective way of supplying this metadata will be through using RDFa.
I think the brand issue is a really important part of the bigger question of trust and provenance in the semantic web. And although content providers can (and should) use this mechanism to add brand context to their metadata, there is no guarantee that this information will be used by Google, Wolfram or anyone else for that matter.
On the semantic web central control disappears; each application gets to choose exactly which pieces of data are processed and displayed to the end user. Unless brand context is recognised as important it will fall by the wayside.
Richard.
One Trackback
[...] Padley, Richard. “What does Google’s RDFa support mean for publishers?” 18 May 2009. The Discovery Blog. http://blogs.semantico.com/discovery-blog/2009/05/what-does-googles-rdfa-support-mean-for-publishers... [...]