<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The discovery blog &#187; Search</title>
	<atom:link href="http://blogs.semantico.com/discovery-blog/category/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.semantico.com/discovery-blog</link>
	<description>Semantico looks at online publishing</description>
	<lastBuildDate>Thu, 02 Sep 2010 10:22:31 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<image>
			<title>The discovery blog</title>
			<url>http://blogs.semantico.com/discovery-blog/wp-content/uploads/2008/11/logo64.png</url>
			<link>http://blogs.semantico.com/discovery-blog</link>
			<width>64</width>
			<height>64</height>
			<description>Semantico looks at online publishing</description>
		</image>		<item>
		<title>Semantic wave builds momentum</title>
		<link>http://blogs.semantico.com/discovery-blog/2010/08/semantic-wave-builds-momentum/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2010/08/semantic-wave-builds-momentum/#comments</comments>
		<pubDate>Wed, 25 Aug 2010 09:00:14 +0000</pubDate>
		<dc:creator>John Helmer</dc:creator>
				<category><![CDATA[Online Publishing Market]]></category>
		<category><![CDATA[Publishing business models]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=1776</guid>
		<description><![CDATA[The Semantic Web has taken significant steps towards reality in recent months, with the powerful triumvirate of Google, Facebook and Twitter moving to integrate elements of semantic technology into their operations.
All of a sudden, a development that for too long appeared to be stalled by the chicken-and-egg problem of how website owners could be induced [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/08/semantic-wave2.jpg"><img class="alignright size-full wp-image-1790" title="semantic-wave2" src="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/08/semantic-wave2.jpg" alt="" width="300" height="198" /></a>The Semantic Web has taken significant steps towards reality in recent months, with the powerful triumvirate of Google, Facebook and Twitter moving to integrate elements of semantic technology into their operations.</p>
<p>All of a sudden, a development that for too long appeared to be stalled by the chicken-and-egg problem of how website owners could be induced to tag their metadata looks to be in imminent danger of going seriously mainstream.</p>
<p>Marketers, it seems likely, rather than academics, will lead the charge to the VW campers from here on in. And in all probability, publishers and information providers who aren&#8217;t already waxing their boards in preparation for this particular wave of technological change could risk being left behind as it steadily takes on tsunami proportions and thunders beachwards.</p>
<p><span id="more-1776"></span></p>
<p><strong>Google, Facebook, Twitter embrace semantic technologies</strong></p>
<p>A <a href="http://www.newscientist.com/article/mg20727715.400-google-twitter-and-facebook-build-the-semantic-web.html">recent article</a> in New Scientist (subscription required) described how the giants of search and social media are making moves to actualize the semantic web.</p>
<ol>
<li>Google&#8217;s recent acquisition of Metaweb&#8217;s Freebase, an open-source repository of structured data – or ‘entity graph’ as the company styles it – containing more than 12 million entities, will potentially enable much smarter searching. Entries in Freebase are tagged in such a way that machines can ‘understand’ what they are about and make meaningful connections between them. At the simplest level, computer searches would, for instance, be able to distinguish between David Mitchell the British Novelist and David Mitchell the British Actor, Comedian and Writer (not to mention David Mitchell the Tory politician, David Mitchell the retired American ice dancer, etc. etc.).</li>
<li>Twitter has recently released information about its new ‘annotations’ feature, which allows users to annotate a tweet with structured metadata. A tweet about a new book release, for example, might let you link straight to a ‘look inside’ book widget or the Amazon page for the paperback. Launch of a test version is apparently imminent.</li>
<li>Facebook is making changes to its Open Graph protocol that have a semantic element. Website owners can add a &#8220;like&#8221; button to their site, along with semantic tags that tell Facebook&#8217;s servers what the page is about. According to Facebook: ‘based on the structured data you provide via the Open Graph protocol, your pages show up richly across Facebook: in user profiles, within search results and in News Feed’. So when a Facebook user clicks the ‘like’ button on a publisher’s site – relating to a particular title, or author, perhaps &#8211; a link is established between that site and their Facebook profile.</li>
</ol>
<p><strong>Advertising goes semantic</strong></p>
<p>Any change in the way Google works has major implications for marketers. If using an entity graph significantly changes the way Google delivers its search results, the dark art of search engine optimization will have to respond, and weighty volumes of SEO best practice will have to be revised.</p>
<p>But even more wide-ranging changes will have to be made to practice around online marketing, with micro-writing and metadata tagging becoming ever more critical aspects of the marketer’s art, as websites lose their traffic to Google’s interface, which now not only provides a place for people to enter search terms, but also a place for them to read the answers, with no further click-through taking place.</p>
<p>New Scientist speculates, however, that it is in the Facebook and Twitter changes that the main attraction of these developments may lie for advertisers. With the major players in social media on board, apps are already beginning to be written that can exploit the potential of semantically tagged data.</p>
<p>And &#8211; oh dear &#8211; here comes another water-based metaphor: mainstream adoption is likely to open the floodgates for such third-party development. This is because it solves the chicken-and-egg incentive problem of how you get website owners to tag their content. There is a clear incentive for any content owner to tag their content appropriately, providing structured metadata, if it means targeted, relevant access to Facebook’s 500 million plus user base.</p>
<p><strong>Why should you care about this?</strong></p>
<p>The implications for publishers are obvious. The opportunity exists, through semantic technologies, to massively improve the discoverability of their content online. But they also present a threat. Those who move fastest stand to gain a march on their competitors, while those who lag could well miss out.</p>
<p>This throws down yet another gauntlet to a traditionally conservative industry that may well feel it already has quite a bit on its plate to deal with. Even more reason, then, for publishers to embrace the world of online in a concerted fashion, if they are to reap the benefits and stay ahead of the competition.</p>
<p>Surf’s up!</p>
<p>If you’re investigating the use of semantic technologies, talk to Semantico first. We offer a Semantic Web consultancy service focused on helping publishers improve the discoverability of their content using the evolving semantic web. <a href="mailto:info@semantico.com">Contact us today</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2010/08/semantic-wave-builds-momentum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Seven steps to improving findability</title>
		<link>http://blogs.semantico.com/discovery-blog/2010/07/seven-steps-to-improving-findability/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2010/07/seven-steps-to-improving-findability/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 10:57:28 +0000</pubDate>
		<dc:creator>Andrew Grimes</dc:creator>
				<category><![CDATA[Information Architecture]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=1728</guid>
		<description><![CDATA[Making information searchable has never really been the point. Instead, our goal as online publishing specialists is to make our client&#8217;s information findable! After all it isn&#8217;t really the users&#8217; fault if they can&#8217;t find relevant results. Even if they&#8217;re not using quite the right search terms or operators, it is our job to deliver [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/07/johnny_automatic_look_it_up.png"><img class="alignright size-full wp-image-1729" title="Findability" src="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/07/johnny_automatic_look_it_up.png" alt="Cartoon Man trying to find information in a book" width="250" height="194" /></a>Making information searchable has never really been the point. Instead, our goal as online publishing specialists is to make our client&#8217;s information findable! After all it isn&#8217;t really the users&#8217; fault if they can&#8217;t find relevant results. Even if they&#8217;re not using quite the right search terms or operators, it is our job to deliver them the most pertinent information in the right order, maximising the possibility that they will find the information they need.</p>
<p>Search should be clairvoyant: like a magical librarian who somehow correctly guesses what it was you were looking for; offering it up within a fraction of a second, along with a wealth of additional filtering options and navigational possibilities.</p>
<p><span id="more-1728"></span></p>
<p>Without wishing to destroy the magic, here are my seven steps to improving findability:</p>
<h2>1. Define what relevancy means in this context</h2>
<p>Relevancy is a difficult thing to pin down. A set of search results is more or less relevant on the basis of how well the information retrieved meets the need of the user. Arriving at a definition of relevancy therefore means doing some fairly detailed analysis of your users and content. Some BIG questions need answering.<br />
<a title="wikipedia entry precision and recall" href="http://en.wikipedia.org/wiki/Precision_and_recall?">Is recall or precision more important?</a> How do you go about catering to the competing needs of different user groups?</p>
<p>Clearly, discussions need to be had and decisions made. During this process it will be worth considering a range of scenarios where you might like to boost certain results over others:</p>
<ul>
<li>Field weighting<br />
e.g. results within titles are more relevant</li>
<li>Recency of data<br />
e.g. results from recent data are more relevant</li>
<li>Search phrase density<br />
e.g. results which contain the greatest number of uses of the search phrase are the most relevant</li>
<li>Search phrase term proximity<br />
e.g. results where multiple terms are nearer to each other are more relevant</li>
<li>Records which have been bookmarked, cited or linked to<br />
e.g. results which have already proved themselves to be useful to other users are more relevant</li>
</ul>
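<p>To make the idea of boosting concrete, here is a minimal sketch of how a few of these signals might be combined into a single score. It is an illustration only: the <code>Record</code> fields, the <code>score</code> function and every weight in it are assumptions made up for this example, not the behaviour of any particular search engine.</p>

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    body: str
    published: date
    bookmarks: int  # hypothetical popularity signal


def score(record, phrase, today, title_weight=3.0, body_weight=1.0,
          recency_half_life_days=365.0, popularity_weight=0.1):
    """Combine simple boosts into one relevancy score (all weights are made up)."""
    phrase = phrase.lower()
    # Field weighting and phrase density: every hit counts, title hits count more.
    text_score = (title_weight * record.title.lower().count(phrase)
                  + body_weight * record.body.lower().count(phrase))
    if text_score == 0:
        return 0.0
    # Recency: the boost halves every `recency_half_life_days`.
    age_days = max((today - record.published).days, 0)
    recency = 0.5 ** (age_days / recency_half_life_days)
    # Popularity: records that other users bookmarked get a small lift.
    popularity = 1.0 + popularity_weight * record.bookmarks
    return text_score * (1.0 + recency) * popularity
```

<p>In practice most search engines expose boosts like these as configuration rather than code, so treat the sketch as a way of discussing trade-offs with stakeholders, not as a recipe.</p>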
<h2>2. Provide lots of options, not just lots of results</h2>
<p>Findability is not just about returning relevant results. A good search implementation will also provide the user with plenty of further options, which they can use to home in on exactly what they were searching for. Examples of this include providing:</p>
<ul>
<li>Meaningful facets through which the results can be filtered</li>
<li>&#8216;Did you mean&#8230;?&#8217; option &#8211; for alternative spellings</li>
<li>&#8216;Users who searched for x also searched y&#8217; option &#8211; for related searches</li>
<li>Clustering of search results, e.g. by topic</li>
<li>Sort options</li>
<li>Hit-highlighting &#8211; to highlight the phrase terms in context</li>
</ul>
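<p>As a rough illustration of the &#8216;Did you mean&#8230;?&#8217; option, the sketch below uses the <code>get_close_matches</code> function from Python&#8217;s standard <code>difflib</code> module to suggest indexed terms that resemble a misspelt one. The <code>did_you_mean</code> helper, the vocabulary and the 0.8 cutoff are assumptions chosen for this example; a production search engine would normally use its own spelling-suggestion component.</p>

```python
from difflib import get_close_matches


def did_you_mean(term, vocabulary, max_suggestions=3):
    """Suggest indexed terms that closely resemble a possibly misspelt term."""
    if term in vocabulary:
        # The term is already known, so there is nothing to suggest.
        return []
    return get_close_matches(term, vocabulary, n=max_suggestions, cutoff=0.8)


# Hypothetical vocabulary drawn from the search index.
vocabulary = ["search", "semantic", "findability"]
print(did_you_mean("serch", vocabulary))
```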
<h2>3. Enriching the data</h2>
<p>Data not only has to be marked up consistently and correctly, it can often benefit from some enhancement before it goes online. In practice, this can mean additional classification processing or entity extraction through text mining. The goal is to ensure the content itself is rich enough to support the sort of advanced searching and filtering that we want to build within the site.</p>
<h2>4. Measure relevancy</h2>
<p>It’s worth setting up some relevancy metrics to monitor how search is performing over time. A good method is <a href="http://en.wikipedia.org/wiki/Mean_reciprocal_rank">Mean Reciprocal Rank</a>. To implement this you track click-throughs to search results, giving each click-through to a first result a score of 1, each click-through to a second result a score of 1/2, each click-through to a third result a score of 1/3, and so on. Averaging these scores across all click-throughs gives you an overall relevancy score between 0 and 1 that you can track over time, with a higher score meaning that top links are performing better.</p>
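<p>The calculation is small enough to sketch in a few lines. The example below assumes you already log the position of the result that was clicked for each search; the <code>clicks</code> list is invented data, and taking the mean of the per-click scores follows the usual definition of the metric. Strictly speaking, Mean Reciprocal Rank uses the rank of the first relevant result per query, so click-throughs are only a proxy for relevance here.</p>

```python
def mean_reciprocal_rank(clicked_positions):
    """Mean of 1/rank over click-throughs; ranks are 1-based result positions."""
    if not clicked_positions:
        return 0.0
    return sum(1.0 / rank for rank in clicked_positions) / len(clicked_positions)


# Hypothetical click log: the result position clicked for each search.
clicks = [1, 1, 2, 3, 1]
print(round(mean_reciprocal_rank(clicks), 3))  # prints 0.767
```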
<p>It is also a good idea to monitor searches that return zero results. A monthly list should be reviewed in case there are some sensible search queries in there which will have resulted in user frustration.</p>
<p>Regular reviews of search analytics are a vital part of ensuring that search is still performing well as the site and its content change over time.</p>
<h2>5. Improving the user&#8217;s query</h2>
<p>Normalising the user&#8217;s search phrase (and indeed the search index data) can help to improve findability. The following are all ways in which you can do this:</p>
<ul>
<li>Converting all letters to lower or upper case</li>
<li>Removing punctuation, accent marks or diacritics</li>
<li>Expanding abbreviations</li>
<li>Removing stopwords or &#8220;too common&#8221; words</li>
</ul>
<p>Recall might also be improved upon in certain scenarios by converting the user&#8217;s query into a fuzzy query (to return results for close matches to the search terms in order of how well they match). It may also be worth expanding the user&#8217;s search to include synonyms using a thesaurus (to return results where matches have been found for the same or similar concept).</p>
<p>In these ways it is possible to enhance the input query before it has even been sent to the search engine.</p>
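<p>The normalisation steps and the synonym expansion described above can be sketched as follows. This is a minimal example using only the Python standard library; the <code>STOPWORDS</code>, <code>ABBREVIATIONS</code> and <code>SYNONYMS</code> tables are tiny, hypothetical stand-ins for the domain-specific lists a real site would maintain, and the same <code>normalise</code> function would need to be applied to the index data for the two sides to match.</p>

```python
import string
import unicodedata

# Hypothetical, deliberately tiny lookup tables; real ones would be domain-specific.
STOPWORDS = {"the", "a", "an", "of", "and"}
ABBREVIATIONS = {"uk": "united kingdom"}
SYNONYMS = {"car": ["automobile"]}


def normalise(query):
    """Lower-case, strip diacritics and punctuation, expand abbreviations, drop stopwords."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", query.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation characters.
    no_punct = no_accents.translate(str.maketrans("", "", string.punctuation))
    terms = []
    for term in no_punct.split():
        # An abbreviation may expand to several words.
        terms.extend(ABBREVIATIONS.get(term, term).split())
    return [t for t in terms if t not in STOPWORDS]


def expand_synonyms(terms):
    """Map each term to itself plus any thesaurus synonyms."""
    return [[t] + SYNONYMS.get(t, []) for t in terms]
```

<p>Each inner list returned by <code>expand_synonyms</code> can then be sent to the search engine as a group of alternatives for that term.</p>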
<h2>6. Tuning the site for third party findability</h2>
<p>Lots of users (yes, perhaps even MOST) start their search using a third party search engine. It is therefore essential that the site is <a title="article on search engine optimisation" href="http://www.seomoz.org/article/search-ranking-factors">Search Engine Optimised</a>, meaning lots of quality in-bound links, the use of semantic markup, microformats and much, much more.</p>
<p>It may also be worth creating an OpenSearch API &#8211; so that third party use of the site&#8217;s search facility is possible.</p>
<h2>7. Finding also means re-finding</h2>
<p>There is a very good chance that users will want to re-use the entries that satisfy their information need. Consequently, improving findability should also mean making it as easy as possible for users to re-find what they found before. Helping users in this way can be done with features such as:</p>
<ul>
<li>Bookmarks</li>
<li>Saved searches</li>
<li>Direct exporting to citation software</li>
</ul>
<p>So there you have it: seven steps to findability. It is a BIG topic and I&#8217;m certain to have missed out important considerations. Please do feel free to publicly rub my nose in some of them by responding below!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2010/07/seven-steps-to-improving-findability/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Focus on technology not devices, says mobile publishing symposium</title>
		<link>http://blogs.semantico.com/discovery-blog/2010/04/focus-on-technology-not-devices-says-mobile-publishing-symposium/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2010/04/focus-on-technology-not-devices-says-mobile-publishing-symposium/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 13:41:33 +0000</pubDate>
		<dc:creator>John Helmer</dc:creator>
				<category><![CDATA[Access and identity management]]></category>
		<category><![CDATA[E-books]]></category>
		<category><![CDATA[Information Architecture]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Online Publishing Market]]></category>
		<category><![CDATA[Publishing business models]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=1404</guid>
		<description><![CDATA[Report from the Semantico Online Publishing Symposium on Mobile and Cross-platform Delivery
The inaugural Semantico Symposium was held recently in London to discuss implications of the shift to mobile for publishers and information providers. An invited audience of publishing industry leaders debated the issues under Chatham House rules, covering the following three themes:

Devices and technology
Business models
Future [...]]]></description>
			<content:encoded><![CDATA[<h3>Report from the Semantico Online Publishing Symposium on Mobile and Cross-platform Delivery</h3>
<p><a href="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/04/bluebird.jpg"><img class="size-full wp-image-1416 alignright" title="The Bluebird" src="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2010/04/bluebird.jpg" alt="" width="285" height="191" /></a><strong>The inaugural Semantico Symposium</strong> was held recently in London to discuss implications of the shift to mobile for publishers and information providers. An invited audience of publishing industry leaders debated the issues under Chatham House rules, covering the following three themes:</p>
<ul>
<li>Devices and technology</li>
<li>Business models</li>
<li>Future strategy options</li>
</ul>
<p><span id="more-1404"></span></p>
<p>This was a stimulating event with a high-calibre guest list, with delegates attending from organisations including Oxford University Press, Nature Publishing Group, Macmillan Education, Wiley-Blackwell, CrossRef, CABI, BSI Group and the Institution of Engineering and Technology. To do justice to the discussion, we’re going to report it over a couple of blog posts, starting with the initial theme of devices and technology (yes, it’s a partwork!).</p>
<h2>Forget devices, focus on the underlying technology</h2>
<p>If proof were needed that these are nervous times for publishers, just consider the case of Flash. Not only does Apple not support Flash technology on the iPhone or iPad, but the world’s most popular video-sharing site, YouTube (owned by Google), is quietly in the process of moving away from Flash video. In addition the emerging HTML5 standard, which aims to reduce the need for such proprietary plug-ins, looks likely to make it all but obsolete. So will Flash die? Almost certainly, say the tech-heads.</p>
<p>This is appalling news for publishers with large amounts of legacy online content in Flash. It also serves as an example of one of the strongest themes to emerge from our Symposium, which is that publishers and information providers who hope to thrive (or at the very least survive) in the turbulent times ahead would be well-advised to disregard, to a certain extent, the hype and wow surrounding high-profile device launches like that of the iPad, and focus on the underlying technology issues in cross-platform delivery. That’s where the real uncertainty lies. Marvellous though they are, it’s not about the devices – but about the content, and the user’s experience of the content.</p>
<p>There is no denying that the iPhone has instituted something of a paradigm shift in the delivery of content, but notwithstanding this undoubted fact, a good deal of skepticism was evinced by our delegates about what is perhaps the most significant innovation to be introduced along with that device, the App Store.</p>
<p>A significant strand of opinion believes that an app is really not that much different from a mobile-optimised website. As far as the user is concerned there is little difference. In the not-too-distant future, it was predicted, you will download something you think is an app but you will actually be interacting with a website optimised for mobile use.</p>
<p>The iPad experience of web surfing (about 42% of our small but select sample had had hands-on experience of the device) might make us question whether we need apps at all, in the opinion of one delegate. Maybe what we need is not apps but better-designed, more mobile-friendly websites.</p>
<h2><strong>So far, so heretical</strong></h2>
<p>However, there is another strand of opinion. From the user’s point of view, the experience of using an app is utterly different from that of using a PC. One virtue of the app is that it does a very narrow, specific thing. Apps streamline our use of the internet and cut out &#8211; or at least reduce &#8211; much of the pain associated with PCs (e.g. constant downloads of plug-ins, patches and updates, the state of total war we have to live in with viruses, spyware and spam, etc.).</p>
<p>A website is always going to feel like a place you go to, to harvest a crop of information. In the case of an app, the crop is turned into biofuel: information becomes the petrol that gets your knowledge car from A to B – to a designated destination. A website might be a field of dreams (if you’ll excuse a criminally over-used film reference), but an app helps you actually do something.</p>
<p>These two points of view are not, in essence, irreconcilable. It’s a matter of perspective; of whether you are looking at things from the producer’s end of things or from the consumer’s. If you strip away the wow, yes, an app is no more than a website. But what produces the ‘wow’ is fantastic usability &#8211; and that’s a matter of primary importance for most end-users.</p>
<h2><strong>Search lags on mobile</strong></h2>
<p>… Which is not to say at all that the current generation of mobile devices together embody a giant leap forward for usability. In actual fact they can look like a bad step backwards.</p>
<p>In particular, search took a while to get established on the desktop internet, and to reach its current state of utility. By comparison, search on mobile is very slow at the moment, even on 3G networks. Also, it is not that easy to find the app you want: the discoverability of apps is not great. This situation is liable to get worse before it gets better, as apps and app stores proliferate.</p>
<p>A certain frustration is surely excusable for those who soldiered through the difficult early years of the millennium when publishers were just beginning to build their first sites, and had to cope with the teething troubles of the early web – only to see many of the same problems coming back to them in 2010. There is a new network, and it has yet to organize itself effectively.</p>
<h2><strong>Monitoring the Big Tech face-offs</strong></h2>
<p>Focusing on underlying technology and networks throws a deal of emphasis on the importance of monitoring and understanding what is going on with some of the major tech companies – and not solely because a few (particularly Amazon and Google) have forged themselves into the publishing value chain, where they are fast becoming almost unavoidable links. We mentioned Flash earlier, owned by Adobe, but there are others to consider as well.</p>
<p>Apple&#8217;s new prominence, which has come about largely as a result of the huge success of the iPhone, is beginning to foreground some of the ways it has of going about things that most annoy people. The dead hand of control that it exercises over what can and cannot be offered through the App Store – amounting to censorship – has led to comparisons with China. Will Google’s Android prove to be a viable Open Source alternative?</p>
<p>Apple has become the company to attack, and the company to position against.</p>
<p>Microsoft appears to be positioning against Apple with Windows 7 by placing emphasis on social networking. This is an important battleground if it really can be established as a point of difference. RIM’s Blackberry Curve phone has crossed over into the teenage market not only because it is a lot cheaper than an iPhone, but because it offers their young audience a more effective way of interacting with their online social networks. It is too easy to write off Microsoft and believe that the important dust-up nowadays is between Apple and Google, but there may well be life in the old dog yet – and Microsoft still has significant market share in mobile operating systems.</p>
<p>Publishers likewise dare not forget, in the age of <strong>the read/write web</strong>, that online publishing is not just about how the stuff gets delivered, but also about how it gets produced, edited, commented, redacted, peer-reviewed … etc., etc. Nowhere is this more true than in the field of academic publishing – because one of the central concerns of academic publishing is scholarly communication.</p>
<p>What this becomes is a debate about how we consume and produce information. Corporate positioning takes on a philosophical, even ideological aspect, the nuances of which publishers have to tune their ears to detect. The first task is to be aware.</p>
<h2><strong>Government unhelpful</strong></h2>
<p>Someone who seems to have a bit of a tin ear in this regard is the great clunking fist himself – if Gordon Brown can be held responsible for the controversial Digital Economy Bill which, at time of writing, is awaiting Royal Assent. There wasn’t much controversy here: instead it was roundly condemned as a piece of rushed and unworkable legislation that will, nevertheless, no doubt remain on the statute books for many years to come.</p>
<h2><strong>The debate continues</strong></h2>
<p>Tune in next time for a further report from the Symposium, as we move to discuss <strong>business models</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2010/04/focus-on-technology-not-devices-says-mobile-publishing-symposium/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>&#8216;Innovation from product to production&#8217; at the STM E-Production Seminar</title>
		<link>http://blogs.semantico.com/discovery-blog/2010/01/innovation-from-product-to-production-at-the-stm-e-production-seminar/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2010/01/innovation-from-product-to-production-at-the-stm-e-production-seminar/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 13:12:25 +0000</pubDate>
		<dc:creator>Richard Padley</dc:creator>
				<category><![CDATA[Access and identity management]]></category>
		<category><![CDATA[E-books]]></category>
		<category><![CDATA[Information Architecture]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Publishing business models]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=990</guid>
		<description><![CDATA[Written and delivered in partnership with Andrea Powell from CABI, this presentation is a case study of lessons drawn from the CAB Direct project, and highlights issues which are relevant across the board for publishers delivering online content. This includes looking at how to maximise value in the design of taxonomies and coding systems, how designing and [...]]]></description>
			<content:encoded><![CDATA[<p>Written and delivered in partnership with Andrea Powell from <a title="CABI" href="http://www.cabi.org/" target="_blank">CABI</a>, this presentation is a case study of lessons drawn from the <a href="http://cabdirect.org/">CAB Direct</a> project, and highlights issues which are relevant across the board for publishers delivering online content. This includes looking at how to maximise value in the design of taxonomies and coding systems, how designing and improving user experience on the product side can lead to more stringent data quality requirements and some design strategies to minimise ongoing operational costs when designing data transfer workflows between systems. We also look at innovation in the design of machine level API interfaces.<br />
<script type="text/javascript">// < ![CDATA[
 function openwindow() {  window.open("http://river-valley.tv/media/conferences/stm-eproduction-2009/0102-Richard_Padley", "mywindow", "menubar=1, resizable=1, width=920, height=509"); }
// ]]&gt;</script><br />
<a href="javascript: openwindow()">You can watch</a> the full presentation (45 <abbr title="minutes">mins</abbr>) given to the STM E-Production Seminar on 3rd December in Kensington London. Please note that the video will be displayed in a new window.</p>
<p><noscript>You also need to have JavaScript enabled in your browser to view the video.</noscript></p>
<p>More on this excellent seminar can be found at <a title="STM E-Production Seminars" href="http://www.stm-assoc.org/event_presentations.php?event_id=18" target="_blank">The International Association of Scientific Technical and Medical Publishers</a> website.</p>
<p>Video by <a title="River Valley TV" href="http://river-valley.tv/" target="_blank">River Valley TV</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2010/01/innovation-from-product-to-production-at-the-stm-e-production-seminar/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Searching for the upturn: notes from Online Information &#8216;09</title>
		<link>http://blogs.semantico.com/discovery-blog/2009/12/searching-for-the-upturn-notes-from-online-information-09/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2009/12/searching-for-the-upturn-notes-from-online-information-09/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 18:25:54 +0000</pubDate>
		<dc:creator>John Helmer</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Online Publishing Market]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=839</guid>
		<description><![CDATA[
It&#8217;s always suspicious (to a jaundiced marketing person&#8217;s eye) when a show organiser chooses to place a large seated cafe area at the centre of the exhibition floor.
There were some noticeable absences at this year&#8217;s Online Information exhibition at Olympia – no doubt the result of crunch-inspired budget caution – and the air of an industry bracing [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-849" title="exhibition floor, Online Information 2009" src="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2009/12/online_information_cropped1.jpg" alt="exhibition floor, Online Information 2009" width="450" height="244" /></p>
<p>It&#8217;s always suspicious (to a jaundiced marketing person&#8217;s eye) when a show organiser chooses to place a large seated cafe area at the centre of the exhibition floor.</p>
<p>There were some noticeable absences at this year&#8217;s Online Information exhibition at Olympia – no doubt the result of crunch-inspired budget caution – and the air of an industry bracing itself for further shocks.<span id="more-839"></span></p>
<p>Public sector cuts have come to be seen as one of the few predictable features of a scarily unpredictable outlook for 2010, but the recent <a href="http://news.bbc.co.uk/1/hi/business/8397832.stm">news from Dubai</a> has shown that the private sector may not have exhausted its store of nasty surprises yet. Meanwhile, closer to home, there&#8217;s one less book retailer on the high street this Christmas with <a href="http://news.bbc.co.uk/1/hi/business/8385117.stm">Borders going down</a> – and as print news sales continue in a similarly downward direction, Rupert Murdoch has sounded the retreat, enjoining other Publishers to follow his lead and dig in behind their paywalls.</p>
<p>Small wonder, perhaps, that the on-stand drinks parties spilling out into the aisles had plenty of space this year to spill out into.</p>
<h2>Quality not quantity</h2>
<p>When a show is in such a period of contraction it&#8217;s usually the organisers who like to bandy the phrase &#8216;quality not quantity&#8217; about, but interestingly it was a Publisher we heard using it this year. A show like this is about conversations, and OK, there were fewer conversations happening this year than last, but perhaps they really were more valuable ones. Certainly, there seemed a serious progression from last year in many quarters. Tech industries in boom are notoriously productive of hot air. Sometimes it takes a chillier climate to bring a greater air of reality.</p>
<h2>Answers not results</h2>
<p>The conference at Online Information is always an interesting one. We at Semantico were too busy with those valuable conversations, perhaps, to catch all the sessions worth seeing, but what we did see confirmed our view that, more than ever, Search is a critical issue for our industry.</p>
<p>Two extremes, perhaps, of the current landscape were visible here. The leading edge of semantic search was represented by Conrad Wolfram, who launched <a href="http://www.wolframalpha.com/" target="_self">Wolfram Alpha</a> earlier this year. Like all good things it is now available as an <a href="http://www.itunes.com/apps/wolframalpha">iPhone app</a>. Wolfram&#8217;s assertion is that users don&#8217;t look for search results, but for answers. Being on the leading edge of search he of course denies that what he is doing is search: preferring to say that he is creating &#8216;knowledge environments&#8217;. One thing that really came across as distinctive in his approach is the emphasis on the presentational aspect of data.</p>
<p>At the other extreme, arguably, and at some remove from the leading edge, lies <a href="http://www.bing.com/" target="_self">Bing</a>. It is hard to gainsay the notion that Microsoft is playing catch-up with Google here, as in many of its recent offerings, and despite what it says in the marketing books about &#8216;fast follower&#8217; being the more advantageous positioning in tech markets, surely this can&#8217;t seem a great place to be when you used to be &#8230; well, Microsoft. It&#8217;s hard to think that any sliver of useful innovation Microsoft manages to come up with, any incremental enhancement to where we are now, won&#8217;t instantly be snapped up, and probably improved upon, by Google.</p>
<p>Which is not to say that Google doesn&#8217;t have its own catching up to do elsewhere &#8230;</p>
<h2>Get real</h2>
<p>There&#8217;s a lot of talk about real-time search at the moment, some of which was being purveyed at the conference by search guru <a href="http://arnoldit.com/wordpress/" target="_self">Stephen Arnold</a>. This interest largely results from the stellar rise of Twitter, and Google&#8217;s stated ambition to improve its game in this area. But real-time search is hard. At the moment, Google indexes certain sites hourly; however, there are many others that it indexes only three times per year. Keeping up with the global information stream, and prioritising exactly what should be kept up with, is an extremely challenging computational task.</p>
<p>Keeping up with the local information stream can be a challenging task too, even at a slightly downsized event such as this; so these notes are necessarily impressionistic. Overall, our impression is of a tougher, but perhaps slightly more real, marketplace for online publishing. Which bodes well for the upturn when it eventually comes.</p>
<p>We&#8217;d also like to thank all those who thronged our stand on Wednesday afternoon (and spilled out into the aisles) to help celebrate <strong>Semantico&#8217;s tenth birthday</strong>. Here&#8217;s to the next ten!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2009/12/searching-for-the-upturn-notes-from-online-information-09/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search Engine Optimisation for Online Publishing</title>
		<link>http://blogs.semantico.com/discovery-blog/2009/11/search-engine-optimisation-for-online-publishing/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2009/11/search-engine-optimisation-for-online-publishing/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 16:27:04 +0000</pubDate>
		<dc:creator>Liam Sheerin</dc:creator>
				<category><![CDATA[Information Architecture]]></category>
		<category><![CDATA[Online Publishing Market]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=733</guid>
		<description><![CDATA[Search Engine Optimisation is a crucial part of any web strategy. Optimisation techniques involve helping search engines to accurately read and index the information on your site and deliver it to potential users through search results. The best techniques do this with no impact on the user&#8217;s experience of the site. Here&#8217;s a review [...]]]></description>
			<content:encoded><![CDATA[<p>Search Engine Optimisation is a crucial part of any web strategy. Optimisation techniques involve helping search engines to accurately read and index the information on your site and deliver it to potential users through search results. The best techniques do this with no impact on the user&#8217;s experience of the site. Here&#8217;s a review of the first steps on any SEO journey.<br />
<span id="more-733"></span></p>
<h2>1. Site Copy</h2>
<p>Copy has a direct impact on search engine optimisation as it provides the crawlers with meaningful semantic information about the site. The book content of any publishing site will contain excellent copy, but marketing and home page material should be carefully considered, as this is often where crawlers, users and off-site links find themselves first. The number of keywords per site should be limited to three or four, and these should be added to the content in such a way that the content remains natural and readable.</p>
<p>Particular attention should be paid to drafting text for hyperlinks. This is because the words that point to a particular page are heavily weighted by Google when determining keywords for that particular page. Never use phrases like &#8220;Click <em>here</em> for our contact form&#8221; (for example); instead use words that describe the page you&#8217;re linking to by thinking of what a user would search for if they were looking for that page. In our example this might be &#8220;Contact us now using our <em>contact form</em>&#8221;.</p>
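<p>The advice above can be checked mechanically. The following is a minimal sketch, assuming Python and its standard <code>html.parser</code> module; the list of generic phrases and the names <code>LinkTextAuditor</code> and <code>generic_links</code> are illustrative choices, not part of any Google tool.</p>

```python
from html.parser import HTMLParser

# Hypothetical list of link phrases that say nothing about the target page.
GENERIC_PHRASES = {"here", "click here", "more", "read more", "this link"}


class LinkTextAuditor(HTMLParser):
    """Collects (href, visible link text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # set while the parser is inside an <a> element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the text compares cleanly.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def generic_links(html):
    """Return the links whose visible text is one of the generic phrases."""
    auditor = LinkTextAuditor()
    auditor.feed(html)
    auditor.close()
    return [(href, text) for href, text in auditor.links
            if text.lower() in GENERIC_PHRASES]


sample = (
    '<p>Click <a href="/contact">here</a> for our contact form.</p>'
    '<p>Contact us now using our <a href="/contact">contact form</a>.</p>'
)
print(generic_links(sample))
```

<p>Only the first link in the sample is flagged; the second already describes the page it points to.</p>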
<h2>2. Incoming Links</h2>
<p>Google rates a page&#8217;s relevance using an algorithm called PageRank, which is calculated based on the number and relevancy of links pointing to the site. The concept is that each link represents a vote of confidence for your site. But the votes are not equal: links from pages with a high PageRank are worth more. The PageRank of any site is shown on the <a title="Google Toolbar" href="http://toolbar.google.com">Google Toolbar</a>, which you can download from Google. Wikipedia have a good explanation of the <a title="Wikipedia article on the PageRank algorithm" href="http://en.wikipedia.org/wiki/PageRank" target="_blank">PageRank algorithm</a>.</p>
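<p>To make the idea of unequal votes concrete, here is a small sketch of the simplified, textbook form of PageRank, computed by repeated redistribution of rank. It is not Google&#8217;s production algorithm; the damping value of 0.85 is the commonly cited default, and the page names in the example graph are invented.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally between the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented example: "a" and "b" each have exactly one incoming link,
# but the link to "a" comes from a well-linked page.
graph = {
    "hub": ["a"],
    "x": ["hub"], "y": ["hub"], "z": ["hub"],
    "minor": ["b"],
    "a": [], "b": [],
}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

<p>In this graph &#8220;a&#8221; ends up ranked above &#8220;b&#8221;, because its single incoming link comes from a page that is itself well linked.</p>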
<h2>3. Google Webmaster tools</h2>
<p><a title="Google Webmaster Tools" href="http://www.google.co.uk/webmasters/" target="_blank">Google Webmaster Tools</a> offers a comprehensive selection of tools for configuring a site for improved search results on Google. The main offering is the ability to submit an XML sitemap, which improves the crawl across the site and makes it easier to control which pages are indexed. The use of XML sitemaps will improve discoverability of the content for any site. In addition, you can see when crawls have taken place and when they are next scheduled. This is of particular importance on each content update to ensure that the most recent updates have been picked up. Google Webmaster Tools also provides:</p>
<ul>
<li>A top search queries feature, which shows the user search queries that have returned pages from your site.</li>
<li>An interface for quickly finding all links to a site, allowing incoming links to be tracked.</li>
<li>Keywords and their frequency within the site.</li>
<li>Internal links.</li>
</ul>
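<p>An XML sitemap of the kind mentioned above is straightforward to generate from a list of pages. The sketch below assumes Python&#8217;s standard <code>xml.etree.ElementTree</code> module and the sitemaps.org 0.9 schema; the function name <code>build_sitemap</code> and the example URLs are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is expected as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, last_modified in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Invented example pages.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2009-11-09"),
    ("http://www.example.com/book?id=1&ch=2", "2009-11-01"),
])
print(sitemap_xml)
```

<p>The resulting string can be saved as a UTF-8 file and submitted through Google Webmaster Tools; regenerating it on each content update keeps the <code>lastmod</code> dates current.</p>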
<h2>4. Google Analytics</h2>
<p><a title="Google Analytics" href="http://www.google.com/analytics/" target="_blank">Google Analytics</a> offers further integration with Google search, as the script that enables the tracking of the site provides constant activity feedback to Google. Google Analytics should be used to determine where users are coming from and where users are landing. This information in itself will not help search ranking and discoverability, but it will provide metrics with which to measure the progress of any search engine optimisation activity. There are other analytics packages, such as <a title="Yahoo! Web Analytics" href="http://web.analytics.yahoo.com/">Yahoo! Web Analytics</a> (which provides real-time statistics), but Google Analytics has been recommended as Google is the leading search provider and the package integrates well with other Google tools.</p>
<h2>5. URL structure</h2>
<p>Have a look at many complex publishing sites and you might notice that there are no PageRanks on many of the deeper linked pages within the sites. This is most likely due to Google&#8217;s crawlers not reading complex page URLs, as these pages are often less meaningfully marked up and are the result of a query. For more details you can read <a title="Google article on meaningful URL markup" href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=76329" target="_blank">Google&#8217;s explanation of this behaviour</a>.</p>
<p>Many sites can benefit from an overhaul of the URL structure to provide more meaningful and less complex URLs. Care needs to be taken to make sure that the original links still work; their appearance in search engine results will diminish over time.</p>
<h2>6. Error pages</h2>
<p>If the user requests an unknown page, return a friendly error message and use the correct <a title="Link to W3's article on HTTP status code" href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html">HTTP status code</a>. This might sound obvious, but it&#8217;s a surprisingly common mistake on publisher platforms. If you don&#8217;t return the correct error code (for example &#8220;404 Not Found&#8221;) then Google will cache dead or bad links on your site, leading to a less than optimal search experience for your users. Similarly, link resolution services such as <a title="CrossRef" href="http://www.crossref.org/">CrossRef</a> can only quality check your DOI metadata if the HTTP codes are used correctly.</p>
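<p>As a rough illustration of correct status codes, the sketch below decides what to send back for a requested path: the page itself, a permanent redirect for a retired URL (which also keeps old links working after a URL overhaul), or a friendly 404. It assumes Python&#8217;s standard <code>http.HTTPStatus</code> constants; the paths, the <code>MOVED</code> and <code>LIVE</code> tables and the <code>resolve</code> function are all hypothetical stand-ins for a real platform&#8217;s routing.</p>

```python
from http import HTTPStatus

# Hypothetical map of retired query-style URLs to their readable replacements.
MOVED = {
    "/content.php?id=42&type=chapter": "/books/example-title/chapter-3",
}

# Hypothetical set of paths the platform can currently serve.
LIVE = {"/", "/books/example-title/chapter-3"}

NOT_FOUND_MESSAGE = (
    "Sorry, we could not find that page. "
    "Try the search box or return to the home page."
)


def resolve(path):
    """Return (status, headers, body) for a requested path."""
    if path in LIVE:
        return HTTPStatus.OK, {}, "page content"
    if path in MOVED:
        # 301 tells crawlers the move is permanent, so they update their index.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}, ""
    # A friendly message for the user, but still a real 404 for crawlers.
    return HTTPStatus.NOT_FOUND, {}, NOT_FOUND_MESSAGE


for requested in ["/", "/content.php?id=42&type=chapter", "/no-such-page"]:
    status, headers, body = resolve(requested)
    print(requested, int(status), status.phrase, headers)
```

<p>The common mistake described above corresponds to returning the friendly message with a 200 status, which leaves crawlers unable to tell a dead link from a live page.</p>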
<h2>Conclusion</h2>
<p>In conclusion, a little attention to the copy and structure of your site can pay dividends in terms of search engine rankings and discoverability.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2009/11/search-engine-optimisation-for-online-publishing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What does Google&#8217;s RDFa support mean for publishers?</title>
		<link>http://blogs.semantico.com/discovery-blog/2009/05/what-does-googles-rdfa-support-mean-for-publishers/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2009/05/what-does-googles-rdfa-support-mean-for-publishers/#comments</comments>
		<pubDate>Mon, 18 May 2009 14:07:53 +0000</pubDate>
		<dc:creator>Richard Padley</dc:creator>
				<category><![CDATA[Online Publishing Market]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=293</guid>
		<description><![CDATA[Is Google's support for the RDFa standard an opportunity or a threat to publishers? And what does this mean for users and purchasers of information products?]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.semantico.com:80/discovery-blog/wp-content/uploads/2009/05/rdf.png"><img class="alignleft size-thumbnail wp-image-294" title="rdf logo" src="http://blogs.semantico.com:80/discovery-blog/wp-content/uploads/2009/05/rdf.png" alt="" width="67" height="74" /></a>Is the recent decision by Google to support the <a href="http://www.w3.org/TR/rdfa-syntax/">RDFa</a> semantic web standard an opportunity or a threat to publishers? And what does this mean for end users and purchasers of information products?</p>
<p><span id="more-293"></span></p>
<p>Google have recently added support for a very <a href="http://iandavis.com/blog/2009/05/googles-rdfa-a-damp-squib">limited subset</a> of the RDFa vocabulary to their search engine [beginners should start with <a href="http://www.youtube.com/watch?v=ldl0m-5zLz4" target="_self">RDFa Basics</a>]. This will allow them to present much more relevant information directly on the search engine results page (SERP), which in turn will allow users to discover the information they are looking for much more easily. It will also allow Google to build on their &#8220;show options&#8221; feature, which allows users to narrow down search results by selecting from a relatively limited number of different categories.</p>
<p>Tech-heads will be less than happy with the part-baked nature of Google&#8217;s support for the RDFa standard, and rightly so in my view. What it means in practice is that the exact tagging needed to make the new Google features work is currently determined by Google alone, and does not build on semantic web community standards already established, such as <a href="http://www.foaf-project.org/">FOAF</a> or <a href="http://dublincore.org/">Dublin Core</a>.</p>
<p>Those publishers who currently enable Google to crawl their sites in order to drive user traffic should take note of this development. Since the main benefit will be to allow the end user more options in navigating search results, and the ability to find relevant results more quickly, those who adopt the new standards should stand to gain user traffic over those who don&#8217;t. However, one corollary of allowing Google to provide more relevant information directly in the SERP is that, for some searches at least, the relevant information will be embedded directly into the SERP and the user need go no further. Clearly a win for the user (and Google), but a loss for the publisher, whose site will inevitably get less traffic.</p>
<p>Directory, data and search publishers have most to lose from this new state of affairs. Because Google stores all of the information it indexes, the key issue is that those who choose to expose RDFa data also choose to let Google (and the world at large) mine their information for later use. This will amount, for many publishers, to a huge giveaway of factual content. If you code your factual information using RDFa then the whole world can take it away and do with it what it chooses. At the basic level, this will include telephone numbers, contact names, organisational relationships, etc. but will also extend to areas of specialist data such as protein structures, DNA … the implications are potentially huge.</p>
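<p>To give a sense of how little effort that mining takes, here is a minimal sketch that pulls every RDFa <code>property</code> value out of a page using only Python&#8217;s standard <code>html.parser</code> module. The <code>ex:</code> vocabulary, the sample markup and the class name are invented for illustration, and the collector deliberately ignores nested properties and the rest of the RDFa processing rules.</p>

```python
from html.parser import HTMLParser


class RDFaPropertyCollector(HTMLParser):
    """Collects (property, value) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.properties = []
        self._current = None  # (tag, property name) while inside a tagged element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs and self._current is None:
            if attrs.get("content") is not None:
                # An explicit content attribute carries the value directly.
                self.properties.append((attrs["property"], attrs["content"]))
            else:
                self._current = (tag, attrs["property"])
                self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            value = "".join(self._text).strip()
            self.properties.append((self._current[1], value))
            self._current = None


# Invented directory entry marked up with a hypothetical "ex:" vocabulary.
page = """
<div xmlns:ex="http://example.org/vocab#" typeof="ex:Person">
  <span property="ex:name">Jane Example</span>
  <span property="ex:telephone">+44 0000 000000</span>
</div>
"""

collector = RDFaPropertyCollector()
collector.feed(page)
collector.close()
print(collector.properties)
```

<p>A couple of dozen lines are enough to turn a marked-up directory page back into structured records, which is exactly the convenience that makes RDFa attractive to search engines and worrying to data publishers.</p>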
<p>The rub is that if you don&#8217;t publish at this level of granularity, users may choose the resources that do over yours, so staying out of the game is not a serious option. On the other side, the threat if you do jump in with both feet is that someone else may take away your data and use it in ways that will be completely out of your control. You&#8217;re liable to feel that the tight, tactical game of Premier League football you&#8217;ve been playing up to now has suddenly regressed to the state of a village knockabout circa 1300, with anyone out of the crowd able to rush in and steal away with the ball.</p>
<p>Neither are there just two teams anymore. This particular match looks to have had a three way result: Users 1, Google 1, Publishers 0.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2009/05/what-does-googles-rdfa-support-mean-for-publishers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Interview with Stephen E. Arnold</title>
		<link>http://blogs.semantico.com/discovery-blog/2009/01/interview-with-stephen-e-arnold/</link>
		<comments>http://blogs.semantico.com/discovery-blog/2009/01/interview-with-stephen-e-arnold/#comments</comments>
		<pubDate>Mon, 19 Jan 2009 16:56:23 +0000</pubDate>
		<dc:creator>Richard Padley</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://blogs.semantico.com/discovery-blog/?p=165</guid>
		<description><![CDATA[I recently had the pleasure of being interviewed by one of the great minds in search, Stephen Arnold, as part of his &#8216;Search Wizards Speak&#8217; series. Stephen is a straight talking guy, and it was great to have a chance to spend some time with him discussing where search will go over the next couple [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_166" class="wp-caption alignleft" style="width: 310px"><a href="http://www.arnoldit.com/search-wizards-speak/"><img class="size-medium wp-image-166" title="picture-1" src="http://blogs.semantico.com/discovery-blog/wp-content/uploads/2008/12/picture-1.png" alt="Search Wizards Speak" width="300" height="223" /></a><p class="wp-caption-text">Search Wizards Speak</p></div>
<p>I recently had the pleasure of being interviewed by one of the great minds in search, <a href="http://www.arnoldit.com/bio/bio-short.html">Stephen Arnold</a>, as part of his &#8216;<a href="http://www.arnoldit.com/search-wizards-speak/semantico.html">Search Wizards Speak</a>&#8217; series. Stephen is a straight talking guy, and it was great to have a chance to spend some time with him discussing where search will go over the next couple of years.</p>
<p>Go direct to Stephen&#8217;s site for the full text of the <a href="http://www.arnoldit.com/search-wizards-speak/semantico.html">interview</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.semantico.com/discovery-blog/2009/01/interview-with-stephen-e-arnold/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
