Semantic Web: Yahoo takes the first step

ReadWriteWeb has a very good summary of Yahoo’s announcement earlier this week that they’ll soon be indexing content marked up with "semantics", or "meaning".

If all this sounds like geek-speak, bear with me for a moment.   

The change that’s coming to the way we use the web will be quite significant, and impact web publishers (and consumers) greatly.  It’s not a question of if, but when this change arrives. 

My prediction is that most of this evolution will start to show up within the next three years.

Today, the way search engines work is essentially:

  • Crawl through the web
  • Read the HTML markup on each page (the "View/Source" behind every webpage)
  • Look hard at this text with lots of "smart" algorithms.  Try to figure out which content is most important, and which is unimportant.  Make some educated guesses. 
  • Look at things like quality-of-sites-pointing-in, meta tags, keyword density, spam lists, etc. to try to assemble a directed graph of what-links-to-what.  Rank it (e.g., Google’s famous PageRank).  Make some educated guesses.
  • Make some educated guesses about what the most "authoritative" search results are for the user, based on the keywords entered
  • Finally, throw back a bunch of URLs to the user reporting what you’ve found.  Pay the bills by making some educated guesses about which ads match what the user meant, and show those too.

Notice all the "make an educated guess" statements above?  It’s no accident.  Search engines, as good as they seem, are making educated guesses all the time about what we want, and getting it right only part of the time.  They have many tricks to use, but essentially they are reading text and trying to infer a lot more.
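To give a feel for the link-graph ranking step, here is a minimal sketch in Python of a PageRank-style calculation. It is a textbook simplification under my own assumptions (the three page names, the damping factor of 0.85, and the fixed 50 iterations are all made up for illustration), not how Google or any other engine actually ranks pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the score spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: who links to whom.
links = {
    "blog": ["review-site"],
    "forum": ["review-site"],
    "review-site": ["blog"],
}

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

Even in this toy version, the score says nothing about what a page is actually about. It only measures who points at whom, which is exactly why so much guessing is still needed.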

Also, note that information itself is not interconnected. 

In other words, you cannot issue queries like these to Google:

     "Show me all the reviews posted by my Facebook friends."

     "Show me all the products my Facebook friends rated 4 stars or better."

     "Show me photos of all Maui condos that are available for spring break, have 2 bedrooms, and cost between $300 and $800 per night."

In a few years, these kinds of searches will be trivial, and a huge amount of economic value is going to be created and destroyed in the process.

Even though searching is vastly better than what we had just 5 years ago, 2008 is still the Stone Age of search, because in the end, computers are looking at text and making a bunch of guesses.  They are not looking at information in a well-structured, interconnected form.
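To see why structure matters, consider the Maui condo query once the listings exist as structured records rather than prose. The sketch below is in Python; the Listing fields, the sample listings, the example.com photo URLs, and the spring break dates are all hypothetical, invented only to show that the query becomes a plain filter with no guessing involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    """A hypothetical structured record for a rentable property."""
    name: str
    location: str
    bedrooms: int
    nightly_rate: int  # US dollars per night
    available_from: date
    available_to: date
    photo_url: str


def search(listings, location, bedrooms, min_rate, max_rate, check_in, check_out):
    """Return listings matching every criterion exactly; nothing is inferred."""
    return [
        listing
        for listing in listings
        if listing.location == location
        and listing.bedrooms == bedrooms
        and min_rate <= listing.nightly_rate <= max_rate
        and listing.available_from <= check_in
        and check_out <= listing.available_to
    ]


# Made-up sample data.
listings = [
    Listing("Kihei Beachfront", "Maui", 2, 450, date(2009, 3, 1), date(2009, 4, 15),
            "http://example.com/photos/kihei.jpg"),
    Listing("Lahaina Loft", "Maui", 1, 350, date(2009, 3, 1), date(2009, 4, 15),
            "http://example.com/photos/lahaina.jpg"),
    Listing("Wailea Villa", "Maui", 2, 1200, date(2009, 3, 1), date(2009, 4, 15),
            "http://example.com/photos/wailea.jpg"),
    Listing("Kona Cottage", "Big Island", 2, 400, date(2009, 3, 1), date(2009, 4, 15),
            "http://example.com/photos/kona.jpg"),
    Listing("Napili Garden", "Maui", 2, 320, date(2009, 3, 18), date(2009, 4, 30),
            "http://example.com/photos/napili.jpg"),
]

# "Maui condos available for spring break, 2 bedrooms, $300 to $800 per night."
matches = search(listings, "Maui", 2, 300, 800, date(2009, 3, 14), date(2009, 3, 21))
photos = [listing.photo_url for listing in matches]
print([listing.name for listing in matches])
```

The hard part is not the filter. It is getting publishers to expose bedrooms, rates, and availability as data in the first place.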

Sure, there is a lot of subjective content on the web, and prose is terrific.  But in many circumstances that matter, whether you’re trying to research a medical ailment, book a hotel, value a property, or buy a car, you want to get at the data itself.

Search engines have been trying to infer meaning from HTML text. 

But what if web publishers took the time to encode their content just a little bit further, with the meaning and structure of what they were really writing about?

The simplest and first thing that happens is that the search results get a lot more immediate to the question you just asked.

For instance, if you were shopping for a new digital camera, and typed in "Canon 40D reviews", you probably want to know a good deal about the reviews themselves, rather than get a bunch of URLs leading you to reviews.

The way this comes about is now quite apparent. 

Step 1:  Review sites publish their reviews in hReview format.  (Until earlier this week, there wasn’t a strong incentive to do so.  That incentive will only get stronger.)

Step 2: The search engines look at that structure and present the results in a more orderly way, complete with filtering tools, etc. on the search engine pages themselves.   
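For a feel of what these two steps involve, here is a minimal sketch in Python. The markup uses class names from the hReview microformat (hreview, item, fn, rating, dtreviewed, reviewer, summary). The sample review, the reviewer name, and the HReviewParser class are made up for illustration, and the parser assumes well-formed markup. It is certainly not how any search engine actually does it.

```python
from html.parser import HTMLParser

# Hypothetical review markup, as a review site might publish it (Step 1).
SAMPLE = """
<div class="hreview">
  <span class="item"><span class="fn">Canon 40D</span></span>
  <span class="rating">4.5</span>
  <abbr class="dtreviewed" title="2008-03-15">March 15, 2008</abbr>
  <span class="reviewer vcard"><span class="fn">Example Reviewer</span></span>
  <span class="summary">Fast, solid mid-range DSLR</span>
</div>
"""

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HReviewParser(HTMLParser):
    """Illustrative extractor for a few hReview fields (Step 2)."""

    def __init__(self):
        super().__init__()
        self.stack = []           # class lists of the currently open elements
        self.reviews = []         # finished reviews, one dict each
        self.current = None       # the review being read, if any
        self.review_depth = None  # stack depth of the open hreview element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.stack.append(classes)
        if "hreview" in classes and self.current is None:
            self.current = {}
            self.review_depth = len(self.stack)
        elif self.current is not None and "dtreviewed" in classes and attrs.get("title"):
            # The machine-readable date lives in the title attribute.
            self.current["dtreviewed"] = attrs["title"]

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.current is not None and len(self.stack) == self.review_depth:
            # The hreview element itself is closing: the review is complete.
            self.reviews.append(self.current)
            self.current = None
        self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return
        open_classes = [c for classes in self.stack for c in classes]
        innermost = self.stack[-1]
        if "fn" in innermost:
            # "fn" names the reviewer inside a reviewer element, else the item.
            key = "reviewer" if "reviewer" in open_classes else "item"
            self.current.setdefault(key, text)
        elif "rating" in innermost:
            try:
                self.current["rating"] = float(text)
            except ValueError:
                self.current["rating"] = text  # keep non-numeric ratings as text
        elif "summary" in innermost:
            self.current["summary"] = text
        elif "dtreviewed" in innermost:
            self.current.setdefault("dtreviewed", text)


parser = HReviewParser()
parser.feed(SAMPLE)
review = parser.reviews[0]
print(review)
```

Once a review is a record like this, sorting by rating or filtering by date is trivial, and no educated guessing about the page text is required.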

If you type in "Maui condo for rent", wouldn’t it be nice if the search engine knew that this was a reservable resource, and asked you "when do you want to go?" and "do you want beach or not?"  You could then immediately see a list of Maui condos for rent (on Yahoo or Google or MSN, mind you), along with filtering tools.

As good as we think search engines are today, they are akin to parents helping toddlers on Easter-egg hunts.  "I think you might want to look over here …", "The answer might be here; why don’t you look here?", etc. Except they are not even as good, since search engines are, like the kids themselves, only guessing at what might lurk behind that tree stump.

Up until now, search engines haven’t known how to look at a bunch of HTML and extract "meaning", e.g., the "kernel" of the review itself, such as the date it was written, what it was written about, how it scores on a standardized 5-point scale, etc.  So they throw up their hands and send you off to various sites to assemble the answers yourself.

Over the "long" term (and that’s maybe 0-3 years), I think Yahoo’s groundbreaking announcement will eventually cause:

  • Google and MSN to respond in kind.  (As usual, look for Google to push the envelope further here; this is squarely in their strategic wheelhouse of "organizing the world’s information".  Listen up, Microsoft: this is finally a real chance to differentiate, and to create a leading innovation in search.  But you need to start leading in microformat creation and toolset delivery.)
  • Web publishers to start publishing content in microformats, which is trivial for most publishers.  (My own site will most likely start microformat markup in 2008.)
  • New microformats will emerge for nearly any important entity that can be visited, bought, sold, born, killed, or reserved.  Particularly lacking today is a microformat for reservable resources.
  • Which microformats will see quickest adoption?  My guess is that entities with very high frequency and/or need-for-comparison, combined with high fragmentation, will gain the most value from microformat adoption: restaurants, lodging, and more.
  • Entire business models are at risk.  Pay-for Classified-ad-style companies will lose a lot of power and market cap, an inevitable trend I suggested we’d start seeing back in March ’07.
  • Owners of back-end supply that are structuring fragmented data will become strategically more valuable and visible, if only because their content will show up higher in the search process
  • Structured data will show up in both sponsored ads and the search results themselves.  I could see Google start displaying their top 3 or so microformat-structured results in the "sponsored results" section right at the top.  My suspicion is that structured search results have an opportunity to have a higher clickthrough than the kind of generic "click here for more" CPC ads we see today.
  • We’ll see the browsers themselves going further up-market and presenting structured data for the user.  Note today how the leading browsers highlight if there’s an RSS feed on the page.  Soon, you can expect them to also highlight content-marked-up-with-meaning as well.

It’s particularly interesting to think about who wins and who loses with these significant changes, because that is where wealth is created and destroyed.  The likely winners:


  • Companies that help enterprises publish their content in structured form.
  • Search engines, since more of the share-of-attention during search takes place on their site.
  • Companies which get out in front and help forge a company-friendly microformat that plays to their strengths
  • … more

In the vacation rental world, my view is that back-end suppliers, like, get much more valuable (because they power the back-end, where the reservation data is), and classified-ad-style directories lose a lot of value, because they have effectively been reselling search engine presence. 

What does it mean in your industry?
