A good few weeks ago I promised I’d report on that talk I attended at work about the semantic web. As they say in French, “Chose promise, chose due”, so here we go … It’s a loose interpretation of what was said there, based on my notes at the time.
Ontologies
Ontology is actually a concept from philosophy, not computer science. Quoting Wikipedia:
It seeks to describe or posit the basic categories and relationships of being or existence to define entities and types of entities within its framework.
In other words, concepts and categories within a domain of interest and the relationships between them.
To take a practical example from the talk: pizzas are composed of a base and ingredients (3 concepts already). Ingredients can be tomato sauce, toppings, cheese. Cheese can be parmesan or mozzarella. A pizza that contains meat toppings is not a vegetarian pizza. You get the idea.
Ontologies are set up by humans, not by machines: they express their authors’ understanding of the world around them, and machines are not there (yet).
OWL
Ontologies need to be expressed in a machine-parsable way, and the most widely accepted way is OWL, the Web Ontology Language. OWL is a W3C recommendation.
To understand what OWL looks like, you need to know about RDF (Resource Description Framework). RDF lets you state relationships between concepts using triples (statements made of 3 components): subject, predicate, object. For example: vegetarian pizza – is a kind of – pizza; sky – has colour – blue.
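To make the idea of a triple concrete, here is a minimal sketch in plain Python. It is not RDF itself, just tuples in a list, and the `match` helper and the "margherita" entry are made up for illustration.

```python
# Each triple is (subject, predicate, object); the names are illustrative only.
triples = [
    ("vegetarian pizza", "is a kind of", "pizza"),
    ("margherita", "is a kind of", "vegetarian pizza"),  # hypothetical extra fact
    ("sky", "has colour", "blue"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "What is directly stated to be a kind of pizza?"
kinds_of_pizza = match(triples, predicate="is a kind of", obj="pizza")
print(kinds_of_pizza)
```

Note that only the directly stated fact comes back: nothing here concludes that a margherita is also a pizza.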

RDF triples are commonly written down in XML (RDF/XML), although other syntaxes exist. The “Resource” part refers to the fact that the subject and the predicate are resources, normally identified by URIs. Quoting Wikipedia:
The subject of an RDF statement is a resource, possibly as named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes or anonymous resources. They are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal.
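To show the difference between a resource and a literal, here is a small sketch that writes triples in N-Triples, a simple line-based text syntax for RDF. The `http://example.org/` namespace, the `Literal` marker class and the `to_ntriples_line` helper are assumptions of mine, not part of any library, and the escaping only covers the most common cases.

```python
EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

class Literal(str):
    """Marks an object as a plain string literal rather than a resource."""

def to_ntriples_line(subject, predicate, obj):
    """Format one triple as an N-Triples line (simplified escaping)."""
    if isinstance(obj, Literal):
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_text = f'"{escaped}"'   # literals go in double quotes
    else:
        obj_text = f"<{obj}>"       # resources are URIs in angle brackets
    return f"<{subject}> <{predicate}> {obj_text} ."

line1 = to_ntriples_line(EX + "VegetarianPizza", EX + "isAKindOf", EX + "Pizza")
line2 = to_ntriples_line(EX + "Sky", EX + "hasColour", Literal("blue"))
print(line1)
print(line2)
```

In the first line the object is another resource; in the second it is the string literal "blue".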
If this sounds familiar to you: RSS 1.0 (RDF Site Summary) is an application of RDF.
An ontology expressed in OWL is a set of these triples around a domain of interest.
OWL comes in 3 flavours:
OWL Lite ⊂ OWL DL ⊂ OWL Full
OWL Lite is a light version with just enough expressiveness to build taxonomies (“is a kind of”, “has 0..1”, …).
OWL DL offers the maximum expressiveness that still lets software reason over the ontology with a guarantee that the computation finishes.
OWL Full gives full freedom and maximum expressiveness, with no guarantee that a computer program can reason over it.
Software
No need to open vi (unless you really insist): there are editors for setting up an ontology. An example is Protégé, a tool from Stanford that lets you set up an ontology in a user-friendly way (Swoop is another).
Now what will computers do with this? Well, once you’ve got a coherent ontology, you can use an inference engine to draw conclusions, or to search in a semantically relevant way.
Examples of inference engines: FaCT++ (open source), Cerebra (proprietary), Racer, Pellet. There are undoubtedly numerous others I don’t know about.
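To give a feel for what “drawing conclusions” means, here is a toy sketch of one single inference rule: if A is a kind of B and B is a kind of C, then A is a kind of C. Real engines like the ones above do far more than this; the `infer_kinds` function and the extra facts (“margherita”, “food”) are made up for illustration.

```python
def infer_kinds(facts):
    """Return the facts plus every 'is a kind of' link implied by transitivity."""
    inferred = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for (a, b) in list(inferred):
            for (c, d) in list(inferred):
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred

# Each pair (x, y) reads "x is a kind of y"; these facts are hypothetical.
stated = {
    ("margherita", "vegetarian pizza"),
    ("vegetarian pizza", "pizza"),
    ("pizza", "food"),
}

derived = infer_kinds(stated) - stated
print(sorted(derived))
```

Nobody stated that a margherita is food, yet the conclusion follows from the three facts that were entered.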
This part made me think of Prolog, except that OWL is more expressive, and in OWL we work with an open world assumption: what can’t be proven is simply unknown, not false (Prolog works with a closed world assumption, where what can’t be proven is taken to be false). This can lead to bugs if the ontology is not air-tight.
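The difference between the two assumptions, as they are usually defined, can be sketched with a three-valued answer: under a closed world an unproven statement is false, under an open world it is unknown. The facts and the two functions below are hypothetical and only meant to illustrate the idea.

```python
# Statements we know to be true (hypothetical example data).
facts = {("margherita", "contains", "mozzarella")}
# Statements explicitly ruled out; an open-world reasoner needs these to answer "no".
negated = {("margherita", "contains", "ham")}

def closed_world(triple):
    """Prolog-style: anything that cannot be proven is treated as false."""
    return triple in facts

def open_world(triple):
    """OWL-style: True if stated, False if explicitly ruled out, else None (unknown)."""
    if triple in facts:
        return True
    if triple in negated:
        return False
    return None

question = ("margherita", "contains", "salami")
print(closed_world(question))  # False: not provable, so assumed false
print(open_world(question))    # None: nothing says so either way
```

This is why an ontology has to spell out its negatives: under an open world, leaving meat off the list of toppings is not enough to conclude that a pizza is vegetarian.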
Practical applications
I’m told there are successful applications using semantic technologies in genetics and pharmaceutics. I know of some practical applications that allow salespeople to select products to propose to customers based on each customer’s criteria. I’ve also read a pretty interesting article about using RDF in Ruby on Rails for an enlarged social network.
Semantic web?
So in fact, the dream of a semantic web refers to perfect searchability for documents and resources on the web. You could search on a particular term and get results that make sense, even if they don’t contain the exact word you’re referring to.
Of course we’re still far from an over-arching, universally accepted set of ontologies that would allow us to do that. But the enormous mass of information available makes it necessary for more advanced search mechanisms to be developed. Maybe it will happen subdomain by subdomain. See what the future holds …