Sunday, November 27, 2011

Siri

The big news in the iPhone 4S seems to be the coming of Siri. While many people were disappointed at the lack of a bigger screen or a brand new exterior to make people drool with envy, some recognized the inclusion of Siri as a game-changing innovation.

It is not the voice recognition capability, but the "semantic recognition capability" that impresses the most. For example, here are three simple questions that Siri can answer (from the "Let's talk iPhone" event):

1) “What is the weather like today?” (Siri answered: “Here’s the forecast for today”),
2) “What is the hourly forecast?” (Siri answered: “Here is the weather for today”), and
3) “Do I need a raincoat today?” (Siri answered: “It sure looks like rain today”).

The first two are probably easy enough to achieve just with sophisticated voice recognition, but the third is a lot trickier. Siri has to know that in asking about clothing, you are "really" asking about weather. But how does she know that?

While the details of Siri's technology are proprietary, Tom Gruber, one of Siri's creators, gives us some brilliant insights in this keynote address. These are the essential points:
  1. Task oriented
  2. Context is king
  3. Precise and limited information space
  4. Semantic auto completion / snap to grid rather than general intelligence
These are what make Siri work on functionally focused mobile devices, that is, devices most likely to be used for a limited set of fairly routine tasks. The first point simply reiterates that the tasks you are most likely to want to perform on a mobile platform are limited. Siri is not about the long tail of human activities, but the "fat head", as Tom calls it!

But it is the second point, context, that makes it easier to guess what the person wants. Where are they? What time is it? What applications do they want to use? Answers to these questions help narrow down the space of possibilities.

The third point is about bringing in wider data sources by choosing and modeling external information sources that are likely to be relevant to the possible tasks. That is, the interface between external data and internal task descriptions is precise. These external sources can contribute to the process of guessing the user's intention.

Putting these pieces together makes it possible to realize the goal of "auto completing" the user request with the most appropriate action. It is the "semantic snap to grid" which makes Siri appear to understand a request in an intelligent manner. Magic.
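To make the "snap to grid" idea a little more concrete, here is a toy sketch in Python. It is emphatically not how Siri works (those details are proprietary); it only illustrates the four points above under my own assumptions. The task names, the cue words, the Context fields and the meal-time nudge are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical task "grid": a small, fixed set of tasks (the "fat head"),
# each described by a handful of cue words. All names are invented.
TASKS = {
    "weather_forecast": {"weather", "forecast", "rain", "raincoat", "umbrella"},
    "set_alarm": {"alarm", "wake", "remind"},
    "find_restaurant": {"restaurant", "eat", "dinner", "lunch", "hungry"},
}

# Hours at which we assume a food-related request is a bit more likely.
MEAL_HOURS = {11, 12, 13, 18, 19, 20}


@dataclass
class Context:
    hour: int       # local hour of day, 0-23
    location: str   # unused by the scoring below; shown as typical context


def tokenize(utterance):
    """Lower-case the words and strip simple trailing punctuation."""
    return {word.strip("?.,!").lower() for word in utterance.split()}


def snap_to_task(utterance, context):
    """Return the best-matching task name, or None if nothing fits."""
    words = tokenize(utterance)
    # Score each task by how many of its cue words appear in the request.
    scores = {task: len(words & cues) for task, cues in TASKS.items()}
    # Context only nudges the scores; it cannot create a match by itself.
    if context.hour in MEAL_HOURS:
        scores["find_restaurant"] += 0.5
    best = max(scores, key=scores.get)
    return best if scores[best] >= 1 else None


morning = Context(hour=9, location="Cupertino")
print(snap_to_task("Do I need a raincoat today?", morning))
```

The raincoat question never mentions weather, yet it lands on the weather task because "raincoat" is one of that task's cue words. A request that matches no cue word returns None rather than a guess, which is the "limited information space" at work.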

The key is precise modeling with semantic technologies in a context-aware mobile platform. Mobilesemantics has come of age!
