
When will we start talking to our TVs?

“I’m sorry, Dave. I’m afraid I can’t do that.” HAL 9000 has a disconcerting chat in 2001: A Space Odyssey.

Conversational interfaces are getting much more sophisticated, says Rovi’s Charles Dawes – and a conversation with your TV is much closer than you think.

While science fiction has long portrayed space travel as the “final frontier,” it has also depicted voice recognition and interaction as the ultimate human-machine interface.

Yet the reality of speech-driven interfaces has fallen far short of the natural, virtually human conversation we envisioned. While speech-driven interfaces have existed for decades, practical uses have, until recently, been limited to basic structured queries and stock responses.

But with the wider adoption of smartphones and tablets, and broader advances in interactive technology, we’ve seen a significant shift in how we interact with our devices. With the introduction of virtual assistants such as Apple’s Siri, speech interfaces have moved beyond basic menu navigation and data retrieval and have started to catch the interest of consumers.

The availability of speech-to-text engines allows the basic speech enablement of almost any application that previously required tactile inputs like keyboards or touch-screens.

Given the relative ease of “bolting on” a speech engine to any application, it’s not surprising that the performance of such voice applications and virtual assistants varies widely.

Although there have been serious attempts to break through to the futuristic ideals of speech-driven interfaces, most tools still rely on structured menus for information retrieval or on spoken keywords that simply replace their keyed-input counterparts. These are largely unintuitive and certainly don’t support our natural language patterns.

Many existing systems with conversational attributes are inherently task-oriented, built around a request-and-response framework. This gives a notion of conversational continuity, but in reality each request-response pair is independent of the next and limited in context – not the ideal basis for a conversation. When it comes to true conversational interfaces, we’re really only scratching the surface of what’s possible.

What are conversational interfaces?

Conversational interfaces are user interfaces that simulate the qualities of natural communication on devices and applications, allowing users to interact with them in casual, everyday language – similar to the way humans converse with one another.

Imagine what this level of interaction can achieve when applied to varied uses, such as trying to book travel, for example – juggling dates, flight schedules, and ticket prices – or deciding what to watch on TV between hundreds of live TV channels, thousands of VoD titles, and potentially millions of OTT options.

Consumers would rather speak their intent naturally and have a device understand and execute on the request, and one of the essential enabling technologies for these new experiences is graph-based search and discovery. Such a graph (aka ‘knowledge graph’) is a semantic database of named entities, where the relationships between entities are dynamically mapped for predictive and intelligent results for search and discovery.
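As a loose illustration only – a hypothetical sketch, not a description of any vendor’s implementation – a knowledge graph can be pictured as named entities joined by typed relationships that a query engine traverses to answer a question like the one posed below:

    # Hypothetical, minimal knowledge-graph sketch: entities linked by typed
    # relationships. The data and helper function are illustrative assumptions.
    graph = {
        ("Tom Hanks", "acts_in"): ["Cast Away", "Forrest Gump"],
        ("Cast Away", "features_organisation"): ["FedEx"],
        ("Forrest Gump", "features_organisation"): ["Bubba Gump Shrimp Co."],
    }

    def films_linking(actor, organisation):
        # Follow the actor's "acts_in" edges, then keep films that also have
        # an edge to the requested organisation.
        films = graph.get((actor, "acts_in"), [])
        return [f for f in films
                if organisation in graph.get((f, "features_organisation"), [])]

    print(films_linking("Tom Hanks", "FedEx"))  # ['Cast Away']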

“What’s the film where Tom Hanks works for FedEx?”

The TV viewing experience is a prime example of where a knowledge graph-based semantic approach is of great benefit to consumers. As the landscape becomes increasingly complex with the sheer volume of content available, traditional lexical metadata and structured menu-driven search and navigation prove difficult and cumbersome.

For consumers, a more intuitive discovery and recommendation process is critical because video is a semantically opaque medium. People evaluate viewing options against multiple criteria, including cast, plot, genre, mood and more, all of which are subjective.

A knowledge graph assists in this discovery by representing content options in the way people think about programs rather than forcing traditional keyword or structured menu-based attributes on users.

Personal and contextual relevance, as we see in the world of mobile and web services, can also be intelligently mapped for television, with similar effect.

For example, most viewers have viewing patterns that can be mapped to provide personalised results. This is more accurate than user-created profiles or ‘thumbs up/down’ ratings, both of which are error-prone and fail to keep pace with users’ changing tastes and preferences over time.
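To make that concrete, here is a deliberately simplified sketch – an assumption for illustration, not a description of any deployed recommender – of how implicit viewing patterns could be turned into genre weights that fade as tastes change:

    from collections import defaultdict

    def genre_weights(viewing_history, decay=0.95):
        # viewing_history: (genre, days_ago) pairs inferred from what was
        # actually watched; recent viewings count for more, so old tastes fade.
        weights = defaultdict(float)
        for genre, days_ago in viewing_history:
            weights[genre] += decay ** days_ago
        return dict(weights)

    history = [("drama", 1), ("drama", 3), ("comedy", 2), ("sport", 45)]
    print(genre_weights(history))  # drama and comedy outweigh month-old sport viewing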

Hyper-personalisation is extremely important when looking to build consumer loyalty to a particular brand or service, and it maps naturally onto the knowledge graph’s semantic capabilities.

Semantic technologies become even more interesting with conversational interfaces that enable semantic interpretation for natural language queries, and can discern when a user is drilling down into a context or has switched topics, such as moving from movies to sports.
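One way to picture this – purely a toy sketch under assumed names, not an account of how any real system works – is a small context tracker that keeps accumulating filters while the topic stays the same and resets when the user switches, say, from movies to sports:

    class ConversationContext:
        # Toy context tracker: refinements within a topic accumulate,
        # while a topic switch starts a fresh set of constraints.
        def __init__(self):
            self.topic = None
            self.filters = {}

        def handle(self, topic, **filters):
            if topic != self.topic:
                self.topic = topic
                self.filters = {}         # topic switch: drop old constraints
            self.filters.update(filters)  # drill-down: layer on the new ones
            return self.topic, dict(self.filters)

    ctx = ConversationContext()
    print(ctx.handle("movies", genre="thriller"))         # ('movies', {'genre': 'thriller'})
    print(ctx.handle("movies", decade="1990s"))           # drill-down keeps the genre filter
    print(ctx.handle("sports", league="Premier League"))  # topic switch resets the filters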

Not only does this mimic our everyday conversational style, it also reflects how users typically browse for programming, often not knowing exactly what they want to watch or meandering through the options.

A conversation with your TV – closer than you think

Conversational interfaces are the logical next step in the interfaces required for the emerging era of smart, connected devices. Technology and market forces are driving towards them at a rapid pace, but simply adding speech enablement to existing solutions will not give consumers the interactions they desire.

To become fully functional and effective for users, voice technologies must be backed by sophisticated search capabilities, such as knowledge graphs and deep metadata. When these technologies are built effectively, consumers can expect to reap the rewards of fast, accurate and intuitive voice-driven content search.

Amazon’s quirky commercial with actor Gary Busey for its Fire TV highlighted the device’s voice capabilities. While the voice controls on that particular device are not all that notable – offering simple command-based voice input rather than a conversational interface – what is interesting is the remote with a built-in microphone.

Companies like Samsung have introduced similar remote controls, and Google will undoubtedly introduce one to accompany its upcoming Android TV. Expect remotes with built-in microphones to become mainstream in the next couple of years and to be available as part of Pay TV offerings.

Talking to inanimate objects used to be a sign of madness; not so in the future. From TVs and refrigerators to cars and alarm clocks, speech will undoubtedly become the new norm in advanced interaction.
