Phillip Hunter @designoutloud, 35 tweets, 8 min read
Time this Monday for a few thoughts on voice-driven "conversational" digital products. (Oh, btw, come hear me talk about them on Wednesday in #Seattle: ) #voice #design
1 - Speech recognition technology often gets labeled "conversational". That's incorrect and misleading. Being able to identify the parts of human anatomy is necessary for a doctor, but it doesn't by itself confer understanding of how bodies do and don't work.
2 - Recognition identifies words and phrases. It does not understand the context or derive the meaning that those words and phrases represent. It can't. This is why vocabulary and comprehension are both taught in schools.
3 - For example, the word "why" can represent meanings from seeking information to indignation to lamentation and more. All by itself. Speech reco tech only knows that sounds matching the word "why" have been made.
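To make that distinction concrete, here's a minimal sketch (the `recognize` stub is hypothetical, not any real ASR API) showing that a recognizer's output is just a token string. The same transcript "why" comes back for every one of the meanings above:

```python
# Hypothetical sketch: what a speech recognizer returns vs. what meaning requires.
# The recognize() stub below stands in for real ASR output; no actual API is used.

def recognize(audio_frames):
    """A recognizer maps sound to text. For any of the utterances
    below, a real ASR system would return just the word itself."""
    return "why"

# Three very different human meanings, one identical transcript:
utterances = {
    "information-seeking": "why",   # "Why is the sky blue?"
    "indignation":         "why",   # "Why would you do that?!"
    "lamentation":         "why",   # "Why... why him?"
}

# The recognizer output is indistinguishable across all three.
transcripts = {intent: recognize(None) for intent in utterances}
print(transcripts)
# Intent, tone, and context are gone; every value is the same string.
```

The point isn't that the stub is simplistic; it's that even a production-grade recognizer's contract stops at the transcript.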
4 - Conversation depends, of course, on accurate reco. But that's just 1 factor humans use to conduct conversation. For meaning, we use multiple levels of context incl. culture, self-owned knowledge, imputed knowledge, audio tonal qualities, visual signals, environment, & more.
5 - Speech reco tech uses none of that. Little of it is captured, and even tonal qualities are discarded during processing. (Sure, there are efforts to capture and use some factors in addition to the voice signal, but that is separate tech. And super iffy.)
6 - To get to conversation, meaning is required. Meaning requires knowledge, perception, interpretation, and most of all context. Achieving meaning—especially with agreement that it is shared—is a fascinatingly challenging and mysterious task.
7 - All of us, even the smartest and most expressive of us, work hard to reliably achieve clear shared meaning in our communication with others. Every day and all the time. There are no surefire methods or guaranteed outcomes.
8 - To understand just how difficult this task is for us, consider that scholar N.J. Enfield found that dealing with problems of conversation, i.e. repairing mechanical issues & meaning, occurs about every 84 seconds. Basically, all the time. (basicbooks.com/titles/n-j-enf…)
9 - We humans, with thousands of years of evolved language mastery, have to interrupt ourselves at least 3 times for every 5 minutes of conversation to recalibrate or even restart our quest for meaning. Even more depending on factors like noise and unshared context.
10 - We perform conversational repair so often we're barely aware of it. We tend to notice only when it is harder than usual, breaks down completely, or is absent when needed. And even then, we deploy backup strategies that we have practiced for years and decades.
11 - Repair is crucial for meaning, and guess where we run into the need for it a lot more than we're used to these days? Voice-driven products. The repair function of conversation is rudimentary at best in speech tech, and addresses ONLY mechanical recognition issues.
12 - And even those fundamentals are not handled well. Start a conversation with an assistant with a simple why, where, or how, and you will get "Sorry, I don't know how to help with that" rather than a very basic "why what?" or "what are you looking for?".
13 - That's just not a helpful response. It's a conversation-killer. There's no reasonable guess, invitation to reframe, or request to expand or restate. Instead of offering a cooperative & assistive response, it puts all the work (and blame) back on us, halting the interaction.
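As a sketch of what a slightly more cooperative fallback could look like (a hypothetical handler, not Alexa's or Google's actual behavior or API), the assistant could echo a bare interrogative back as a clarifying question instead of a dead-end apology:

```python
# Hypothetical sketch of a repair-style fallback: when a request is a bare
# interrogative, invite the user to expand instead of ending the conversation.

INTERROGATIVES = {"why", "where", "how", "what", "when", "who"}

def fallback(utterance: str) -> str:
    words = utterance.lower().strip("?!. ").split()
    if len(words) == 1 and words[0] in INTERROGATIVES:
        # Echo the question word back, keeping the conversation open.
        return f"{words[0].capitalize()} what?"
    if words:
        # Ask for a reframe rather than putting the blame on the user.
        return "I didn't catch that. What are you looking for?"
    return "I'm listening. What would you like to do?"

print(fallback("Why?"))        # Why what?
print(fallback("play music"))  # I didn't catch that. What are you looking for?
```

In a real system this would run only after intent matching fails, and the clarifying question would feed the user's answer back into the dialogue state rather than starting over.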
14 - This is not behavior we tolerate well in the humans we talk with, unless they are children. However, when adults initiate speaking to children, we expect unusual behavior & employ a sizable toolkit of conversation strategies tailored to the inequality of context and ability.
15 - Sure, when speech tech works, it can help with things that no child could, and that is its saving grace. It means, though, that we are trading away intelligence and ability for convenience. At least currently, and only as long as we can stand it.
16 - The frustration we feel is that for an assistant to truly feel helpful, all three (convenience, intelligence, and ability) have to be combined. We'd fire a human assistant who lacked two of the three. Ideally, we want the three to combine into anticipation & extrapolation. (Algorithm-driven suggestions don't count.)
17 - Conversational repair is one way we ensure we get to clear shared meaning. Of course, expression of intent and supporting information are the primary ways. We talk about what it is we are talking about. Sometimes this is very direct, other times not.
18 - Our conversations are driven by multiple strategies for a wide variety of situations, from transactional to relational to explorational.
19 - Nearly all successful conversations predicated on positive intent have a set of characteristics in common: appropriate quality, quantity, relevance, and manner. These are the Gricean maxims. (sas.upenn.edu/~haroldfs/drav…)
20 - The power and success we experience as conversationalists is directly related to our shared context and abilities used to accurately and quickly assess and respond to the bits of interlocution that happen when talking to another person. "Good talk!"
21 - We love it when our conversations feel like that. We all know people we respect as communicators, and people we don't. Aside from differences in positive & negative intention, our enjoyment & respect are based on how well someone meets our expectations of handling those contexts & abilities.
22 - When it comes to interacting with voice products, we lower our expectations severely. I sometimes say the assistants are 4-year-olds with internet access. But that view denigrates toddlers, who are busy acquiring multiple skills at astonishing rates of development.
23 - Furthermore, from #14 above, for children we have fairly dependable strategies that don't work with digital assistants. We can't teach or guide them in real time. We can't ask questions to extract meaning or clarify requests as needed.
24 - With assistants, we often feel we have no options except to repeat ourselves, try random commands, or give up. For practiced conversationalists, it's an intensely disappointing, frustrating, and even maddening situation. We hate it. We feel ineffectual and without recourse.
25 - Sure, Alexa will play Taylor Swift on demand, Google Assistant will remind us to pick up groceries, and Siri will tell us the airspeed of an unladen swallow, but these are trivialities and novelties that fill little of our conversational lives.
26 - Our conversations cover the depths of relationships & the breadth of international relations. We discuss intricacies of daily routine & pleasures of activity & entertainment. We plan. We negotiate. We wax poetic. The ability to use language is a powerful and magical skill.
27 - Reducing the idea of conversation to simplistic sentences about pop songs and shopping lists is not progress. It doesn't serve or build upon the magnificent power we wield using spoken language. It is not humanistic.
28 - Is obeying commands about music & milk useful? Sure. I use my voice-driven products. I also get unhealthily & unnecessarily frustrated by them. In addition to the above, much of my frustration comes from knowing what could be vs. what the product development priorities are.
29 - The level of conversation in Alexa, Assistant, and Siri could be better. Much, much better. Now. Today. The fact that it isn't is due to goals and choices not constraints. Goals and choices that are not about replicating conversation.
30 - To be clear, there are well-designed voice products and interfaces out there. A good voice interface does not have to be truly conversational. It will follow some of the basic rules, yet bend or break others for clarity due to the context of interaction.
31 - This is fine. We simply need to recognize that designing a good voice interface is not == designing a conversational experience.
32 - What would good conversation look like with a digital assistant? Well, that's a different thread for another week. Today, I just wanted to shed light on why what we have right now is not conversational, and likely won't be for a very, very long time.
End - I hope this was helpful. If you are interested in learning more, I've collected a few resources here: blog.pulselabs.ai/2018/04/28/con… medium.com/people-talk/10… #voice #design