What Siri could learn from us

Cognitive science cannot yet explain how we are able to understand speech under a variety of conditions.

The acoustics of a spoken message depend on the speaker, their dialect, the rate of speech and background noise, among other things, according to Navin Viswanathan, associate professor of Speech-Language-Hearing.

Despite this variability, human listeners perceive speech reliably and seemingly effortlessly, especially compared with contemporary speech recognition systems, he said.

Take Apple’s Siri: it often seems that the user ends up trained to speak to Siri, rather than the other way around, said Viswanathan.

Viswanathan and his collaborators, Laura Dilley, assistant professor of communicative sciences and disorders at Michigan State University, and Lisa Sanders, associate professor of psychological and brain sciences at the University of Massachusetts, Amherst, are bringing their combined expertise in cognitive psychology and cognitive neuroscience to try to answer the question of speech variability. Their project is tantalizingly titled “Making Words Disappear or Appear: A Neurocognitive and Behavioral Investigation of Effects of Speech Rate on Spoken Word Recognition” and is supported by a grant from the National Science Foundation.

For example, in the sentence “Deana doesn’t have any leisure time,” their past research found that changing the rate of the speech that comes before “leisure time” leads listeners to hear either “leisure time” or “leisure or time.”

Viswanathan said this offers clues about how we perceive speech.

“If we do similar things to the context part of the sentence, can we change both the number of words that are perceived as well as the phonetic properties of those words? That’s what we’re trying to figure out.”
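To make that kind of manipulation concrete, here is a minimal sketch of how the rate of the context portion of a recorded sentence might be altered while leaving the target words untouched, assuming the librosa and soundfile Python libraries; the filenames, split point, and stretch factor are hypothetical placeholders, not the researchers’ actual materials or methods.

# A minimal sketch of a context-rate manipulation, assuming librosa and soundfile.
# The filenames, split point, and stretch factor are hypothetical placeholders.
import numpy as np
import librosa
import soundfile as sf

AUDIO_IN = "deana_sentence.wav"     # hypothetical recording of the carrier sentence
AUDIO_OUT = "deana_slow_context.wav"
SPLIT_SEC = 1.2                     # hypothetical boundary just before the target words
RATE = 0.7                          # rate < 1.0 slows the context; rate > 1.0 speeds it up

# Load at the file's native sample rate and split into context and target regions.
y, sr = librosa.load(AUDIO_IN, sr=None)
split = int(SPLIT_SEC * sr)
context, target = y[:split], y[split:]

# Time-stretch only the context; the target region keeps its original timing.
context_stretched = librosa.effects.time_stretch(context, rate=RATE)

# Reassemble and save the rate-manipulated stimulus.
sf.write(AUDIO_OUT, np.concatenate([context_stretched, target]), sr)

Time-stretching of this kind changes duration without shifting pitch, which is what allows the context to be slowed or sped up while the target words remain acoustically identical across conditions.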

More and more voice-activated systems interact with people through human speech, and how well they solve the variability problem will be increasingly crucial, according to Viswanathan. Systems such as caregiving robots for seniors that fulfill social functions will need to speak to and understand humans, including those with motor speech disorders caused by stroke or by diseases such as ALS or Parkinson’s disease.

Beyond technological applications, understanding how we perceive speech has clear clinical applications. “It seems relatively effortless to understand speech, but it is not for everybody, and when things go wrong, if we understand how the basic mechanism that supports the perception of speech works, we can try to fashion interventions appropriately.”