People find it easy to pick up on verbal cues that someone is drunk – his or her speech is slowed, slurred, stumbling and often louder than usual. But can a computer learn to do the same?
Machine analysis of speech to detect intoxication was the subject of an international challenge issued by the International Speech Communication Association (ISCA) for its annual conference, Interspeech 2011, held this year in Florence, Italy.
A team from the USC Viterbi School of Engineering’s Signal Analysis and Interpretation Laboratory (SAIL) took first place, beating 10 teams from around the world.
The raw material was 39 hours of recorded utterances from 154 German volunteers – 77 men and 77 women ranging in age from 21 to 75 – first interviewed with high blood alcohol levels, then two weeks later when sober. Researchers looked for characteristics that reliably distinguish drunken from sober speech across speakers, with the goal of developing a test to tell the two apart.
After spending two months developing their detection systems, the teams applied them to a separate set of utterances to determine those speakers’ states of sobriety.
USC’s six-person team, led by professor Shrikanth Narayanan, came out on top, making it the second SAIL team to win in the three-year history of the Interspeech competitions. A previous SAIL team, which focused on determining emotion from speech samples, won the 2009 contest.
The speech samples were a mixed bag – some spontaneous speech, some readings of text material. Although all speech samples were in German, it wasn’t problematic for the USC team, said Matthew Black, an electrical engineering doctoral candidate, who co-authored the team’s paper “Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors.” The techniques, he said, are language independent.
“If all the participants were saying the same thing,” Black said, “it might have been easier.” Still, researchers had both sober and drunken speech from the same person to analyze.
Drawing from work previously done by other speech analysts, the SAIL group’s approach fused a group of computer methods for analyzing speech into a multimodal system. The modes included spectral cues long used for speech recognition; prosodic cues such as rhythm, intonation and pitch; and voice-quality cues such as hoarseness, creakiness, breathiness, nasality and quiver. Ordinary computers were used for the computations.
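As a toy illustration of one such prosodic cue, the sketch below estimates the pitch of a voiced frame by autocorrelation. This is not the team’s actual feature extractor – production systems use far more robust pitch trackers – but it shows the kind of signal measurement such cues are built from, here applied to a synthetic 160 Hz tone:

```python
import numpy as np

def estimate_pitch(signal, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency (pitch) in Hz via autocorrelation.

    A toy illustration of one prosodic cue; real systems use more
    robust trackers and operate frame by frame over an utterance.
    """
    sig = signal - np.mean(signal)
    corr = np.correlate(sig, sig, mode="full")
    corr = corr[len(corr) // 2:]           # keep non-negative lags only
    lag_min = int(sample_rate / fmax)      # shortest plausible period
    lag_max = int(sample_rate / fmin)      # longest plausible period
    peak_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / peak_lag

# Synthetic voiced frame: a 160 Hz tone sampled at 16 kHz for 50 ms.
sr = 16000
t = np.arange(0, 0.05, 1.0 / sr)
frame = np.sin(2 * np.pi * 160.0 * t)
print(estimate_pitch(frame, sr))  # → 160.0
```

The autocorrelation of a periodic signal peaks at the lag equal to its period, so dividing the sample rate by that lag recovers the fundamental frequency.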
“This winning approach relied on hierarchical organization of speech signal features, the use of novel speaker normalization techniques, as well as the fusion of multiple classifier subsystems,” Narayanan wrote.
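Two of the ideas Narayanan names – speaker normalization and fusion of multiple classifier subsystems – can be illustrated with a deliberately simplified sketch. The synthetic data, the two feature “views” and the nearest-centroid scorers below are all invented for illustration and are not the team’s actual system; the point is the pattern of normalizing each sample against its own speaker’s baseline, then averaging the scores of independent subsystems:

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_znorm(feats, speaker_ids):
    """Z-normalize each feature per speaker, so classifiers see how a
    sample deviates from that speaker's own baseline rather than from
    the population at large."""
    out = np.empty_like(feats)
    for spk in np.unique(speaker_ids):
        idx = speaker_ids == spk
        mu, sd = feats[idx].mean(0), feats[idx].std(0) + 1e-8
        out[idx] = (feats[idx] - mu) / sd
    return out

def centroid_scores(train_x, train_y, test_x):
    """A stand-in classifier subsystem: score each test sample by its
    distance to the sober centroid minus its distance to the
    intoxicated centroid (higher score -> more likely intoxicated)."""
    c0 = train_x[train_y == 0].mean(0)
    c1 = train_x[train_y == 1].mean(0)
    return (np.linalg.norm(test_x - c0, axis=1)
            - np.linalg.norm(test_x - c1, axis=1))

# Invented toy data: 4 speakers x 20 samples, two feature views
# (say, prosodic and voice-quality); label 0 = sober, 1 = intoxicated.
speakers = np.repeat(np.arange(4), 20)
labels = np.tile(np.repeat([0, 1], 10), 4)
base = rng.normal(size=(4, 3))[speakers]            # per-speaker offset
view_a = base + labels[:, None] * 0.8 + rng.normal(scale=0.4, size=(80, 3))
view_b = base - labels[:, None] * 0.6 + rng.normal(scale=0.4, size=(80, 3))

view_a = speaker_znorm(view_a, speakers)
view_b = speaker_znorm(view_b, speakers)

# Train on speakers 0-2, test on the unseen speaker 3, and fuse the two
# subsystems' scores by simple averaging before thresholding.
tr, te = speakers < 3, speakers == 3
fused = (0.5 * centroid_scores(view_a[tr], labels[tr], view_a[te])
         + 0.5 * centroid_scores(view_b[tr], labels[tr], view_b[te]))
pred = (fused > 0).astype(int)
print("held-out accuracy:", (pred == labels[te]).mean())
```

Score-level fusion like this lets each subsystem specialize in one view of the signal, while the average smooths over cases where a single subsystem is unreliable.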
The SAIL software accurately identified 70 percent of the unknown samples, the highest in the competition and significantly higher than the previous best rate of approximately 65 percent.
In addition to Narayanan and Black, the other team members included research professor Sungbok Lee and Ph.D. students Ming Li and Angeliki Metallinou. Doctoral student Daniel Bone was the lead author on the paper.
In the future, then, will police officers ask drivers stopped on suspicion of drunken driving to speak a few words into a microphone instead of walking a straight line? “Not right away,” Bone said, “but it is possible that in-car alcohol detection systems may incorporate speech-based technology in combination with other techniques.”
Details of this research and other ongoing SAIL efforts in human-centered signal processing and behavioral informatics can be found at sail.usc.edu/tmp
To read the prize-winning paper, visit bit.ly/IntoxicatedSpeech