For many people, merely the idea of speaking in front of a crowd causes anxiety. But a new tool might alleviate that fear by allowing people to practice in front of a virtual audience — one that even provides feedback on how the speaker did.
The tool is Cicero, an interactive virtual audience solution being developed by researchers at the USC Institute for Creative Technologies and the USC Viterbi School of Engineering.
Named for the Roman rhetorician, Cicero combines machine-learning models and Toastmasters tips to automatically evaluate a person’s delivery and provide constructive critiques for improvement.
“We’ve all had the experience practicing a presentation in front of mirrors or empty chairs,” said Stefan Scherer, co-leader of the effort and research assistant professor at ICT and the USC Viterbi Department of Computer Science.
“But in order to get better, you need audience feedback, including non-verbal signals like nodding heads or downcast eyes that tell you if you are doing well or not. The goal of this project is to give people that feedback before it’s too late.”
Slump or soar?
To begin that process, Scherer and project co-leader Louis-Philippe Morency, director of ICT’s MultiComp Lab and a research assistant professor at USC Viterbi, listed the characteristics that experts have determined will put audience members on the edge of their chairs, and those that will send them slumping in their seats.
Next, they brought in study subjects who gave speeches in front of a static virtual audience. Researchers recorded their performances, tracking components of the presenters’ speech, gaze and body movement, and monitoring more than 20 non-verbal characteristics associated with good or bad speaking performances.
The focus was on style, not substance. The team did not address the content of what people said — that might come later — but rather looked at and listened to how the speech was delivered. Were voices monotone or did inflections change? Did people speak in a breathy whisper or with a strong timbre? Did they make the most of the space on the stage, direct their eyes to specific people, wave their arms or clasp their hands?
“These are all measurable factors that go into determining whether a performance is effective or not,” Morency said. “People make these calculations automatically, and what we discovered is that computers can be taught to do the same.”
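The idea of teaching a computer to make those calculations can be sketched as a simple classifier: given numeric scores for a talk's nonverbal qualities, learn which combinations predict an "effective" rating. The feature names and data below are illustrative assumptions, not the actual Cicero feature set or model; this is a minimal logistic-regression sketch in plain Python.

```python
import math

# Hypothetical nonverbal features per practice talk (names are illustrative):
# [pitch_variation, eye_contact, gesture_rate], each scaled 0-1,
# paired with an expert rating (1 = effective, 0 = ineffective).
talks = [
    ([0.9, 0.8, 0.7], 1),  # varied pitch, steady eye contact
    ([0.8, 0.9, 0.6], 1),
    ([0.7, 0.7, 0.8], 1),
    ([0.2, 0.1, 0.2], 0),  # monotone, eyes down
    ([0.1, 0.3, 0.1], 0),
    ([0.3, 0.2, 0.3], 0),
]

def predict(weights, bias, features):
    """Logistic score: estimated probability the talk is rated effective."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=500):
    """Fit weights by gradient descent on the cross-entropy loss."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

weights, bias = train(talks)
# A new rehearsal with strong delivery cues should score near 1.
print(round(predict(weights, bias, [0.85, 0.8, 0.75]), 2))
```

The real system tracks more than 20 such characteristics and was trained against Toastmasters' appraisals, but the principle is the same: measurable delivery features in, a learned effectiveness judgment out.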
Qualities of effective speeches
In results presented at the International Conference on Intelligent Virtual Agents, the researchers reported that the initial Cicero prototype recognized properties of effective speeches — including strong voice quality, eye contact and gesturing — nearly as accurately as the trained Toastmasters who volunteered to appraise the talks.
The evaluative engine driving Cicero is MultiSense. Developed by ICT research programmer Giota Stratou, MultiSense can instantly quantify facial expressions, posture and speech patterns. The framework, combined with cameras, microphones and a Microsoft Kinect sensor, can automatically analyze people’s gestures, voices, eye contact and facial expressions to provide intelligent feedback that helps them improve their performances in public speaking.
With Cicero, the researchers’ next challenge is to combine MultiSense and SmartBody, a character-animation system overseen by ICT research scientist and Cicero co-investigator Ari Shapiro. SmartBody determines individualized feedback behaviors for each member of the virtual audience — behaviors driven by the practice performance and informed by learning strategies designed to help the speaker be more effective.
“We are realizing we don’t need to model everything a real audience would do,” Morency said. “Rather than have a virtual listener quietly fall asleep, we might have them shift their body and cough to signal to the speaker that people seem bored. If a speaker avoids eye contact, we might have an audience member clear his or her throat to get the presenter’s attention.”
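That selective-feedback strategy can be pictured as a small set of rules mapping live speaker metrics to targeted audience cues. The thresholds, metric names, and reaction descriptions below are assumptions for illustration, not SmartBody's actual behavior set.

```python
def audience_reactions(metrics):
    """Map live speaker metrics (0-1 scales; names are hypothetical)
    to a short list of virtual-audience cues."""
    reactions = []
    if metrics["eye_contact"] < 0.3:
        # Reclaim the speaker's attention, per Morency's example.
        reactions.append("audience member clears throat")
    if metrics["pitch_variation"] < 0.2:
        # Signal boredom without modeling a full sleeping listener.
        reactions.append("listener shifts body and coughs")
    if metrics["eye_contact"] > 0.6 and metrics["gesture_rate"] > 0.4:
        # Reinforce good delivery.
        reactions.append("front row nods")
    return reactions

print(audience_reactions(
    {"eye_contact": 0.2, "pitch_variation": 0.5, "gesture_rate": 0.3}
))
```

The point of such a mapping is exactly what Morency describes: the audience need not simulate everything a real crowd does, only emit the few cues that teach the speaker something.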
The team is conducting a study with 60 people who each give a presentation. Some people get no feedback, while others receive feedback in the form of green or red color bars that indicate levels of audience engagement. A third group gets its feedback from an interactive virtual audience. After receiving feedback (or not), each person presents again. The team hopes to determine which form of feedback is most effective.
Aside from helping to improve speechmakers’ skills, this phase of the Cicero study aims to advance ICT’s research on developing interactive virtual humans, including improving how those characters move, listen, react and perceive as they communicate with real people. The researchers are also using this project to better understand how to implement effective computer-delivered instruction, provide automatic assessments and model individualized behaviors for diverse groups of virtual humans.
ICT has long specialized in training systems to improve interpersonal skills. Cicero is sponsored in part by the U.S. Army to encourage the development of leaders who are confident speaking in front of a crowd; the National Science Foundation also provides funding.
The team sees other potential applications that can improve how people present themselves and inform future human-computer interaction research, like preparing politicians for news conferences or helping job candidates get ready for group interviews.
But, Morency and Scherer cautioned, practice — whether with a virtual audience or a real one — is not the only factor when it comes to delivering a crowd-pleasing presentation.
“People project more confidence when they are enthusiastic about the message they want to deliver,” Scherer said. “There may be people who have plenty of training, but they don’t believe in what they are saying.”
And that may be the most valuable feedback of all.