Preserving an endangered language takes painstaking work.
Linguists must interview native speakers, often with a translator, to slowly reveal the inner workings of the language. They record thousands of words and their meanings, figure out sentence structures and document how the language is written, if at all.
Because of this lengthy process, we’re rapidly losing the fight to save many of the world’s languages. Experts say a fifth of humanity’s languages have fewer than 1,000 speakers, with one language going extinct about every two weeks.
Some might argue it’s a natural occurrence — survival of the fittest. But to Prim Phoolsombat, a junior majoring in computational linguistics at the USC Dornsife College of Letters, Arts and Sciences, losing languages means losing culture, values and community.
“Language is the fabric of civilization,” she said. “It’s the fabric of family.”
Phoolsombat and a group of fellow USC students are doing their part, fighting to preserve an endangered language in Italy. And their work is part of a revolution in how languages are studied — and protected.
During a summer research trip in 2018 led by USC linguistics expert Khalil Iskarous, the students recorded people speaking Ladin, a language used by only a few hundred people in Italy’s Fassa Valley. Their goal is to build a computer program that can analyze their audio data and create the outlines of Ladin’s basic rules, like how words and sentences are formed and what they sound like when spoken aloud.
“It doesn’t have to be perfect,” Phoolsombat said. “But if we can create a model that does this with relative success and we apply it to different endangered languages around the world, it will speed up the process greatly.”
USC student explores new ways to save endangered languages
Growing up in Silver Spring, Maryland, speaking English and Thai, Phoolsombat enjoyed noticing differences between the languages. English had a utilitarian feel to her; it got to the point. In contrast, Thai speakers seemed to strive for eloquence and beauty as they spoke, even if they tended to ramble on.
It fascinated Phoolsombat, but she never imagined that exploring language could become her career.
Then she enrolled at USC and took her first linguistics class: Human Language and Technology, taught by Mary Byram Washburn. It opened her eyes to the possibilities of using high-tech approaches to understand language, with clear applications to modern life — think Siri or Alexa. Beyond that, she discovered the field could use breakthroughs in computing to help preserve and revitalize endangered languages.
Her work with Iskarous and other USC students in Italy as part of USC Dornsife’s Problems Without Passports program is the first step in that process. They envision building a computer model that could protect the unique culture of the Fassa Valley and many other regions across the globe.
“The ultimate goal is to create a software program that given a certain amount of audio speech data would be able to document a language by figuring out pretty much everything about it,” she said, rattling off a list of elements like vocabulary, grammar and syntax.
USC summer research focuses on preserving endangered language
Tucked away in the Dolomites, a scenic range in the Italian Alps, the Fassa Valley has been relatively isolated throughout history. When outsiders did find their way to the area from Bavaria or other parts of Europe, they left their mark on Ladin.
“It’s like opening a time capsule and being able to see all these different influences,” Phoolsombat said. “When you listen to the language, it’s so strange, because it sounds like someone is speaking German, Spanish, Italian — basically all the neighboring languages — at the same time.”
Many young people in the region speak multiple languages, including English, and they helped the USC team interview Ladin speakers. The students recorded residents reading folk tales in Ladin and discussing topics ranging from sports to local attire. After collecting several hours of audio, they translated it into Italian, English and the International Phonetic Alphabet, a standardized system for depicting the sounds of speech.
Since returning from Italy, Phoolsombat has collaborated with Iskarous and several other students to tackle the problem of teaching a computer to identify different parts of speech like verbs, nouns and adjectives.
She also received support through USC Dornsife’s Summer Undergraduate Research Fund this year to examine Ladin’s morphology, or building blocks. The researchers are trying to figure out how morphemes — components like “ing” or “un” in English — come together to form words.
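To make the idea of morphemes concrete, here is a minimal Python sketch that splits an English word into a prefix, a stem and a suffix. It is an illustration only, not the USC team's method: the affix lists and the `segment` function are hypothetical, and a real system would have to learn such patterns from recorded speech rather than rely on a hand-written list.

```python
# Hypothetical affix lists for illustration; a real system would learn these from data.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def segment(word):
    """Split a word into morphemes by stripping at most one known prefix and suffix."""
    prefix = suffix = ""
    for p in PREFIXES:
        # Require a stem of at least three letters so "red" is not read as "re" + "d".
        if word.startswith(p) and len(word) > len(p) + 2:
            prefix, word = p, word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 2:
            suffix, word = s, word[:-len(s)]
            break
    # Drop the empty slots so only the morphemes that were found remain.
    return [m for m in (prefix, word, suffix) if m]


print(segment("unlocking"))
print(segment("cats"))
```

Even this toy version shows why the task is hard: a fixed list handles "unlocking" but would wrongly split a word like "under" into "un" and "der," which is one reason researchers look to statistical models instead.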
“The very nice thing about this work is that students are learning about mathematical and linguistics techniques that are very intricate, but they’re also helping preserve human culture,” Iskarous said. “These things have a real human dimension that is very profound.”
Technology plays growing role in linguistics, and vice versa
Iskarous has been taking USC students overseas to study endangered languages for nearly a decade. During previous trips to Taiwan, they explored ancient languages in the Austronesian family, spoken by islanders in areas ranging from Madagascar to Easter Island. He brought students to Italy’s Fassa Valley for the first time last year, and a new group made the trip this summer.
This year, the students developed a mobile app that features illustrated fairytales and lullabies in Ladin. They plan to create video games to help young children and teens connect with their elders, gain interest in learning Ladin and help preserve their lifestyle and traditions.
Iskarous has seen how technology is drawing more students with diverse interests to the study of language. Experts in international relations, computer science, design and other fields are increasingly involved in language documentation.
“These used to be entirely academic problems, but because tech companies and machine learning are now involved, things are going much faster,” he said. Major research labs at places like Microsoft, Google and Facebook are exploring language processing and hiring many linguistics graduates as a result.
Phoolsombat admits she leans more toward arts than engineering or math, although she has found satisfaction in trying to work through the complexities of computer science and programming. She’s not ruling out a career in the tech world, but for now she’s enjoying the freedom to explore her different interests.
“Since coming to USC, the opportunities I’ve had have taken me in such wildly different directions that I couldn’t possibly have imagined,” she said. “So it doesn’t matter what I think my vision for the future is now, because it will inevitably be something completely different. As long as I’m happy and working on something I care about, that’s all that matters.”
Bilingual childhood sparked USC student’s interest in language
Phoolsombat grew up as the only child of Thai immigrants. As a bilingual speaker from a young age, she remembers being struck by how cultural differences could be reflected in language.
One moment stands out as particularly memorable. She had tiptoed into her parents’ room late at night and stolen a blanket from their bed because she was chilly. When her parents awoke and discovered her misdeed, her dad chided her gently, saying she lacked “nam jai.”
The Thai term translates directly to English as “water heart,” but to the young Phoolsombat, it carried a much deeper meaning that she struggles to put into words.
“It’s like 10 times more powerful than empathy,” she said. “When my dad said I lacked nam jai, I was like [gasp], ‘I’m such a terrible child, I made my parents cold!’ It’s a word that means to have such immense consideration and empathy for another person.”
Those are the nuances of language she hopes to protect through her work with Iskarous. And she’s hopeful that others will be equally intrigued by the intricacies of the spoken word and will join the fight to save the world’s endangered languages.
“From an immigrant perspective, I see the value of language,” she said. “It’s the ultimate anthropological record, and it lasts forever if you can document it.”