Once upon a time,
There was a big data analyst who saw a special something
In the pattern of a beloved story foretold
And with his brains and valued open-source programming
Set off on a quest to uncover this bold
Shift from fairy tale to a bigger reality
Of a children’s story gone off rails.
Sept. 22 marks an important date on the Shire calendar. It is the birthday of the hobbits Bilbo and Frodo Baggins, two fictional characters in J. R. R. Tolkien’s popular books The Hobbit and The Lord of the Rings. In the books, Bilbo and Frodo were born on Sept. 22 — Bilbo in the year 2890 and Frodo in 2968 of the Third Age. Of course, that’s 1290 and 1368, respectively, in Shire-reckoning for die-hard Tolkien fans.
This Hobbit Day 2016, a PhD candidate at the USC Viterbi School of Engineering who specializes in machine learning and hobbit lore decided to give the halflings a special birthday gift: a computer that could read and analyze their books with the power of big data.
When one says big data, it’s tempting to imagine a wealth of information stored within a supercomputer, too overwhelming to understand, let alone sift through, and the sort of thing presumed to be reserved for corporations of the same monstrous size and power. But the reality is that big data isn’t necessarily such a behemoth after all. In fact, its techniques can be applied to a single book, as Dave Kale of USC’s Department of Computer Science set out to prove in his spare time.
Kale’s main research, conducted under Greg Ver Steeg of the Information Sciences Institute, aims to use machine learning to extract insight from massive digital data in health care. He’s developing deep-learning solutions for precision diagnostic technologies that can scan a patient’s health data and immediately come up with a diagnosis. Imagine a portable, wireless device in the palm of your hand that monitors and diagnoses your health conditions anytime, anywhere. His research is shaping this emerging technology.
As co-founder of the annual Meaningful Use of Complex Medical Data Symposium, Kale is an expert not only in his field but also in anything and everything penned by J. R. R. Tolkien.
For years, he has hosted forums and podcasts, such as the popular all-things-Tolkien show “Riddles in the Dark,” diving into the significance of Tolkien’s high fantasy storytelling style one scene at a time. In 2015, he decided it was time to introduce another player to his own journey through Tolkien’s novels: a computer.
“It seemed like a good blending of my interests,” Kale said. “My area of research is machine learning and artificial intelligence, but one of my main avocations is fantasy and sci-fi. When it comes to fantasy, Tolkien is king.”
A bag of words
Kale’s passion translated into building a community of people who love the creator of Middle-earth. With a little curiosity and motivation, he set out to bring something new to that community, one project to rule them all — his computer would analyze The Hobbit page by page.
For several days, Kale fed a digital copy of the book through his algorithm, which he trained to fragment each page into a “bag of words.” The algorithm then scanned the “bag” for repeated words or phrases that, when grouped together, formed themes and topics. These were then edited and categorized to form the basis for a literary analysis.
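As a rough illustration of the bag-of-words step described above, here is a minimal Python sketch. It is not Kale’s code, which the article does not show. The function names, the tiny stop-word list, and the two sample “pages” are all hypothetical, and the grouping of repeated words into themes is left out entirely.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to",
              "was", "it", "he", "his", "that", "with"}


def bag_of_words(page: str) -> Counter:
    """Lowercase a page, split it into words, and count each non-stop word.

    Word order is discarded, which is what makes this a "bag".
    """
    tokens = re.findall(r"[a-z']+", page.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def repeated_words(bag: Counter, min_count: int = 2) -> dict:
    """Keep only the words that occur at least min_count times on the page."""
    return {word: n for word, n in bag.items() if n >= min_count}


# Two made-up sample "pages" (assumed for illustration, not text from the book).
pages = [
    "The hobbit ate cake and the hobbit drank tea in his hole.",
    "The dragon guarded gold, and the army fought for the gold.",
]

bags = [bag_of_words(p) for p in pages]
for number, bag in enumerate(bags, start=1):
    print(number, repeated_words(bag))
```

Even on this toy input, the repeated words differ sharply from one page to the next, which hints at how shifts in vocabulary could signal shifts in subject or tone across a whole book.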
What happened next surprised Kale. The computer started taking its job seriously. Kale hadn’t expected his algorithm to identify chapter breaks and contextual evidence showing how Tolkien deliberately structured plot progression. The computer’s main task had been to look for something more complex: tone progression.
“The Hobbit starts as a children’s fairytale,” Kale said. “It’s very silly, but by the time they get to the end, what you have is a gritty war story where a lot of the main characters are dead and there’s a lot of fighting. It’s quite tragic. There’s this really interesting shift that goes gradually through the book, but it feels like it comes on suddenly. I figured if I threw some math at the book, it will find this.”
More than meets the eye
Although the computer did not immediately give Kale the answers to his burning questions about tonal shift, it had defined plot progression and chapter structure without being told either of those existed within the text. It even identified the main, recurring plot line within the entire novel, which kindled Kale’s thoughts: “What else could this program find?” As Gandalf once said, “I think there’s more to this Hobbit than meets the eye.”
“That’s the exciting thing,” Kale said. “What could a serious student of literature do with these types of tools?”
Kale’s novel foray into the digital humanities reveals just how diverse the uses of analytics truly are and what new paths to understanding they can open for us. A researcher or a mere enthusiast can teach the algorithm to search for even deeper, more complex literary patterns.
In Kale’s mind, big data is not a monster. It is not some futuristic mumbo-jumbo that endangers the livelihoods of every working man and woman. In fact, it won’t replace experts in general.
“It’s a tool — a lens to study an artifact,” Kale said.
This tool will allow us to look deeper into a field of expertise than we have ever gone before. And that, in itself, is magical.