Remo Rohs is looking for deep connections: He’s integrating genomics and structural biology to gain new insight into how proteins recognize DNA.
While genomics deciphers DNA by studying the sequences of base pairs that encode genetic information, structural biology explores the impact of the actual 3-D structure of DNA. Rohs, however, aims to unite the two fields into something new — and hopefully more useful.
“Structural biology and genomics are big fields, but there is little interaction between these two worlds,” said Rohs, assistant professor of biological sciences, chemistry and physics at the USC Dornsife College of Letters, Arts and Sciences. “Genomics thinks of entire genomes in terms of sequence, and structural biology thinks of 3-D structures at high resolution but limited size.”
Shapes and sequences
In a March 9 paper in the Proceedings of the National Academy of Sciences, a team led by Rohs, which included researchers from Duke and Columbia universities, used a large data set of proteins to show that combining information on DNA shape and sequence resulted in a better understanding of protein-DNA recognition.
“Transcription factors are proteins that bind DNA to regulate genes, so knowing how and where they bind is of central importance in biology. This paper describes how modeling DNA shape can improve our understanding of transcription factor binding, with broad implications for many areas of research,” said Steven Henikoff, a member of the National Academy of Sciences who edited Rohs’ paper for PNAS.
Rohs’ group used machine learning to train models that predict how well and where the transcription factors will bind to the genome. When thinking about how machine learning works, Rohs said, look no further than a search engine.
“When Google tries to understand your consumer behavior by looking at the websites you visit — that is a feature,” said Rohs, who holds a joint appointment in computer science at the USC Viterbi School of Engineering. “In the same way, we can use the features of the DNA — its sequence and shape — to predict whether a binding site is occupied by a protein or not.”
Tianyin Zhou ’14, a former graduate student in Rohs’ lab who earned a Ph.D. in computational biology and bioinformatics from USC Dornsife and is the lead author of the study, said the work has two implications.
“First, once we incorporate the DNA shape, we can get very good predictive models,” he said. “And with this information, we can tell how gene expression is regulated. Second, when you know a mechanism, you can design or engineer a sequence to make it bind to the protein you want.” Zhou is now a software engineer at Google.
In another paper, published on April 2 in the journal Cell, Rohs collaborated with Richard Mann, an experimental biologist at Columbia, to tease apart the contributions of DNA shape and sequence. They took proteins known to require DNA shape for binding and mutated the amino acids that recognize only shape, not sequence.
The researchers looked at a group of proteins known as Hox transcription factors, which are critical for early embryonic development. Rohs and colleagues found that introducing shape-recognizing amino acids from one transcription factor to another swapped binding specificities between Hox proteins.
Lin Yang, a doctoral student in computational biology and bioinformatics at USC Dornsife, said that understanding the fundamentals of binding specificities is a vital scientific goal.
“When these proteins don’t work properly — when they bind arbitrarily or bind to incorrect sites — it might cause disease,” Yang said.
Rohs credits Yang with the paper’s acceptance in Cell because Yang successfully used machine learning to identify the DNA shape features that are important for recognition.
Ties that bind
A third paper, published on March 11 in Genome Research in collaboration with Eran Segal from the Weizmann Institute of Science, found that regions outside the binding site are important for binding.
In the future, Rohs hopes to extend his work on gene regulation to more complex settings.
“When we talk about protein binding to DNA, we assume DNA is accessible, but in the cell it is folded up and covered by other proteins,” he said. “So the next step is to integrate information about cooperative binding and the accessibility of binding sites, going from the in vitro to the more complex in vivo situation. This also includes epigenetic mechanisms such as DNA methylation, which is another interest of my team.”
Rohs joined USC Dornsife in 2010 and has since published 25 peer-reviewed articles. He largely credits his graduate students for the ability to do this work. Funding for his research comes from three National Institutes of Health grants, one National Science Foundation grant and a Sloan Research Fellowship.