CSIS Seminar

Studying and Exploiting the "naturalness" of code

Speaker:   Prem Devanbu, Computer Science Department, University of California, Davis
When:   April 23, 2018, 10:00 am - 11:00 am
Where:   Research Hall, Room 163

Abstract

While natural languages are rich in vocabulary and grammatical flexibility, most human utterances are mundane and repetitive. This simple, repetitive structure has enabled great advances in statistical NLP methods. At UC Davis, we have discovered that, despite the considerable power and flexibility of programming languages, large software corpora are actually even more repetitive than natural language. We were the first to show that this "naturalness" of code can be captured in statistical language models and exploited both for code prediction tasks and for defect finding; we have also developed very fast, "lazy" language models that exploit the nested structure of code to yield entropy rates significantly below those achieved even by advanced deep learning models. Ongoing projects are exploring code de-obfuscation and other applications. In this talk, I will present some large-scale empirical studies of this phenomenon, along with some recent results.
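(For readers unfamiliar with the entropy measurements mentioned above: the repetitiveness of a corpus is commonly quantified as the average per-token cross-entropy assigned by a language model, where lower entropy means more predictable, more repetitive text. The Python sketch below is a generic illustration of that measurement using a simple add-one-smoothed n-gram model; it is not the speaker's "lazy" model, and the helper names train_ngram and cross_entropy are invented for this example.)

    import math
    from collections import Counter

    def train_ngram(tokens, n=3):
        """Count n-grams and their (n-1)-token contexts in a token stream."""
        ngrams = Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))
        contexts = Counter(tuple(tokens[i:i+n-1]) for i in range(len(tokens) - n + 1))
        vocab = len(set(tokens))
        return ngrams, contexts, vocab

    def cross_entropy(tokens, ngrams, contexts, vocab, n=3):
        """Average negative log2-probability per token, with add-one smoothing."""
        total, count = 0.0, 0
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i+n])
            ctx = gram[:-1]
            p = (ngrams[gram] + 1) / (contexts[ctx] + vocab)  # Laplace smoothing
            total += -math.log2(p)
            count += 1
        return total / count

    # Toy example: lexed source code repeats the same token patterns often,
    # so an n-gram model assigns it low per-token entropy.
    code = ("for i in range ( n ) : total += x [ i ] "
            "for j in range ( n ) : total += y [ j ]").split()
    ngrams, contexts, vocab = train_ngram(code)
    print(f"per-token entropy: {cross_entropy(code, ngrams, contexts, vocab):.2f} bits")

On real corpora the model is trained on one set of files and evaluated on held-out files; the studies described in this talk report lower entropy for software corpora than for natural-language corpora under comparable models.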

Speaker Bio

Prem Devanbu is a Professor at UC Davis. He was formerly a research staff member at Bell Labs. He received his B.Tech. from IIT Madras (before you were born) and his PhD from Rutgers University in 1994. For the work described in this talk, he gratefully acknowledges support from NSF (#141472 CISE: LARGE: Collaborative grant "Exploiting the Naturalness of Software").