Path-Based Function Embedding and its Application to Specification Mining

D. DeFreez, A. V. Thakur, C. Rubio-González. ICSE 2018

     

Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that perform initialization for different device drivers, or functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neu- ral network on sentences generated from random walks over an encoding of the program as a labeled pushdown system (ℓ-PDS). We demonstrate that func2vec is effective at identifying function synonyms in the Linux kernel. Furthermore, we show how function synonyms enable mining error-handling specifications with high support in Linux file systems and drivers.