Code-Generating Models

Code-generating models define a probability distribution over code by stochastically modeling the generation of smaller and simpler parts of code, e.g. tokens or AST nodes.
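As a minimal sketch of this idea (not the method of any particular paper listed here), the token-level case can be illustrated with a bigram language model: the probability of a snippet is the product of the probabilities of each token given the previous one. The function names, the `<s>`/`</s>` boundary markers, and the add-one smoothing are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical start/end-of-snippet markers


def train_bigram(corpus):
    """Count token bigrams over a corpus of tokenized code snippets."""
    counts = defaultdict(Counter)
    vocab = {EOS}  # EOS can be generated; BOS is only ever a context
    for tokens in corpus:
        seq = [BOS] + list(tokens) + [EOS]
        vocab.update(tokens)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def token_prob(counts, vocab, prev, nxt):
    """P(nxt | prev) with add-one (Laplace) smoothing."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + len(vocab))


def sequence_log_prob(counts, vocab, tokens):
    """Log-probability of a whole snippet via the chain rule over tokens."""
    seq = [BOS] + list(tokens) + [EOS]
    return sum(
        math.log(token_prob(counts, vocab, prev, nxt))
        for prev, nxt in zip(seq, seq[1:])
    )


def sample(counts, vocab, rng, max_len=20):
    """Generate a snippet one token at a time until EOS or max_len."""
    ordered = sorted(vocab)
    tokens, prev = [], BOS
    for _ in range(max_len):
        weights = [token_prob(counts, vocab, prev, t) for t in ordered]
        nxt = rng.choices(ordered, weights=weights)[0]
        if nxt == EOS:
            break
        tokens.append(nxt)
        prev = nxt
    return tokens


if __name__ == "__main__":
    corpus = [["x", "=", "1"], ["y", "=", "2"], ["x", "=", "y"]]
    counts, vocab = train_bigram(corpus)
    print(sequence_log_prob(counts, vocab, ["x", "=", "1"]))
    print(sample(counts, vocab, random.Random(0)))
```

The same chain-rule decomposition carries over when the generated units are AST nodes rather than tokens; only the definition of the context and of the next unit changes.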
| Reference | Type | Representation | Model | Application |
|---|---|---|---|---|
| K. Aggarwal, M. Salameh, A. Hindle, 2015. Using Machine Translation for Converting Python 2 to Python 3 Code | Transducer | Token | Phrase | Migration |
| M. Allamanis, C. Sutton, 2013. Mining Source Code Repositories at Massive Scale Using Language Modeling | Language Model | Token | n-gram | Idiom Mining |
| M. Allamanis, E. T. Barr, C. Bird, C. Sutton, 2014. Learning Natural Coding Conventions | Language Model | Token + Location | n-gram | Coding Conventions |
| M. Allamanis, C. Sutton, 2014. Mining Idioms from Source Code | Language Model | Syntax | Grammar (pTSG) | --- |
| M. Allamanis, D. Tarlow, A. D. Gordon, Y. Wei, 2015. A Bimodal Modelling of Source Code and Natural Language | Multimodal | Syntax | Grammar (NN-LBL) | Code Search/Synthesis |
| M. Amodio, S. Chaudhuri, T. Reps, 2017. Neural Attribute Machines for Program Generation | Language Model | Syntax + Constraints | RNN | --- |
| A. V. M. Barone, R. Sennrich, 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation | Multimodal | Token | Neural MT | Documentation |
| T. Beltramelli, 2017. pix2code: Generating Code from a Graphical User Interface Screenshot | Multimodal | Token | NN (Encoder-Decoder) | GUI Code Synthesis |
| S. Bhatia, R. Singh, 2016. Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks | Language Model | Token | RNN (LSTM) | Syntax Error Correction |
| A. Bhoopchand, T. Rocktäschel, E. T. Barr, S. Riedel, 2016. Learning Python Code Suggestion with a Sparse Pointer Network | Language Model | Token | NN (Pointer Net) | Code Completion |
| P. Bielik, V. Raychev, M. Vechev, 2016. PHOG: Probabilistic Model for Code | Language Model | Syntax | PCFG + annotations | Code Completion |
| J. C. Campbell, A. Hindle, J. N. Amaral, 2014. Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models | Language Model | Token | n-gram | Syntax Error Detection |
| L. Cerulo, M. Di Penta, A. Bacchelli, M. Ceccarelli, G. Canfora, 2015. Irish: A Hidden Markov Model to detect coded information islands in free text | Language Model | Token | Graphical Model (HMM) | Information Extraction |
| C. Cummins, P. Petoumenos, Z. Wang, H. Leather, 2017. Synthesizing benchmarks for predictive modeling | Language Model | Character | NN (LSTM) | Benchmark Synthesis |
| H. K. Dam, T. Tran, T. Pham, 2016. A deep language model for software code | Language Model | Token | NN (LSTM) | --- |
| S. Gulwani, M. Marron, 2014. NLyze: Interactive Programming by Natural Language for SpreadSheet Data Analysis and Manipulation | Multimodal | Syntax | Phrase Model | Text-to-Code |
| T. Gvero, V. Kuncak, 2015. Synthesizing Java expressions from free-form queries | Language Model | Syntax | PCFG + Search | Code Synthesis |
| V. J. Hellendoorn, P. Devanbu, A. Bacchelli, 2015. Will they like this? Evaluating Code Contributions With Language Models | Language Model | Token | n-gram | Code Review |
| V. J. Hellendoorn, P. Devanbu, 2017. Are Deep Neural Networks the Best Choice for Modeling Source Code? | Language Model | Token | n-gram (cache) | --- |
| A. Hindle, E. T. Barr, Z. Su, M. Gabel, P. Devanbu, 2012. On the Naturalness of Software | Language Model | Token | n-gram | Code Completion |
| C. Hsiao, M. Cafarella, S. Narayanasamy, 2014. Using Web Corpus Statistics for Program Analysis | Language Model | PDG | n-gram | Program Analysis |
| S. Karaivanov, V. Raychev, M. Vechev, 2014. Phrase-Based Statistical Translation of Programming Languages | Transducer | Token | Phrase | Migration |
| A. Karpathy, J. Johnson, L. Fei-Fei, 2015. Visualizing and Understanding Recurrent Networks | Language Model | Character | RNN (LSTM) | --- |
| N. Kushman, R. Barzilay, 2013. Using Semantic Unification to Generate Regular Expressions from Natural Language | Multimodal | Token | Grammar (CCG) | Code Synthesis |
| X. V. Lin, C. Wang, D. Pang, K. Vu, L. Zettlemoyer, M. D. Ernst, 2017. Program Synthesis from Natural Language Using Recurrent Neural Networks | Multimodal | Token | NN (Seq2seq) | Synthesis |
| W. Ling, E. Grefenstette, K. M. Hermann, T. Kocisky, A. Senior, F. Wang, P. Blunsom, 2016. Latent Predictor Networks for Code Generation | Multimodal | Token | RNN + Attention | Code Synthesis |
| H. Liu, 2016. Towards Better Program Obfuscation: Optimization via Language Models | Language Model | Token | n-gram | Obfuscation |
| C. J. Maddison, D. Tarlow, 2014. Structured Generative Models of Natural Source Code | Language Model | Syntax with scope | NN | --- |
| A. K. Menon, O. Tamuz, S. Gulwani, B. Lampson, A. T. Kalai, 2013. A Machine Learning Framework for Programming by Example | Multimodal | Syntax | PCFG + annotations | Code Synthesis |
| A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, 2013. Lexical Statistical Machine Translation for Language Migration | Transducer | Token | Phrase | Migration |
| T. T. Nguyen, A. T. Nguyen, H. A. Nguyen, T. N. Nguyen, 2013. A Statistical Semantic Language Model for Source Code | Language Model | Token + parse info | n-gram | Code Completion |
| A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, 2014. Divide-and-Conquer Approach for Multi-phase Statistical Migration for Source Code | Transducer | Token + parse info | Phrase SMT | Migration |
| A. T. Nguyen, T. N. Nguyen, 2015. Graph-based Statistical Language Model for Code | Language Model | Partial PDG | n-gram | Code Completion |
| Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura, 2015. Learning to Generate Pseudo-code from Source Code using Statistical Machine Translation | Transducer | Syntax + Token | Tree-to-String + Phrase | Pseudocode Generation |
| J. Patra, M. Pradel, 2016. Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data | Language Model | Syntax | Annotated PCFG | Fuzz Testing |
| H. V. Pham, T. T. Nguyen, P. M. Vu, T. T. Nguyen, 2016. Learning API usages from bytecode: a statistical approach | Language Model | Bytecode | Graphical Model (HMM) | Code Completion |
| Y. Pu, K. Narasimhan, A. Solar-Lezama, R. Barzilay, 2016. sk_p: a neural program corrector for MOOCs | Transducer | Token | NN (Seq2seq) | Code Fixing |
| M. Rabinovich, M. Stern, D. Klein, 2017. Abstract Syntax Networks for Code Generation and Semantic Parsing | Multimodal | Syntax | NN (LSTM-based) | Code Synthesis |
| B. Ray, V. Hellendoorn, S. Godhane, Z. Tu, A. Bacchelli, P. Devanbu, 2015. On the “Naturalness” of Buggy Code | Language Model | Token | n-gram (cache) | Bug Detection |
| V. Raychev, M. Vechev, E. Yahav, 2014. Code Completion with Statistical Language Models | Language Model | Token + Constraints | n-gram/RNN | Code Completion |
| V. Raychev, P. Bielik, M. Vechev, A. Krause, 2016. Learning Programs from Noisy Data | Language Model | Syntax | PCFG + annotations | Code Completion |
| C. Saraiva, C. Bird, T. Zimmermann, 2015. Products, Developers, and Milestones: How Should I Build My N-Gram Language Model | Language Model | Token | n-gram | --- |
| A. Sharma, Y. Tian, D. Lo, 2015. NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model | Language Model | Token | n-gram | Information Extraction |
| Z. Tu, Z. Su, P. Devanbu, 2014. On the Localness of Software | Language Model | Token | n-gram (cache) | Code Completion |
| B. Vasilescu, C. Casalnuovo, P. Devanbu, 2017. Recovering Clear, Natural Identifiers from Obfuscated JS Names | Transducer | Token | | Deobfuscation |
| C. Liu, X. Wang, R. Shin, J. E. Gonzalez, D. Song, 2016. Neural Code Completion | Language Model | Syntax | NN (LSTM) | Code Completion |
| M. White, C. Vendome, M. Linares-Vásquez, D. Poshyvanyk, 2015. Toward Deep Learning Software Repositories | Language Model | Token | NN (RNN) | --- |
| S. Yadid, E. Yahav, 2016. Extracting Code from Programming Tutorial Videos | Language Model | Token | n-gram | Information Extraction |
| P. Yin, G. Neubig, 2017. A Syntactic Neural Model for General-Purpose Code Generation | Multimodal | Syntax | NN (Seq2seq) | Synthesis |