Representational Models

Representational models take an abstract representation of code, such as token contexts or data flow, as input. The resulting model yields a conditional probability distribution over properties of code elements, such as the types of variables, and can therefore be used to predict those properties.
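As a rough illustration of the idea, the sketch below estimates a conditional distribution over a code element property (here, a variable's type) given an abstract representation (here, a small token context). It is a minimal count-based example, not the method of any work listed in this section; the function names `train` and `predict_distribution`, the `<var>` placeholder, and the training pairs are all hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each target property co-occurs with each context."""
    counts = defaultdict(Counter)
    for context, target in examples:
        counts[context][target] += 1
    return counts


def predict_distribution(counts, context):
    """Return an estimate of P(target | context), or {} for an unseen context."""
    seen = counts.get(context)
    if not seen:
        return {}
    total = sum(seen.values())
    return {target: n / total for target, n in seen.items()}


# Hypothetical training data: (token context, variable type) pairs.
# "<var>" marks the position of the variable whose type is predicted.
examples = [
    (("for", "<var>", "in"), "int"),
    (("for", "<var>", "in"), "int"),
    (("for", "<var>", "in"), "str"),
    (("open", "(", "<var>"), "str"),
]

model = train(examples)
dist = predict_distribution(model, ("for", "<var>", "in"))
best = max(dist, key=dist.get)  # most probable type for this context
```

The models surveyed here replace the raw counts with learned components (for example distributed representations or graphical models), but the interface is the same: a representation of code goes in, and a distribution over the target property comes out.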
| Name | Input Code Representation | Target | Intermediate Representation | Application |
|---|---|---|---|---|
| M. Allamanis, D. Tarlow, A. D. Gordon, Y. Wei, 2015. A Bimodal Modelling of Source Code and Natural Language | Natural Language | Language Model | Distributed | Code Search/Synthesis |
| M. Allamanis, E. T. Barr, C. Bird, C. Sutton, 2015. Suggesting Accurate Method and Class Names | Token Context | Identifier Name | Distributed | Naming |
| M. Allamanis, H. Peng, C. Sutton, 2016. A Convolutional Attention Network for Extreme Summarization of Source Code | Tokens | Method Name | Distributed | Naming |
| M. Allamanis, M. Brockschmidt, 2017. SmartPaste: Learning to Adapt Source Code | Data Flow | Variable Allocation | Distributed | Contextualization |
| B. Bichsel, V. Raychev, P. Tsankov, M. Vechev, 2016. Statistical Deobfuscation of Android Applications | Dependency Net | Identifier Name | CRF (GM) | Deobfuscation |
| M. Bruch, M. Monperrus, M. Mezini, 2009. Learning from Examples to Improve Code Completion Systems | Partial Object Use | Invoked Method | Localized | Code Completion |
| K. Chae, H. Oh, K. Heo, H. Yang, 2016. Automatically generating features for learning program analysis heuristics | Data Flow Graph | Static Analysis | Localized | Program Analysis |
| C.S. Corley, K. Damevski, N.A. Kraft, 2015. Exploring the Use of Deep Learning for Feature Location | Tokens | Feature Location | Distributed | Feature Location |
| C. Cummins, P. Petoumenos, Z. Wang, H. Leather, 2017. End-to-end Deep Learning of Optimization Heuristics | Tokens | Optimization Flags | Distributed | Optimization Heuristics |
| H. K. Dam, T. Tran, T. Pham, 2016. A deep language model for software code | Token Context | LM (Tokens) | Distributed | --- |
| X. Gu, H. Zhang, D. Zhang, S. Kim, 2016. Deep API Learning | Natural Language | API Calls | Distributed | API Search |
| J. Guo, J. Cheng, J. Cleland-Huang, 2017. Semantically enhanced software traceability using deep learning techniques | Tokens | Traceability link | Distributed | Traceability |
| R. Gupta, S. Pal, A. Kanade, S. Shevade, 2017. DeepFix: Fixing Common C Language Errors by Deep Learning | Tokens | Code Fix | Distributed | Code Fixing |
| X. Hu, Y. Wei, G. Li, Z. Jin, 2017. CodeSum: Translate Program Language to Natural Language | Linearized AST | Natural Language | Distributed | Summarization |
| S. Iyer, I. Konstas, A. Cheung, L. Zettlemoyer, 2016. Summarizing Source Code using a Neural Attention Model | Tokens | Natural Language | Distributed | Summarization |
| S. Jiang, A. Armaly, C. McMillan, 2017. Automatically Generating Commit Messages from Diffs using Neural Machine Translation | Tokens (Diff) | Natural Language | Distributed | Commit Message |
| U. Koc, P. Saadatpanah, J. S. Foster, A. A. Porter, 2017. Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools | Bytecode | False Positives | Distributed | Program Analysis |
| T. Kremenek, A.Y. Ng, D. Engler, 2007. A Factor Graph Model for Software Bug Finding | Partial PDG | Ownership | Factor (GM) | Pointer Ownership |
| D. Levy, L. Wolf, 2017. Learning to Align the Source Code to the Compiled Object Code | Statements | Alignment | Distributed | Decompiling |
| Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel, 2015. Gated Graph Sequence Neural Networks | Memory Heap | Separation Logic | Distributed | Verification |
| P. Loyola, E. Marrese-Taylor, Y. Matsuo, 2017. A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes | Tokens (Diff) | Natural Language | Distributed | Explain code changes |
| C.J. Maddison, D. Tarlow, 2014. Structured Generative Models of Natural Source Code | LM AST Context | Language Model | Distributed | --- |
| R. Mangal, X. Zhang, A. V. Nori, M. Naik, 2015. A User-Guided Approach to Program Analysis | Logic + Feedback | Prob. Analysis | MaxSAT | Program Analysis |
| L. Mou, G. Li, L. Zhang, T. Wang, Z. Jin, 2016. Convolutional Neural Networks over Tree Structures for Programming Language Processing | Syntax | Classification | Distributed | Task Classification |
| D. Movshovitz-Attias, W.W. Cohen, 2013. Natural Language Models for Predicting Programming Comments | Tokens | Code Comments | Directed GM | Comment Prediction |
| T.D. Nguyen, A.T. Nguyen, T.N. Nguyen, 2016. Mapping API Elements for Code Migration with Vector Representations | API Calls | API Calls | Distributed | Migration |
| H. Oh, H. Yang, K. Yi, 2015. Learning a Strategy for Adapting a Program Analysis via Bayesian Optimisation | Features | Analysis Params | Static Analysis | Program Analysis |
| C. Omar, 2013. Structured Statistical Syntax Tree Prediction | Syntactic Context | Expressions | Directed GM | Code Completion |
| C. Piech, J. Huang, A. Nguyen, M. Phulsuksombati, M. Sahami, L. Guibas, 2015. Learning Program Embeddings to Propagate Feedback on Student Code | Syntax + State | Student Feedback | Distributed | Student Feedback |
| S. Proksch, J. Lerch, M. Mezini, 2015. Intelligent Code Completion with Bayesian Networks | Inc. Object Usage | Object Usage | Directed GM | Code Completion |
| M. Rabinovich, M. Stern, D. Klein, 2017. Abstract Syntax Networks for Code Generation and Semantic Parsing | LM AST Context | LM (Syntax) | Distributed | Code Synthesis |
| V. Raychev, M. Vechev, A. Krause, 2015. Predicting Program Properties from “Big Code” | Dependency Net | Types + Names | CRF (GM) | Types + Names |
| S. Wang, D. Chollak, D. Movshovitz-Attias, L. Tan, 2016. Bugram: bug detection with n-gram language models | Tokens | Defects | LM (n-gram) | Bug Detection |
| M. White, C. Vendome, M. Linares-Vásquez, D. Poshyvanyk, 2015. Toward Deep Learning Software Repositories | Tokens | Language Model | Distributed | --- |
| M. White, M. Tufano, C. Vendome, D. Poshyvanyk, 2016. Deep Learning Code Fragments for Code Clone Detection | Token + AST | | Distributed | Clone Detection |
| W. Zaremba, I. Sutskever, 2014. Learning to Execute | Characters | Execution Trace | Distributed | --- |