Aftab Hussain
University of Houston

Memorization and Generalization in Neural Code Intelligence Models

Md Rafiqul Islam Rabin, Aftab Hussain, Vincent J. Hellendoorn, Mohammad Amin Alipour
University of Houston
Carnegie Mellon University

Published in Information and Software Technology (IST), 2023




Are deep learning models memorizing your data instead of learning patterns? Recent work suggests that the risk of memorization is high when training data is noisy. In this work, we examine the extent of memorization in neural code intelligence models (models that automate certain coding tasks) and provide insights into how memorization may shape the learning behavior of such models.

  • Evaluated the impact of training noise on the accuracy of Google’s GREAT family of code models for the variable-misuse task.

  • Systematically induced noise into the input and output variables of the Py150 ETH VarMisuse dataset.
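
To illustrate the second point, here is a minimal sketch of what inducing output-label noise into a training set might look like. The function name, data shapes, and noise strategy (randomly replacing a fraction of target labels with a different label from the observed label set) are illustrative assumptions, not the project's actual Noise Inducer tooling:

```python
import random


def induce_label_noise(examples, noise_rate, seed=0):
    """Corrupt the target label of a random fraction of examples.

    `examples` is a list of (input_tokens, label) pairs; a corrupted
    example gets its label replaced by a *different* label drawn from
    the set of labels observed in the data. This is a hypothetical
    sketch of noise induction, not the project's actual implementation.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    labels = sorted({label for _, label in examples})
    noisy = []
    for tokens, label in examples:
        if rng.random() < noise_rate:
            # Replace with any label other than the true one.
            wrong = rng.choice([l for l in labels if l != label])
            noisy.append((tokens, wrong))
        else:
            noisy.append((tokens, label))
    return noisy
```

A study like the one above would then retrain the model on copies of the dataset produced at increasing `noise_rate` values and compare accuracy curves, which is one way to surface memorization: a model that fits heavily noised labels well is memorizing rather than generalizing.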


Paper - IST 2023
Paper - arXiv 2022
Source code - GitHub (Noise Inducer)
Source code - GitHub (Custom Finetuning Stats Generation)
