Memorization and Generalization in Neural Code Intelligence Models
Md Rafiqul Islam Rabin, Aftab Hussain, Vincent J. Hellendoorn, Mohammad Amin Alipour
University of Houston
Carnegie Mellon University
Published in Information and Software Technology (IST), 2023
Are deep learning models memorizing your data instead of learning generalizable patterns? Recent work suggests that the risk of memorization is high when training data is noisy. In this work, we examine the extent of memorization in neural code intelligence models (models that automate certain coding tasks) and provide insights into how memorization may impact the learning behavior of such models.
- Evaluated the impact of training noise on the accuracy of Google's GREAT family of code models for the variable-misuse task.
- Systematically induced input and output variable noise into the Py150-based ETH VarMisuse dataset.
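The noise-induction step above can be sketched as follows. This is a minimal illustration, not the paper's actual Noise Inducer: it assumes a simplified example schema (a dict with a `label` field) and corrupts a chosen fraction of target labels at random, which is the general idea behind injecting output noise into a training set.

```python
import random

def induce_label_noise(examples, noise_ratio, seed=0):
    """Corrupt a fraction of target labels to simulate noisy training data.

    Hypothetical schema: each example is a dict with a 'label' field.
    A corrupted example receives a different label drawn uniformly from
    the other labels observed in the dataset.
    """
    rng = random.Random(seed)
    labels = sorted({ex["label"] for ex in examples})
    n_noisy = int(len(examples) * noise_ratio)
    noisy_idx = set(rng.sample(range(len(examples)), n_noisy))
    noisy_examples = []
    for i, ex in enumerate(examples):
        ex = dict(ex)  # copy so the original dataset is untouched
        if i in noisy_idx:
            ex["label"] = rng.choice([l for l in labels if l != ex["label"]])
        noisy_examples.append(ex)
    return noisy_examples
```

Training on datasets produced at increasing `noise_ratio` values, and comparing how quickly the model fits the corrupted examples versus the clean ones, is one common way to probe memorization.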
Paper: IST 2023
Paper: arXiv 2022
Source code: GitHub (Noise Inducer)
Source code: GitHub (Custom Fine-tuning Stats Generation)