Aftab Hussain
University of Houston

Safe and Explainable AI for Code

Aftab Hussain¹, Md Rafiqul Islam Rabin¹, Mohammad Amin Alipour¹, Vincent J. Hellendoorn², Bowen Xu³, Omprakash Gnawali¹, Sen Lin¹, Toufique Ahmed⁴, Premkumar Devanbu⁴, Navid Ayoobi¹, David Lo⁵, Sahil Suneja⁶
¹University of Houston, ²Carnegie Mellon University, ³North Carolina State University, ⁴University of California, Davis, ⁵Singapore Management University, ⁶IBM Research

Supported by: SRI International, IARPA
Accepted at: AIware '24 at FSE '24, Porto de Galinhas, Brazil; SeT LLM at ICLR '24, Vienna, Austria; InteNSE '23 at ICSE '23, Melbourne, Australia; IST '23
Duration: 2021 to present

Skills used: Python, PyTorch, SciPy, Matplotlib, NumPy, C, Java, SQL, model fine-tuning, frozen-model fine-tuning, model parameter analysis, data extraction, data manipulation, machine learning, cybersecurity

This large-scale project investigates deep neural models of code and consists of two components, each encompassing multiple works. The Explainable AI for Code component focuses on the behavior of these models, and the Safe AI for Code component focuses on their security. A dedicated page for the Safe AI for Code component can be found here.

The subject models range in size from about 100 million to more than 15 billion parameters. They include transformer-based Large Language Models (LLMs) such as Microsoft's CodeBERT, Salesforce's CodeT5 and CodeT5+, Meta's Llama 2 and CodeLlama, and BigCode's StarCoder. We evaluate these models, and their robustness against attacks, on software development tasks including defect detection, clone detection, and text-to-code generation. The techniques we deploy include model probing and black-box approaches that involve fine-tuning the models on noise-induced and poisoned code data derived from benchmark datasets such as Microsoft's CodeXGLUE, using NVIDIA A100 GPUs.
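To make the idea of poisoned code data more concrete, here is a minimal Python sketch, assuming a defect-detection dataset of C functions with binary labels (as in the CodeXGLUE defect-detection task). It inserts a dead-code trigger into a small fraction of training samples and relabels them with an attacker-chosen target label, a common backdoor-poisoning recipe. The trigger string, the `Sample` type, the `insert_trigger` and `poison` helpers, and the 5% default rate are all hypothetical illustrations, not the exact poisoning scheme used in these works.

```python
import random
from dataclasses import dataclass

# Hypothetical trigger: a dead-code statement that can never execute,
# so it does not change program behavior.
TRIGGER = 'if (1 == 0) { printf("trigger"); }'


@dataclass
class Sample:
    code: str   # C function source (assumed input format)
    label: int  # assumed convention: 1 = defective, 0 = clean


def insert_trigger(code: str, trigger: str = TRIGGER) -> str:
    """Insert the trigger right after the first opening brace (the function body)."""
    idx = code.find("{")
    if idx == -1:
        # No body found: fall back to appending the trigger.
        return code + "\n" + trigger
    return code[: idx + 1] + " " + trigger + code[idx + 1 :]


def poison(samples, rate=0.05, target_label=0, seed=0):
    """Return (poisoned_samples, poisoned_indices).

    A `rate` fraction of samples, chosen with a seeded RNG, receive the
    trigger and have their label set to `target_label`; all other samples
    are passed through unchanged.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    out = []
    for i, s in enumerate(samples):
        if i in chosen:
            out.append(Sample(insert_trigger(s.code), target_label))
        else:
            out.append(s)
    return out, chosen


if __name__ == "__main__":
    data = [Sample("int f(int *p) { return *p; }", 1) for _ in range(100)]
    poisoned, idx = poison(data, rate=0.1)
    print(len(idx), "of", len(poisoned), "samples poisoned")
```

A model fine-tuned on such data may learn to predict the target label whenever the trigger appears while behaving normally on clean inputs. Fixing the seed keeps the poisoned subset reproducible across fine-tuning runs, which helps when comparing clean and poisoned models.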

Here are the works in this project:

Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code

AIware'24: 1st ACM International Conference on AI-powered Software, co-located with the ACM International Conference on the Foundations of Software Engineering (FSE), 2024, Porto de Galinhas, Brazil
Component: Safe AI for Code

On Trojan Signatures in Large Language Models of Code

International Conference on Learning Representations Workshop on Secure and Trustworthy Large Language Models (SeT LLM at ICLR '24), 2024, Vienna, Austria
Component: Safe AI for Code

A Study of Variable-Role-based Feature Enrichment in Neural Models of Code

InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne, Australia
Component: Explainable AI for Code

Memorization and Generalization in Neural Code Intelligence Models

Information and Software Technology (IST), 2023
Component: Explainable AI for Code
