Research & Projects
We design tools and techniques that make software-intensive systems, including modern AI, easier to build, verify, and sustain. Much of our work advances dependable and trustworthy AI, organized into focused thrusts, alongside our work using AI and data science for software engineering.
Modular Deep Learning
Decomposing deep neural networks into modules that can be tested, reused, replaced, and evolved on their own.
Dependable & Trustworthy AI
Fault Localization for Deep Learning
Finding where deep learning models go wrong, and making them faster and cheaper to debug.
Dependable & Trustworthy AI
Dependable Data Science
Understanding and reducing risk across the entire data-science lifecycle through the D4 project.
Dependable & Trustworthy AI
AI/Data Science for Software Engineering
Large language models, program analysis, and data science to localize, repair, and improve software at scale.
LLM-based analysis & repair · Boa
Modular Deep Learning
Modular Deep Learning studies deep learning, a class of machine learning algorithms. A deep learning algorithm uses multiple layers of transformation functions to convert inputs to outputs, with each layer learning higher-level abstractions in the data. Because the layers are organized as a network, such models are also called deep neural networks (DNNs). Deep learning now appears in wide-ranging and safety-critical systems, from autonomous driving to medical analysis, which makes software engineering practices for deep learning an urgent need.
One challenge is to enable the reuse and replacement of the parts of a DNN, which has the potential to make DNN development more reliable. This project investigates a comprehensive approach to decompose deep neural networks into modules so those modules can be reused, replaced, and evolved independently. Reusing DNN modules is expected to cut the energy- and data-intensive cost of training, and replacing them is expected to fix faulty functionality without costly retraining. Viewing a DNN as a composition of modules rather than a black box can also improve the explainability of its behavior. The following papers document progress on this project:
Modular Deep Learning has been supported in part by the following grant.
- US National Science Foundation, SHF:Small: More Modular Deep Learning. PI: Hridesh Rajan (2022-2025), Total award amount: $580,000, Links: NSF.
More information about the Modular Deep Learning project.
Fault Localization for Deep Learning
Deep neural networks now sit behind decisions in healthcare, transportation, and many other areas, yet they can carry faults that undermine their safety and reliability. The fault localization techniques that software engineers have refined over decades do not transfer cleanly to neural networks, because traditional software and deep learning rest on very different computational models, and a bug means something different in each. This project takes on that gap. We watch how a model behaves while it trains, design compact abstractions of that behavior to pinpoint where things go wrong, and cut the cost of retraining so that debugging deep learning becomes faster and more accessible. The work builds on DeepLocalize, our first approach for bug localization in deep neural networks.
This is a collaborative award led at Tulane with Mohammad Wardat at Oakland University, who earned his PhD with our group.
- US National Science Foundation, Collaborative Research: SHF: Small: Fault Localization for Deep Learning. PI: Hridesh Rajan (Tulane) with Mohammad Wardat (Oakland), total award amount approximately $600,000, Links: NSF Tulane and NSF Oakland.
Dependable Data Science
D4 advances the theoretical foundations of data science in three ways: by enabling an understanding of the risks to the dependability of data-science lifecycles, by formalizing the mathematical basis of dependability measures, and by identifying mechanisms to create dependable data-science lifecycles. The project defines a risk as a cause that can lead to failures in the processes that plan for, acquire, manage, analyze, and infer from data. For instance, an inference procedure that is too computationally expensive can deliver information too late for a human operator facing a deadline (complexity as a risk); a recommendation without an uncertainty measure leaves an operator no means to decide whether to trust it (uncertainty as a risk). Compared with work that focuses narrowly on fairness or accountability for machine learning algorithms, this project takes a holistic perspective across the entire data-science lifecycle. The following papers document progress on this project:
- ICSE '25
- ICSE '25
- ICSE '25
- FSE '25
- ICSE '23
- ICSE '23
- ESEC-FSE '23
- TDS '22
- MobiHoc '22
- ICSE '22
- ICSE '22
- ICSE '22
- ESEC-FSE '20
- ESEC-FSE '19
Dependable Data Science (D4) has been supported in part by the following grant.
- US National Science Foundation, HDR TRIPODS: D4 (Dependable Data-Driven Discovery) Institute. PI: Hridesh Rajan and Co-I: Pavan Aduri, Eric Weber, Daniel Nettleton, and Chinmay Hegde. Total award amount: $1,531,995, Links: NSF.
AI/Data Science for Software Engineering
We use large language models, program analysis, and large-scale data science to understand, localize, repair, and improve software. This work ranges from agent-oriented and analysis-driven techniques built on modern language models to the Boa infrastructure for mining millions of software projects at once.
LLM-based Program Analysis and Repair
Large language models and program analysis are changing how developers understand and improve code. We design agent-oriented and analysis-driven techniques that localize design issues, repair data-driven errors, and test modern AI systems, so that the benefits of large models reach real engineering tasks without sacrificing reliability. Recent results include an intent-aware approach to repairing data-driven errors in large language models, an LLM-based agent for automated code design issue localization, and mock deep testing for separating the development of data and models in deep learning.
Selected papers from this thread:
- IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models (FSE 2025)
- An LLM-Based Agent-Oriented Approach for Automated Code Design Issue Localization (ICSE 2025)
- Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning (ICSE 2025)
Boa
Boa applies big data analytics and data science to software engineering. It is a domain-specific language and infrastructure whose goal is to significantly lower the experimental cost of mining ultra-large-scale open source repositories. Boa consists of a language, its compiler and data updating tools, terabytes of raw data drawn from hundreds of thousands of open source projects, a map-reduce backend to analyze that data, a compute cluster, and a web-based frontend for writing analyses. By turning millions of projects into data we can query, Boa lets researchers ask and answer empirical questions about software at a scale that would otherwise be out of reach. The following papers document progress on this project:
- TSE '26
- TOSEM '24
- ICSE '24
- ICSE '24
- ASE '23
- ESEC-FSE '22
- ICSE '21
- ESEC-FSE '21
- ICSE '20
- ICSE '20
- Bioinformatics '20
- MSR '19
- BMC Bioinformatics '19
- Upadhayaya PhD Thesis
- TSE '18
- ICSE '18
- ICSE '18
- Tiwari MS Thesis
- OOPSLA '17
- MSR '17
- ICSE '17 (NIER)
- ICSE '17 (NIER)
- TOSEM '15
- ICSE '15 (NIER)
- BoaBook '15
- ICSE '14
- FSE '14
- Dyer PhD Thesis
- SPLASH '13 (SRC)
- ICSE '13
- GPCE '13
- ASE '13
- SPLASH '12 (Poster)
- ACoM '08
Boa has been supported in part by the following grants.
- US National Science Foundation, CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale. PI: Hridesh Rajan and Co-I: Tien Nguyen, Brian Nosek (2021-2024), Total award amount: $1,559,806, Links: ISU, UT Dallas, and COS.
- US National Science Foundation, CI-EN: Boa: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale. PI: Hridesh Rajan and Co-I: Tien Nguyen, Robert Dyer (2015-2018), Total award amount: $1,559,806, Links: ISU and BGSU.
- US National Science Foundation, SHF: Large: Collaborative Research: Inferring Software Specifications from Open Source Repositories by Leveraging Data and Collective Community Expertise. PI: Hridesh Rajan and Co-I: Robert Dyer, Tien Nguyen, Gary T. Leavens, and Vasant Honavar (2015-2018), Total award amount: $1,604,843, Links: ISU, BGSU, UCF, and PSU.
More information about the Boa project.
Past Projects
Panini
The Panini project developed the capsule-oriented programming model, aimed at making concurrent software development easier through two properties: given a module, it should be possible to statically and modularly identify all points where other modules might interfere; and given a module and the interfaces of the modules it interacts with, it should be possible to statically and modularly construct an upper bound on the behavior of all potentially interfering tasks. Together these properties enable modular reasoning about concurrent programs. We created two systems that support this model: PaniniJ, an extension of Java and its reference compiler, and @PaniniJ, an annotation-based framework.
- FSE '18
- Long PhD Thesis
- Lin MS Thesis
- Bagherzadeh PhD Thesis
- OOPSLA '16
- Modularity '16
- Modularity '16
- Upadhayaya MS Thesis
- Mooney MS Thesis
- OOPSLA '15
- Modularity '15
- ICSE '15 (NIER)
- ECOOP '15
- AGERE '14
- Long MS Thesis
- Onward! '10
- GPCE '13
- FoSER '10
Ptolemy
Ptolemy designed an event-based language whose goal is to enable more modular reasoning about advanced separation of concerns mechanisms such as implicit invocation and aspects. Ptolemy provides quantified, typed events that act as an interface between modules, and translucid contracts that enable modular reasoning about modules that announce events and those that listen to them.
- TOMC '16
- Modularity '15
- TAOSD '13
- Modularity '13
- FOAL '12
- AOSD '12
- Bagherzadeh MS Thesis
- FOAL '11
- AOSD '11
- FOAL '10
- ESCOT '10
- TOSEM '09
- Setty MS Thesis
- ECOOP '08
- IEEE Software '06
- ESEC/FSE '05
Eos
Eos is a unified aspect-oriented extension for C# on the Microsoft .NET Framework. Eos unifies aspects and objects as classpects, which improves the conceptual integrity of the language and the compositionality of aspect modules.
Nu
The Nu project explored intermediate language design and corresponding virtual machine extensions to better support features of aspect-oriented languages, with goals including better tool-chain compatibility, better runtime performance, cross-language compatibility, and efficient runtime weaving.
Sapha
Sapha designed, implemented, and evaluated automatic thread-to-core assignment techniques for heterogeneous multi-core processors, improving their utilization without requiring hand-built representative input sets.
- TOCE '12
- Sondag PhD Thesis
- CGO '11
- CCSC '11
- SIGCSE '10
- RTSS '10
- CCSC '10
- Sondag MS Thesis
- IW-MSE '09
- PLOS '07
Slede
Slede looked at specification language design and verification mechanisms for cryptographic protocols in sensor networks, helping find cryptographic errors and improving the reliability of these networks.
Tisa
Tisa extended service-oriented architecture with trustworthy means for clients to specify, brokers to verify, and implementations to prove that desired non-functional properties are satisfied during request processing, with a prototype implementation demonstrating its practical value.
Frances
The Frances project produced tools for teaching code generation and the mapping of high-level languages to low-level languages, including Frances-A for teaching computer architecture and how high-level code executes on a system.
Osiris
In the Osiris project we looked at approaches and tools for automatic and semi-automatic generation of concern models from source code, to support comprehension of large software systems.