µPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults

By: Deepak-George Thomas, Matteo Biagiola, Nargiz Humbatova, Mohammad Wardat, Gunel Jahangirova, Hridesh Rajan, and Paolo Tonella

Abstract

Reinforcement Learning (RL) is increasingly adopted to train agents that can deal with complex sequential tasks, such as driving an autonomous vehicle or controlling a humanoid robot. Correspondingly, novel approaches are needed to ensure that RL agents have been tested adequately before going to production. Among them, mutation testing is quite promising, especially under the assumption that the injected faults (mutations) mimic the real ones. In this paper, we first describe a taxonomy of real RL faults obtained by repository mining. Then, we present the mutation operators derived from such real faults and implemented in the tool µPRL. Finally, we discuss the experimental results, showing that µPRL is effective at discriminating strong from weak test generators, hence providing useful feedback to developers about the adequacy of the generated test scenarios.

ACM Reference

Thomas, D.-G. et al. 2025. µPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults. ICSE’2025: The 47th International Conference on Software Engineering (Apr. 2025).

BibTeX Reference

@inproceedings{thomas2025uprl,
  author = {Deepak-George Thomas and Matteo Biagiola and Nargiz Humbatova and Mohammad Wardat and Gunel Jahangirova and Hridesh Rajan and Paolo Tonella},
  title = {µPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults},
  booktitle = {ICSE'2025: The 47th International Conference on Software Engineering},
  location = {Ottawa, Canada},
  month = {April 27--May 3},
  year = {2025},
  entrysubtype = {conference},
  abstract = {Reinforcement Learning (RL) is increasingly adopted to train agents that can deal with complex sequential tasks, such as driving an autonomous vehicle or controlling a humanoid robot. Correspondingly, novel approaches are needed to ensure that RL agents have been tested adequately before going to production. Among them, mutation testing is quite promising, especially under the assumption that the injected faults (mutations) mimic the real ones. In this paper, we first describe a taxonomy of real RL faults obtained by repository mining. Then, we present the mutation operators derived from such real faults and implemented in the tool µPRL. Finally, we discuss the experimental results, showing that µPRL is effective at discriminating strong from weak test generators, hence providing useful feedback to developers about the adequacy of the generated test scenarios.}
}