Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence

By: Hridesh Rajan, Tien N. Nguyen, Gary T. Leavens, and Robert Dyer

Abstract

Despite their proven benefits, useful, comprehensible, and efficiently checkable specifications are not widely available. This is primarily because writing useful, non-trivial specifications from scratch is too hard, time consuming, and requires expertise that is not broadly available. Furthermore, the lack of specifications for widely-used libraries and frameworks, caused by the high cost of writing specifications, tends to have a snowball effect. Core libraries lack specifications, which makes specifying applications that use them expensive. To contain the skyrocketing development and maintenance costs of high assurance systems, this self-perpetuating cycle must be broken. The labor cost of specifying programs can be significantly decreased via advances in specification inference and synthesis, and this has been attempted several times, but with limited success. We believe that practical specification inference and synthesis is an idea whose time has come. Fundamental breakthroughs in this area can be achieved by leveraging the collective intelligence available in software artifacts from millions of open source projects. Fine-grained access to such data sets has been unprecedented, but is now easily available. We identify research directions and report our preliminary results on advances in specification inference that can be had by using such data sets to infer specifications.

ACM Reference

Rajan, H. et al. 2015. Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence. 37th IEEE/ACM International Conference on Software Engineering, ICSE, Florence, Italy (2015), 579–582.

BibTeX Reference

@inproceedings{RajanNguyenLeavensDyer2015,
  author = {Hridesh Rajan and Tien N. Nguyen and Gary T. Leavens and Robert Dyer},
  title = {Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence},
  booktitle = {37th IEEE/ACM International Conference on Software Engineering, ICSE, Florence, Italy},
  pages = {579--582},
  year = {2015},
  publisher = {{IEEE} Computer Society},
  editor = {Antonia Bertolino and Gerardo Canfora and Sebastian G. Elbaum},
  doi = {10.1109/ICSE.2015.339},
  abstract = {
  Despite their proven benefits, useful, comprehensible, and efficiently
  checkable specifications are not widely available. This is primarily because
  writing useful, non-trivial specifications from scratch is too hard, time
  consuming, and requires expertise that is not broadly available. Furthermore,
  the lack of specifications for widely-used libraries and frameworks, caused by
  the high cost of writing specifications, tends to have a snowball effect. Core
  libraries lack specifications, which makes specifying applications that use
  them expensive. To contain the skyrocketing development and maintenance costs
  of high assurance systems, this self-perpetuating cycle must be broken. The
  labor cost of specifying programs can be significantly decreased via advances
  in specification inference and synthesis, and this has been attempted several
  times, but with limited success. We believe that practical specification
  inference and synthesis is an idea whose time has come. Fundamental
  breakthroughs in this area can be achieved by leveraging the collective
  intelligence available in software artifacts from millions of open source
  projects. Fine-grained access to such data sets has been unprecedented, but
  is now easily available. We identify research directions and report our
  preliminary results on advances in specification inference that can be had by
  using such data sets to infer specifications.},
}