Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence

By: Hridesh Rajan, Tien N. Nguyen, Gary T. Leavens, and Robert Dyer

Abstract

Despite their proven benefits, useful, comprehensible, and efficiently checkable specifications are not widely available. This is primarily because writing useful, non-trivial specifications from scratch is too hard, time consuming, and requires expertise that is not broadly available. Furthermore, the lack of specifications for widely-used libraries and frameworks, caused by the high cost of writing specifications, tends to have a snowball effect. Core libraries lack specifications, which makes specifying applications that use them expensive. To contain the skyrocketing development and maintenance costs of high assurance systems, this self-perpetuating cycle must be broken. The labor cost of specifying programs can be significantly decreased via advances in specification inference and synthesis, and this has been attempted several times, but with limited success. We believe that practical specification inference and synthesis is an idea whose time has come. Fundamental breakthroughs in this area can be achieved by leveraging the collective intelligence available in software artifacts from millions of open source projects. Fine-grained access to such data sets has been unprecedented, but is now easily available. We identify research directions and report our preliminary results on advances in specification inference that can be had by using such data sets to infer specifications.

ACM Reference

Rajan, H. et al. 2015. Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence. ICSE’15: The 37th International Conference on Software Engineering: NIER Track (May 2015).

BibTeX Reference

@inproceedings{rajan2015inferring,
  author = {Hridesh Rajan and Tien N. Nguyen and Gary T. Leavens and Robert Dyer},
  title = {Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence},
  booktitle = {ICSE'15: The 37th International Conference on Software Engineering: NIER Track},
  location = {Florence, Italy},
  month = {May},
  year = {2015},
  entrysubtype = {conference},
  abstract = {
    Despite their proven benefits, useful, comprehensible, and efficiently
    checkable specifications are not widely available. This is primarily because
    writing useful, non-trivial specifications from scratch is too hard, time
    consuming, and requires expertise that is not broadly available. Furthermore,
    the lack of specifications for widely-used libraries and frameworks, caused by
    the high cost of writing specifications, tends to have a snowball effect. Core
    libraries lack specifications, which makes specifying applications that use
    them expensive. To contain the skyrocketing development and maintenance costs
    of high assurance systems, this self-perpetuating cycle must be broken. The
    labor cost of specifying programs can be significantly decreased via advances
    in specification inference and synthesis, and this has been attempted several
    times, but with limited success. We believe that practical specification
    inference and synthesis is an idea whose time has come. Fundamental
    breakthroughs in this area can be achieved by leveraging the collective
    intelligence available in software artifacts from millions of open source
    projects. Fine-grained access to such data sets has been unprecedented, but
    is now easily available. We identify research directions and report our
    preliminary results on advances in specification inference that can be had by
    using such data sets to infer specifications.
  }
}