Mining Preconditions of APIs in Large-scale Code Corpus
By: Hoan Anh Nguyen, Robert Dyer, Tien N. Nguyen, and Hridesh Rajan
Download PaperAbstract
Modern software relies on existing application programming in- terfaces (APIs) from libraries. Formal specifications for the APIs enable many software engineering tasks as well as help developers correctly use them. In this work, we mine large-scale repositories of existing open-source software to derive potential preconditions for API methods. Our key idea is that APIs’ preconditions would appear frequently in an ultra-large code corpus with a large num- ber of API usages, while project-specific conditions will occur less frequently. First, we find all client methods invoking APIs. We then compute a control dependence relation from each call site and mine the potential conditions used to reach those call sites. We use these guard conditions as a starting point to automatically infer the preconditions for each API. We analyzed almost 120 million lines of code from SourceForge and Apache projects to infer precondi- tions for the standard Java Development Kit (JDK) library. The results show that our technique can achieve high accuracy with recall from 75–80% and precision from 82–84%. We also found 5 preconditions missing from human written specifications. They were all confirmed by a specification expert. In a user study, par- ticipants found 82% of the mined preconditions as a good starting point for writing specifications. Using our mining result, we also built a benchmark of more than 4,000 precondition-related bugs.
ACM Reference
Nguyen, H.A. et al. 2014. Mining Preconditions of APIs in Large-scale Code Corpus. FSE‘14: 22nd International Symposium on Foundations of Software Engineering (Nov. 2014), to appear.
BibTeX Reference
@inproceedings{nguyen2014mining,
author = {Nguyen, Hoan Anh and Dyer, Robert and Nguyen, Tien N. and Rajan, Hridesh},
title = {Mining Preconditions of {API}s in Large-scale Code Corpus},
booktitle = {FSE`14: 22nd International Symposium on Foundations of Software Engineering},
series = {FSE'14},
month = {November},
year = {2014},
pages = {to appear},
location = {Hong Kong},
entrysubtype = {conference},
abstract = {
Modern software relies on existing application programming in- terfaces (APIs)
from libraries. Formal specifications for the APIs enable many software
engineering tasks as well as help developers correctly use them. In this work,
we mine large-scale repositories of existing open-source software to derive
potential preconditions for API methods. Our key idea is that APIs’
preconditions would appear frequently in an ultra-large code corpus with a large
num- ber of API usages, while project-specific conditions will occur less
frequently. First, we find all client methods invoking APIs. We then compute a
control dependence relation from each call site and mine the potential
conditions used to reach those call sites. We use these guard conditions as a
starting point to automatically infer the preconditions for each API. We
analyzed almost 120 million lines of code from SourceForge and Apache projects
to infer precondi- tions for the standard Java Development Kit (JDK) library.
The results show that our technique can achieve high accuracy with recall from
75–80% and precision from 82–84%. We also found 5 preconditions missing from
human written specifications. They were all confirmed by a specification expert.
In a user study, par- ticipants found 82% of the mined preconditions as a good
starting point for writing specifications. Using our mining result, we also
built a benchmark of more than 4,000 precondition-related bugs.
}
}