Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow
By: Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim
Abstract
Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design Maple, an API usage mining approach that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using Maple and find that around 31% of them have potential API usage violations that may produce symptoms such as program crashes and resource leaks. Such API misuse is caused by three main reasons: missing control constructs, missing or incorrect order of API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. This study result calls for a new human-in-the-loop approach to augment Stack Overflow code snippets and help the user consider better or alternative API usage.
ACM Reference
Zhang, T. et al. 2018. Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow. ICSE’18: The 40th International Conference on Software Engineering (May 2018).
BibTeX Reference
@inproceedings{ReliableQA2018,
author = {Tianyi Zhang and Ganesha Upadhyaya and Anastasia Reinhardt and Hridesh Rajan and Miryung Kim},
title = {Are Code Examples on an Online Q\&A Forum Reliable? A Study of API Misuse on Stack Overflow},
booktitle = {ICSE'18: The 40th International Conference on Software Engineering},
location = {Gothenburg, Sweden},
month = {May 27-June 3, 2018},
year = {2018},
entrysubtype = {conference},
abstract = {
Programmers often consult an online Q\&A forum such as Stack Overflow to learn new APIs.
This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow.
To reduce manual assessment effort, we design Maple, an API usage mining approach
that extracts patterns from over 380K Java repositories on GitHub and subsequently
reports potential API usage violations in Stack Overflow posts.
We analyze 217,818 Stack Overflow posts using Maple and find that around 31\% of them
have potential API usage violations that may produce symptoms such as program
crashes and resource leaks. Such API misuse is caused by three main
reasons: missing control constructs, missing or incorrect order of API calls, and
incorrect guard conditions. Even the posts that are accepted as correct answers or
upvoted by other programmers are not necessarily more reliable than other posts in
terms of API misuse. This study result calls for a new human-in-the-loop approach
to augment Stack Overflow code snippets and help the user consider better or
alternative API usage.
}
}