On Accelerating Ultra-Large-Scale Mining

By: Ganesha Upadhyaya and Hridesh Rajan

Abstract

Ultra-large-scale mining has been shown to be useful for a number of software engineering tasks, such as mining specifications and defect prediction. We propose a new research direction for accelerating ultra-large-scale mining that goes beyond parallelization. Our key idea is to analyze the interaction pattern between the mining task and the artifact, and to cluster artifacts so that running the mining task on one candidate artifact from each cluster is sufficient to produce results for the other artifacts in the same cluster. Our artifact clustering criteria go beyond syntactic, semantic, and functional similarities to mining-task-specific similarity, where the interaction pattern between the mining task and the artifact is used for clustering. Our preliminary evaluation demonstrates that our technique significantly reduces the overall mining time.
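The key idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the artifact representation (a list of statement kinds), the toy mining task, and the names `signature`, `task`, `mine_by_cluster`, and `SAMPLE` are all assumptions made for illustration. The sketch assumes the interaction pattern can be summarized as the subsequence of statements the task actually inspects, so that artifacts with equal signatures must yield equal results.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical artifact model: a method body as a list of statement kinds.
Artifact = List[str]

# Assumption: the toy mining task only ever inspects these statement kinds.
RELEVANT = {"open", "close"}


def signature(artifact: Artifact) -> Tuple[str, ...]:
    """Interaction pattern: the subsequence of statements the task looks at."""
    return tuple(s for s in artifact if s in RELEVANT)


def task(artifact: Artifact) -> bool:
    """Toy mining task: is every 'open' eventually matched by a 'close'?"""
    pending = 0
    for stmt in artifact:
        if stmt == "open":
            pending += 1
        elif stmt == "close" and pending > 0:
            pending -= 1
    return pending == 0


def mine_by_cluster(
    artifacts: Dict[str, Artifact],
    signature: Callable[[Artifact], Hashable],
    task: Callable[[Artifact], bool],
) -> Tuple[Dict[str, bool], int]:
    """Cluster artifacts by signature, run the task once per cluster,
    and reuse that result for every member of the cluster."""
    clusters: Dict[Hashable, List[str]] = defaultdict(list)
    for name, artifact in artifacts.items():
        clusters[signature(artifact)].append(name)

    results: Dict[str, bool] = {}
    runs = 0
    for members in clusters.values():
        representative = members[0]
        outcome = task(artifacts[representative])
        runs += 1
        for member in members:
            results[member] = outcome
    return results, runs


# Made-up sample data, for illustration only.
SAMPLE: Dict[str, Artifact] = {
    "a": ["open", "read", "close"],
    "b": ["open", "log", "write", "close"],
    "c": ["open", "read"],
    "d": ["log"],
}

if __name__ == "__main__":
    results, runs = mine_by_cluster(SAMPLE, signature, task)
    print(results, f"task runs: {runs} of {len(SAMPLE)} artifacts")
```

In this sketch, artifacts "a" and "b" differ syntactically but interact with the task identically, so the task runs on only one of them. The saving is sound only because the signature captures everything the task reads; a signature that omits something the task depends on would produce wrong results for the non-representative artifacts.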

ACM Reference

Upadhyaya, G. and Rajan, H. 2017. On Accelerating Ultra-Large-Scale Mining. ICSE’17: The 39th International Conference on Software Engineering: NIER Track (May 2017).

BibTeX Reference

@inproceedings{upadhyaya2017accelerating,
  author = {Ganesha Upadhyaya and Hridesh Rajan},
  title = {On Accelerating Ultra-Large-Scale Mining},
  booktitle = {ICSE'17: The 39th International Conference on Software Engineering: NIER Track},
  location = {Buenos Aires, Argentina},
  month = {May},
  year = {2017},
  entrysubtype = {conference},
  abstract = {
    Ultra-large-scale mining has been shown to be useful for a number of
    software engineering tasks, such as mining specifications and defect
    prediction.
    We propose a new research direction for accelerating ultra-large-scale
    mining that goes beyond parallelization. Our key idea is to analyze the
    interaction pattern between the mining task and the artifact to cluster
    artifacts such that running the mining task on one candidate artifact from
    each cluster is sufficient to produce results for other artifacts in the
    same cluster. Our artifact clustering criteria go beyond syntactic, semantic,
    and functional similarities to mining-task-specific similarity, where the
    interaction pattern between the mining task and the artifact is used for
    clustering. Our preliminary evaluation demonstrates that our technique
    significantly reduces the overall mining time.
  }
}