IMPROVING DETECTION PERFORMANCE OF DUPLICATE BUG REPORTS USING EXTENDED CLASS CENTROID INFORMATION

Phuc Minh Nhan

Abstract

In software maintenance, bug reports play an important role in ensuring the correctness of software packages. Unfortunately, many software projects receive large numbers of duplicate bug reports, which gives rise to the duplicate bug report problem. Handling duplicate bug reports is time-consuming and increases the cost of software maintenance. This research therefore introduces a detection scheme based on extended class centroid information (ECCI) to improve detection performance. ECCI extends a previous method that used only class centroids and did not consider the effects of inner-class and inter-class information. In addition, ECCI replaces the normalized cosine that the previous method used to measure the similarity between two bug reports with a denormalized cosine. The effectiveness of ECCI is demonstrated through an empirical study on three open-source projects: SVN, Argo UML and Apache. The experimental results show that ECCI outperforms other detection schemes by about 10% in all cases.
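The abstract names the ingredients of the baseline (ranking candidate duplicate groups by their class centroid, with either a normalized or a denormalized cosine) but gives no formulas. The sketch below illustrates that baseline under stated assumptions only: each "class" is taken to be a hypothetical group made of a master report and its known duplicates, reports are whitespace-tokenized term-frequency vectors, and "denormalized cosine" is read as the dot product between a unit-length report vector and an un-normalized centroid. ECCI's inner-class and inter-class weighting is not modeled, because the abstract does not define it.

```python
import math
from collections import Counter
from collections.abc import Callable

Vector = dict[str, float]


def tf_vector(text: str) -> Vector:
    """Raw term frequencies; whitespace tokenization is a simplifying assumption."""
    return dict(Counter(text.lower().split()))


def unit(vec: Vector) -> Vector:
    """Scale a vector to Euclidean length 1 (an empty vector is returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a: Vector, b: Vector) -> float:
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors: list[Vector]) -> Vector:
    """Mean of a group's unit-length report vectors.

    The result is deliberately NOT re-normalized, so its length (at most 1)
    reflects how consistent the reports inside the group are.
    """
    if not vectors:
        return {}
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # adds float weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def normalized_cosine(d: Vector, c: Vector) -> float:
    """Standard cosine: the dot product divided by both vector lengths."""
    nd, nc = math.sqrt(dot(d, d)), math.sqrt(dot(c, c))
    return dot(d, c) / (nd * nc) if nd and nc else 0.0


def denormalized_cosine(d: Vector, c: Vector) -> float:
    """ASSUMED reading of 'denormalized cosine': d is unit length, but the
    centroid keeps its own length, so the score is the plain dot product."""
    return dot(d, c)


def rank_groups(new_report: str, groups: dict[str, list[str]],
                sim: Callable[[Vector, Vector], float]) -> list[str]:
    """Return hypothetical duplicate-group ids, most similar first."""
    d = unit(tf_vector(new_report))
    cents = {gid: centroid([unit(tf_vector(r)) for r in reports])
             for gid, reports in groups.items()}
    return sorted(cents, key=lambda gid: sim(d, cents[gid]), reverse=True)


if __name__ == "__main__":
    groups = {  # hypothetical groups, not data from the study
        "svn-1": ["commit fails with checksum mismatch",
                  "checksum mismatch error on commit"],
        "argo-2": ["diagram editor crashes when saving",
                   "crash saving class diagram"],
    }
    new = "commit aborted checksum mismatch"
    print(rank_groups(new, groups, normalized_cosine))
    print(rank_groups(new, groups, denormalized_cosine))
```

Under this reading, both scores agree on the direction of the match, but the denormalized score is never larger than the normalized one for non-negative term weights, because a centroid of unit vectors has length at most 1. It therefore rewards groups whose reports are mutually consistent (a long centroid) over diverse groups (a short centroid), which is one plausible motivation for dropping the normalization; the paper itself should be consulted for ECCI's actual definition.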

How to Cite
Nhan, P. (2019) “IMPROVING DETECTION PERFORMANCE OF DUPLICATE BUG REPORTS USING EXTENDED CLASS CENTROID INFORMATION”, The Scientific Journal of Tra Vinh University, 1(26), pp. 71-79. doi: 10.35382/18594816.1.26.2017.107.