Real-world data are rarely perfectly balanced: one class often heavily outnumbers the others. Such data sources are commonly called skewed or class-imbalanced. Software defect prediction, which applies data mining techniques to locate defective modules, is one of the most effective ways of finding them. Existing classification methods support effective knowledge discovery on class-balanced datasets, but as a dataset becomes more imbalanced, defect prediction on it becomes less accurate. The proposed algorithms comprise a novel oversampling strategy and an under-sampling technique, both of which delete noisy and weak instances from the majority and minority classes in order to improve performance on class-imbalanced data streams. Experiments are carried out on class-imbalanced software defect datasets. The results show that the class imbalance problem in these datasets can be addressed effectively.
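The general idea of cleaning both classes and then rebalancing can be illustrated with a minimal sketch. It is not the proposed algorithms: the abstract does not define "noisy" or "weak" instances, so the sketch assumes a nearest-neighbour noise rule (an instance is dropped when most of its k nearest neighbours carry a different label) followed by plain random oversampling. The function names `remove_noisy` and `oversample_minority` and the parameter `k` are hypothetical.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def remove_noisy(X, y, k=3):
    """Assumed noise rule: drop an instance (of any class) when fewer than
    half of its k nearest neighbours share its label."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Indices of the k closest other instances.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: squared_distance(xi, X[j]),
        )[:k]
        agree = sum(1 for j in neighbours if y[j] == yi)
        if agree * 2 >= len(neighbours):
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def oversample_minority(X, y, seed=0):
    """Randomly duplicate instances of smaller classes until every class
    is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    out_X, out_y = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            out_X.append(rng.choice(pool))
            out_y.append(label)
    return out_X, out_y


# Toy data: label 0 = non-defective (majority), label 1 = defective (minority).
# The last point is a majority-labelled instance sitting inside the minority cluster.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [10.0], [10.1], [10.2], [10.05]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

X_clean, y_clean = remove_noisy(X, y, k=3)
X_bal, y_bal = oversample_minority(X_clean, y_clean)
```

In this toy run the cleaning step drops the mislabelled-looking point before the minority class is duplicated, so the oversampling does not amplify the class overlap around it.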