Parallel Support Vector Machines for Multi-Label Classification in Imbalanced Databases
Abstract
We propose a multi-label classification mining method for imbalanced sample databases based on parallel support vector machines (SVMs). A hierarchical clustering algorithm partitions the samples of the imbalanced database into majority and minority sub-clusters, which serves as the basis for oversampling. Clustering does not itself generate new samples; rather, it divides the data into sub-clusters so that oversampling can be targeted at the minority-class sub-clusters, avoiding the noise and overfitting that blind oversampling can cause. In other words, the clustering algorithm provides a structured partition of the data on which oversampling operates. To improve minority-class accuracy, the method relies on parallel computing: MapReduce solves the SVM dual problems in parallel to optimize the separating hyperplanes for multi-label classification. The Map function divides the training sample set into small subsets and trains an SVM on each; in the Reduce stage these SVMs are integrated to train a new SVM, which serves as the final decision function and allows multi-label classification problems to be handled efficiently. Experimental results show that the proposed method consistently maintains a G-mean index of 0.95 or higher, far exceeding the comparison methods. In terms of acceleration ratio, as the sample size increased from 1000 to 10000, the acceleration ratio of the proposed method rose steadily from 1.0 to 2.5, whereas the two comparison methods reached only 1.5 and 2.0, respectively, with significant fluctuations.
DOI: https://doi.org/10.31449/inf.v49i8.8350
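The Map and Reduce stages described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation, and it rests on several assumptions. The kernel is linear. Only one binary label is handled, so a multi-label problem would need one such classifier per label. The dual problem is solved by a simple coordinate-ascent routine. "Integrating" the sub-SVMs is interpreted as pooling their support vectors before retraining. The function names (train_svm, map_phase, reduce_phase, g_mean) and the synthetic data are hypothetical, and no real MapReduce framework is involved.

```python
import math
import random
from functools import reduce


def train_svm(samples, C=1.0, epochs=50, seed=0):
    """Linear soft-margin SVM trained by dual coordinate ascent.

    samples: list of (features, label) pairs with label in {-1, +1}.
    Returns (w, support_vectors); the last entry of w acts as the bias.
    """
    rng = random.Random(seed)
    xs = [list(x) + [1.0] for x, _ in samples]  # constant feature = bias term
    ys = [y for _, y in samples]
    w = [0.0] * len(xs[0])
    alpha = [0.0] * len(xs)  # one dual variable per training sample
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            q_ii = sum(v * v for v in xs[i])
            grad = ys[i] * sum(a * b for a, b in zip(w, xs[i])) - 1.0
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)  # keep in [0, C]
            step = (new_alpha - alpha[i]) * ys[i]
            w = [a + step * b for a, b in zip(w, xs[i])]
            alpha[i] = new_alpha
    support = [samples[i] for i in range(len(samples)) if alpha[i] > 1e-9]
    return w, support


def map_phase(samples, n_parts):
    """Map: split the training set and emit each sub-SVM's support vectors."""
    parts = [samples[k::n_parts] for k in range(n_parts)]
    return [train_svm(part)[1] for part in parts]


def reduce_phase(support_sets):
    """Reduce: pool the support vectors and train the final SVM on them."""
    pooled = reduce(lambda a, b: a + b, support_sets, [])
    return train_svm(pooled)[0]


def predict(w, x):
    score = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def g_mean(w, samples):
    """Geometric mean of the recall on the positive and negative classes."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == -1]
    tpr = sum(predict(w, x) == 1 for x in pos) / len(pos)
    tnr = sum(predict(w, x) == -1 for x in neg) / len(neg)
    return math.sqrt(tpr * tnr)


# Synthetic, imbalanced toy data (an assumption made for illustration only).
rng = random.Random(42)
majority = [((rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)), -1) for _ in range(200)]
minority = [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(40)]
data = majority + minority
rng.shuffle(data)

final_w = reduce_phase(map_phase(data, n_parts=4))
score = g_mean(final_w, data)
print("G-mean on the toy data:", score)
```

In a real deployment each call inside map_phase would run on a separate worker, which is where any speedup would come from. The sketch runs them sequentially and only shows the data flow between the two stages.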
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







