Dynamic load balancing in parallel KD-tree k-means

[thumbnail of 2010_DiFatta10-IEEE-ScalCom.pdf]
Preview
Text - Accepted Version
· Please see our End User Agreement before downloading.
| Preview

Please see our End User Agreement.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email

Di Fatta, G. and Pettinger, D. (2010) Dynamic load balancing in parallel KD-tree k-means. In: CIT '10: Proceedings of the 10th IEEE International Conference on Computer and Information Technology. IEEE, Washington DC, pp. 2478-2485. ISBN 978-0-7695-4108-2 doi: 10.1109/CIT.2010.424

Abstract/Summary

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Altmetric Badge

Item Type Book or Report Section
URI https://reading-clone.eprints-hosting.org/id/eprint/6127
Identification Number/DOI 10.1109/CIT.2010.424
Refereed Yes
Divisions Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Uncontrolled Keywords clustering, dynamic load balancing, KD Trees, parallel k-means Clustering;
Publisher IEEE
Publisher Statement (c) 2010 IEEE
Download/View statistics View download statistics for this item

Downloads

Downloads per month over past year

University Staff: Request a correction | Centaur Editors: Update this record

Search Google Scholar