Efficient group communication for large-scale parallel clustering

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

Pettinger, D. and Di Fatta, G. (2013) Efficient group communication for large-scale parallel clustering. In: Intelligent Distributed Computing VI. Studies in Computational Intelligence, 446. Springer Berlin Heidelberg, Berlin Heidelberg, pp. 155-164. ISBN 9783642325243 doi: 10.1007/978-3-642-32524-3_20

Abstract/Summary

The global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution, which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
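
To illustrate the global reduction mentioned in the abstract, the following is a minimal sketch of one iteration of the straightforward parallel formulation, not the formulation proposed in the paper. It simulates the processes as in-memory partitions in plain Python. All names (`nearest`, `local_stats`, `global_reduce`, `kmeans_step`) are hypothetical, and `global_reduce` stands in for whatever collective operation a real system would use.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_stats(partition: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Work done independently by one process: per-cluster sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def global_reduce(stats: Sequence[Stats]) -> Stats:
    """Assumed stand-in for the global reduction: element-wise sum over all processes."""
    k = len(stats[0][1])
    dim = len(stats[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in stats:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts


def kmeans_step(partitions: Sequence[Sequence[Point]],
                centroids: Sequence[Point]) -> List[Point]:
    """One iteration: local assignment on each partition, then one global reduction."""
    sums, counts = global_reduce([local_stats(p, centroids) for p in partitions])
    # An empty cluster keeps its previous centroid (a choice made for this sketch).
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centroids[j])
        for j in range(len(centroids))
    ]


# Two simulated processes, two clusters.
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(kmeans_step(partitions, [(1.0, 1.0), (9.0, 1.0)]))
# prints [(0.0, 1.0), (10.0, 1.0)]
```

Because the reduction only sums per-cluster totals and counts, the updated centroids match those obtained by running the same step on all the data in a single partition. The cost is that every process takes part in the reduction at every iteration, which is the scalability concern the abstract raises.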

Additional Information: Proceedings of the 6th International Symposium on Intelligent Distributed Computing - IDC 2012, Calabria, Italy, September 2012
Item Type: Book or Report Section
URI: https://reading-clone.eprints-hosting.org/id/eprint/36471
Identification Number/DOI: 10.1007/978-3-642-32524-3_20
Refereed: Yes
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Publisher: Springer Berlin Heidelberg