Varghese, B., McKee, G. and Alexandrov, V. (2010) Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation. In: 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops (WAINA). IEEE, pp. 985-990. ISBN 9781424467013 doi: 10.1109/WAINA.2010.21
Abstract/Summary
Recent research in multi-agent systems incorporates fault tolerance concepts but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and by an implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
| Additional Information | Conference was held in Perth, Australia, 20-23 Apr 2010. |
| Item Type | Book or Report Section |
| URI | https://reading-clone.eprints-hosting.org/id/eprint/17489 |
| Refereed | Yes |
| Divisions | Science |
| Uncontrolled Keywords | cluster-based implementation; fault tolerance; intelligent agents; swarm-array computing |
| Publisher | IEEE |