RESEARCHERS
FERNANDEZ SLEZAK Diego
conferences and scientific meetings
Title:
Clover: Efficient Monitoring of HPC Clusters
Author(s):
MONTALDO, DAMIÁN; FERNÁNDEZ SLEZAK, DIEGO; MOCSKOS, ESTEBAN
Location:
Mar del Plata
Meeting:
Conference; JAIIO; 2009
Abstract:
As a consequence of the last decade's evolution of computational power and the proliferation of clusters, i.e. collections of interconnected personal computers, system monitoring has become a critical and non-trivial task. Specialized protocols have been developed to provide cluster state information without degrading performance or consuming computational power. ClOver is a system monitoring tool designed on a plugin architecture based on the CluMon project, a tool developed at the National Center for Supercomputing Applications (NCSA). The main goal is to allow the verification of the cluster state at a glance. It includes a plugin implemented over the Intelligent Platform Management Interface (IPMI) protocol to collect data with almost no CPU-cycle consumption. The aim of this paper is to show the current state of development of ClOver. Furthermore, the benefits of using IPMI in monitoring activities are verified by running a simulation of the heat equation as a test application on both shared-memory and distributed-memory architectures. The results obtained suggest that the use of specialized hardware protocols for sensing and monitoring would save valuable CPU cycles.