[[_TOC_]]

# Overview

EAR is made up of a set of components that together form a complete system software stack. It accounts for the power and energy consumption of jobs and applications in a cluster, provides a runtime library for application performance monitoring and optimization that can be loaded dynamically during application execution, and includes a global power-capping system and a flexible reporting system that can fit any storage requirement for the collected data. All of this is designed to be as transparent as possible from the user's point of view.
This section introduces these components and how they are stacked to provide the different EAR services and features.

## System power consumption and job accounting

This is the most basic feature.
EAR collects node power consumption and reports it periodically through the [EAR Node Manager](#ear-node-manager), a Linux service that runs on each compute node.
It is up to the sysadmin to decide how and where these periodic metrics are [reported](Configuration#eard-configuration).
The following figure shows this scheme.

![EAR_basic_accounting.svg](images/EAR_basic_accounting.svg)

The EAR Node Manager provides an API that a batch scheduler plug-in or hook can use to signal the start and end of jobs and steps, so that the power consumption of those entities can be accounted for.
Currently, the EAR distribution comes with a [SLURM SPANK plug-in](#ear-slurm-plugin) that supports the accounting of jobs and steps in SLURM systems.

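The accounting flow described above can be illustrated with a minimal Python sketch. It is not the EAR API: the class `NodeAccounting` and its methods `job_start`, `power_sample` and `job_end` are hypothetical names chosen for illustration, and the sketch assumes that the full node power is charged to every active job at each sampling period.

```python
from dataclasses import dataclass, field


@dataclass
class NodeAccounting:
    """Toy stand-in for a node manager that integrates power samples per job."""
    period_s: float                                # seconds between power samples
    active: dict = field(default_factory=dict)     # job_id -> joules so far
    finished: dict = field(default_factory=dict)   # job_id -> total joules

    def job_start(self, job_id: str) -> None:
        # Called by the (hypothetical) scheduler hook when a job or step begins.
        self.active[job_id] = 0.0

    def power_sample(self, watts: float) -> None:
        # Periodic monitoring tick: energy = power * elapsed time.
        for job_id in self.active:
            self.active[job_id] += watts * self.period_s

    def job_end(self, job_id: str) -> float:
        # Called by the scheduler hook when the job or step ends.
        joules = self.active.pop(job_id)
        self.finished[job_id] = joules
        return joules


node = NodeAccounting(period_s=10.0)
node.power_sample(200.0)          # idle sample, no job to charge
node.job_start("job-1")
for watts in (300.0, 320.0, 310.0):
    node.power_sample(watts)
print(node.job_end("job-1"))      # 9300.0
```

The point of the sketch is the division of labour: the scheduler hook only marks job boundaries, while the per-node service owns the periodic power readings and turns them into per-job energy.
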
## Application performance monitoring and energy efficiency optimization

Along with the applications running on compute nodes, a runtime library can be loaded dynamically (again thanks to the batch scheduler support).
The [EAR Job Manager](#the-ear-library-job-manager) runs within the application/workflow processes, so it can collect performance metrics. These metrics are reported in the same way as those of the Node Manager, and the reporting is likewise configurable.
Moreover, the Job Manager comes with optimization policies that select the optimal CPU/IMC/GPU frequencies based on those performance metrics by contacting the Node Manager.
The figure below shows the interaction between these two components.

![EAR_job_mgr.svg](images/EAR_job_mgr.svg)

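To make the idea of a metric-driven frequency policy concrete, here is a minimal Python sketch. It does not reproduce any actual EAR policy: the `Projection` record, the `select_frequency` function, the `max_slowdown` threshold and the sample numbers are all assumptions made for illustration. The rule shown picks the candidate frequency with the lowest projected energy among those whose projected time stays within a tolerated slowdown.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    freq_ghz: float   # candidate CPU frequency
    time_s: float     # projected iteration time at that frequency
    power_w: float    # projected node power at that frequency


def select_frequency(projections, max_slowdown=0.10):
    """Return the candidate with the lowest projected energy whose projected
    time stays within max_slowdown of the fastest candidate."""
    best_time = min(p.time_s for p in projections)
    allowed = [p for p in projections
               if p.time_s <= best_time * (1.0 + max_slowdown)]
    return min(allowed, key=lambda p: p.time_s * p.power_w)


candidates = [
    Projection(2.4, 10.0, 300.0),   # 3000 J
    Projection(2.0, 10.5, 260.0),   # 2730 J, 5% slower
    Projection(1.6, 12.5, 230.0),   # 2875 J, 25% slower -> rejected
]
chosen = select_frequency(candidates)
print(chosen.freq_ghz)   # 2.0
```

In this sketch the selection logic only needs performance and power projections, which is why it can live inside the application process; the chosen frequency would then be passed on to the node-level service.
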
# EAR Node Manager

The EAR Daemon (EARD) is a per-node Linux service that provides privileged metrics for each node as well as a periodic power monitoring service.

...

The EAR Database Daemon (EARDBD) acts as an intermediate layer between any EAR component that inserts data and the EAR database, preventing the database server from being overwhelmed by connections and insert queries.

The Database Manager caches records generated by the [EAR Library](#the-ear-library-job-manager) and the [EARD](#ear-node-manager) and reports them to the centralized database.
On large clusters, it is recommended to run several EARDBDs in order to reduce the number of inserts and connections to the database.

The EARDBD also accumulates data over a period of time to decrease the total number of insertions in the database, which helps the performance of big queries.

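The buffering role described above can be sketched in a few lines of Python. This is not EARDBD code: `InsertBuffer`, `flush_size` and `insert_batch` are hypothetical names, and for simplicity the sketch flushes by record count rather than by elapsed time.

```python
class InsertBuffer:
    """Toy aggregation layer: buffers records and flushes them in batches."""

    def __init__(self, flush_size, insert_batch):
        self.flush_size = flush_size        # records to hold before flushing
        self.insert_batch = insert_batch    # callable performing one bulk insert
        self.pending = []

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.insert_batch(list(self.pending))
            self.pending.clear()


batches = []                        # stands in for the database
buf = InsertBuffer(flush_size=4, insert_batch=batches.append)
for node_id in range(10):
    buf.add({"node": node_id, "power_w": 300 + node_id})
buf.flush()                         # push the remainder, e.g. on a timer
print(len(batches), [len(b) for b in batches])   # 3 [4, 4, 2]
```

Ten individual records reach the stand-in database as three bulk inserts, which is the effect the intermediate layer is meant to have on connections and insert queries.
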