|
|
[[_TOC_]]
|
|
|
|
|
|
## EAR Node Manager
|
|
|
# EAR Node Manager
|
|
|
|
|
|
The EAR Daemon (EARD) is a per-node process that provides privileged metrics of each node as well as a periodic power monitoring service.
|
|
|
The EAR Daemon (EARD) is a per-node Linux service that provides privileged metrics of each node as well as a periodic power monitoring service.
|
|
|
Said periodic power metrics can be sent to EAR's database directly, via the EAR Database Daemon (EARDBD) or by using some of the provided [report plug-ins](Report).
|
|
|
|
|
|
See the [EARDBD](#ear-database-manager) section and the [configuration page](Configuration) for more information about the EAR Database Manager and how to configure the EARD to send its collected data to it.
|
|
|
|
|
|
### Overview
|
|
|
## Overview
|
|
|
|
|
|
The node Daemon is the component in charge of providing any kind of service that requires privileged capabilities. The current version is conceived as an external process executed with root privileges.
|
|
|
|
... | ... | @@ -17,18 +17,18 @@ The EARD provides the following services, each one covered by one thread: |
|
|
- Implements a periodic power monitoring service. This service allows the EAR package to control the total energy consumed in the system.
|
|
|
- Offers a remote API used by EARplug, EARGM and EAR commands. This API accepts requests such as getting the system status, changing policy settings or notifying new job/end job events.
|
|
|
|
|
|
### Requirements
|
|
|
## Requirements
|
|
|
|
|
|
If the EAR Database is used as the storage target, EARD connects to the [EARDBD](#ear-database-manager) service, which has to be up before the node daemon starts. Otherwise, the values that EARD reports for storage in the database will be lost.
|
|
|
|
|
|
### Configuration
|
|
|
## Configuration
|
|
|
|
|
|
The EAR Daemon uses the `$(EAR_ETC)/ear/ear.conf` file to be configured.
|
|
|
It can be dynamically configured by reloading the service.
|
|
|
|
|
|
Please visit the [EAR configuration file page](Configuration#EARD-configuration) for more information about the options of EARD and other components.
|
|
|
|
|
|
### Execution
|
|
|
## Execution
|
|
|
|
|
|
To execute this component, these `systemctl` command examples are provided:
|
|
|
|
... | ... | @@ -40,13 +40,13 @@ Log messages are generated during the execution. Use `journalctl` command to see |
|
|
|
|
|
- `sudo journalctl -u eard -f`
|
|
|
|
|
|
### Reconfiguration
|
|
|
## Reconfiguration
|
|
|
|
|
|
After executing a `systemctl reload eard` command, not all the EARD options will be dynamically updated. The list of updated variables is:
|
|
|
|
|
|
```
|
|
|
DefaultPstates
|
|
|
NodeDaemonMaxPstate
|
|
|
NodeDaemonMinPstate
|
|
|
NodeDaemonVerbose
|
|
|
NodeDaemonPowermonFreq
|
|
|
SupportedPolicies
|
... | ... | @@ -57,7 +57,7 @@ To reconfigure other options such as EARD connection port, coefficients, etc., i |
|
|
Visit the [EAR configuration file page](Configuration#EARD-configuration) for more information about the options of EARD and other components.
|
|
|
|
|
|
|
|
|
## EAR Database Manager
|
|
|
# EAR Database Manager
|
|
|
|
|
|
The EAR Database Daemon (EARDBD) acts as an intermediate layer between any EAR component that inserts data and the EAR's Database, in order to prevent the database server from collapsing due to getting overrun with connections and insert queries.
|
|
|
|
... | ... | @@ -68,20 +68,20 @@ Also, the EARDBD accumulates data during a period of time to decrease the total |
|
|
For now, only the energy metrics can be accumulated, in a new metric called energy aggregation.
|
|
|
EARDBD uses periodic power metrics sent by the EARD, the per-node daemon, including job identification details (Job Id and Step Id when executed in a SLURM system).
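
As an illustration of the accumulation idea described above, the following is a minimal sketch and not EARDBD's actual code. The type `energy_aggregation_t` and both functions are hypothetical names introduced here; the sketch assumes each periodic metric carries an energy value in millijoules and that one accumulated value is written per window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator: sums energy samples over one aggregation window. */
typedef struct {
    unsigned long long energy_mj; /* accumulated energy, in millijoules */
    size_t samples;               /* number of periodic metrics merged */
} energy_aggregation_t;

/* Merge the energy of one periodic power metric into the current window. */
static void aggregation_add(energy_aggregation_t *agg, unsigned long energy_mj)
{
    assert(agg != NULL);
    agg->energy_mj += energy_mj;
    agg->samples++;
}

/* Return the accumulated energy and reset the window for the next period. */
static unsigned long long aggregation_flush(energy_aggregation_t *agg)
{
    assert(agg != NULL);
    unsigned long long total = agg->energy_mj;
    agg->energy_mj = 0;
    agg->samples = 0;
    return total;
}
```

Flushing once per window means a single insert replaces one insert per received metric, which is the reduction in queries the text refers to.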
|
|
|
|
|
|
### Configuration
|
|
|
## Configuration
|
|
|
|
|
|
The EAR Database Daemon uses the `$(EAR_ETC)/ear/ear.conf` file to be configured. It can be dynamically configured by reloading the service.
|
|
|
|
|
|
Please visit the [EAR configuration file page](Configuration#eardbd-configuration) for more information about the options of EARDBD and other components.
|
|
|
|
|
|
### Execution
|
|
|
## Execution
|
|
|
|
|
|
To execute this component, these `systemctl` command examples are provided:
|
|
|
- `sudo systemctl start eardbd` to start the EARDBD service.
|
|
|
- `sudo systemctl stop eardbd` to stop the EARDBD service.
|
|
|
- `sudo systemctl reload eardbd` to force reloading the configuration of the EARDBD service.
|
|
|
|
|
|
## EAR Global Manager
|
|
|
# EAR Global Manager (System power manager)
|
|
|
|
|
|
The EAR Global Manager Daemon (EARGMD) is a cluster-wide component offering cluster energy monitoring and capping.
|
|
|
EARGM can work in two modes: manual and automatic.
|
... | ... | @@ -94,7 +94,7 @@ Aggregated metrics are computed by [EARDBD](#ear-database-manager) based on powe |
|
|
|
|
|
> __Note__: if you have multiple EARGMs running, only one should be used for energy management. To turn off energy management for a certain EARGM, simply set its energy value to 0.
|
|
|
|
|
|
### Power capping
|
|
|
## Power capping
|
|
|
|
|
|
EARGM also includes an optional power capping system. Power capping can work in two different ways:
|
|
|
|
... | ... | @@ -105,7 +105,7 @@ Furthermore, when using fine grained power cap control it is possible to have mu |
|
|
|
|
|
Meta-EARGMs are NOT compatible with the unlimited cluster powercap mode.
|
|
|
|
|
|
### Configuration
|
|
|
## Configuration
|
|
|
|
|
|
The EAR Global Manager uses the `$(EAR_ETC)/ear/ear.conf` file to be configured. It can be dynamically configured by reloading the service.
|
|
|
|
... | ... | @@ -113,7 +113,7 @@ Please visit the [EAR configuration file page](Configuration#EARGM-configuration |
|
|
|
|
|
Additionally, two EARGMs can be used on the same host by declaring the environment variable `EARGMID` to specify which EARGM configuration each one should use. If said variable is not declared, all EARGMs on the same host will read the first entry.
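
The selection rule can be sketched as follows. This is a hypothetical helper, not EARGM's implementation: `select_eargm_entry` and its parameters are names introduced for illustration, and the sketch assumes that `EARGMID` holds a plain integer ID and that an unknown ID is treated as an error.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Pick which EARGM entry to use (hypothetical helper).
 * env_value: value of EARGMID, or NULL when the variable is not declared.
 * ids:       IDs of the configured EARGM entries, in file order.
 * Returns an index into ids, or -1 when the requested ID is not found
 * (the -1 behaviour is an assumption of this sketch).
 */
static int select_eargm_entry(const char *env_value, const int *ids, int n_ids)
{
    assert(ids != NULL && n_ids > 0);
    if (env_value == NULL)
        return 0; /* variable not declared: read the first entry */

    char *end = NULL;
    long wanted = strtol(env_value, &end, 10);
    if (end == env_value || *end != '\0')
        return -1; /* value is not a plain integer */

    for (int i = 0; i < n_ids; i++)
        if (ids[i] == wanted)
            return i;
    return -1;
}
```

A caller would typically pass `getenv("EARGMID")` as the first argument.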
|
|
|
|
|
|
### Execution
|
|
|
## Execution
|
|
|
|
|
|
To execute this component, these `systemctl` command examples are provided:
|
|
|
- `sudo systemctl start eargmd` to start the EARGM service.
|
... | ... | @@ -121,14 +121,14 @@ To execute this component, these `systemctl` command examples are provided: |
|
|
- `sudo systemctl reload eargmd` to force reloading the configuration of the EARGM service.
|
|
|
|
|
|
|
|
|
## The EAR Library
|
|
|
# The EAR Library (Job Manager)
|
|
|
|
|
|
The EAR Library (EARL) is the core of the EAR package.
|
|
|
The Library offers a lightweight and simple solution to select the optimal frequency for applications at runtime, with multiple power policies each with a different approach to find said frequency.
|
|
|
|
|
|
EARL uses the [Daemon](#ear-node-manager) to read performance metrics and to send application data to EAR Database.
|
|
|
|
|
|
### Overview
|
|
|
## Overview
|
|
|
|
|
|
EARL is dynamically loaded next to the running applications by the [EAR Loader](EAR-Loader).
|
|
|
The Loader detects whether the application is MPI or not.
|
... | ... | @@ -154,19 +154,19 @@ Some specific configurations are modified when jobs are executed sharing nodes w |
|
|
For example, the memory frequency optimization is disabled.
|
|
|
See the [environment variables page](EAR-environment-variables) for more information on how to tune the EAR library optimization using environment variables.
|
|
|
|
|
|
### Configuration
|
|
|
## Configuration
|
|
|
|
|
|
The Library uses the `$(EAR_ETC)/ear.conf` file to be configured.
|
|
|
Please visit the [EAR configuration file page](Configuration#EARL-configuration) for more information about the options of EARL and other components.
|
|
|
|
|
|
EARL receives its specific settings through shared memory regions initialized by [EARD](#ear-node-manager).
|
|
|
|
|
|
### Usage
|
|
|
## Usage
|
|
|
|
|
|
For information on how to run applications alongside EARL, read the [User guide](User-guide).
|
|
|
The next section contains more information regarding EAR's optimisation policies.
|
|
|
|
|
|
### Policies
|
|
|
## Policies
|
|
|
|
|
|
EAR offers three energy policy plugins: `min_energy`, `min_time` and `monitoring`.
|
|
|
The last one is not a power policy; it is used only for application monitoring, and it modifies neither the CPU frequency nor the memory or GPU frequency.
|
... | ... | @@ -176,7 +176,7 @@ The energy policy is selected by setting the `--ear-policy=policy` option when s |
|
|
A policy parameter, which is a particular value or threshold depending on the policy, can be set using the flag `--ear-policy-th=value`.
|
|
|
Its default value is defined in the configuration file; check the [configuration page](Configuration) for more information.
|
|
|
|
|
|
#### Plugin `min_energy`
|
|
|
### `min_energy`
|
|
|
|
|
|
The goal of this policy is to minimise the energy consumed with a limit to the performance degradation. This limit is set in the SLURM `--ear-policy-th` option or the configuration file. The `min_energy` policy will select the optimal frequency that minimizes energy while enforcing (performance degradation <= parameter). When executing with this policy, applications start at the default frequency (specified in `ear.conf`).
|
|
|
|
... | ... | @@ -184,7 +184,7 @@ The goal of this policy is to minimise the energy consumed with a limit to the p |
|
|
PerfDegr = (CurrTime - PrevTime) / (PrevTime)
|
|
|
```
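
The `PerfDegr` formula and the `performance degradation <= parameter` condition can be written as a short C sketch. The function names are hypothetical and not part of EAR; times can be in any unit as long as both use the same one.

```c
#include <assert.h>
#include <stdbool.h>

/* PerfDegr = (CurrTime - PrevTime) / PrevTime, as in the formula above. */
static double perf_degradation(double curr_time, double prev_time)
{
    assert(prev_time > 0.0);
    return (curr_time - prev_time) / prev_time;
}

/* A frequency is acceptable while PerfDegr stays at or below the threshold. */
static bool within_degradation_limit(double curr_time, double prev_time,
                                     double threshold)
{
    return perf_degradation(curr_time, prev_time) <= threshold;
}
```

For instance, going from 8.0 s to 12.0 s gives a degradation of 0.5, which a threshold of 0.1 would reject.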
|
|
|
|
|
|
#### Plugin `min_time`
|
|
|
### `min_time`
|
|
|
|
|
|
The goal of this policy is to improve the execution time while guaranteeing a minimum ratio between performance benefit and frequency increment that justifies the increased energy consumption from this frequency increment. The policy uses the SLURM parameter option mentioned above as a minimum efficiency threshold.
|
|
|
|
... | ... | @@ -204,9 +204,38 @@ Check the [configuration page](Configuration) for more information. |
|
|
|
|
|
Figure 1: `min_time` uses the threshold value as the minimum value for the performance gain between `F[i]` and `F[i+1]`.
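
One possible reading of the "minimum ratio between performance benefit and frequency increment" is sketched below. The exact expressions used by EAR are not reproduced here, so the relative-gain definitions and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative time reduction when moving from F[i] to F[i+1] (assumed form). */
static double perf_gain(double prev_time, double curr_time)
{
    assert(prev_time > 0.0);
    return (prev_time - curr_time) / prev_time;
}

/* Relative frequency increment from F[i] to F[i+1] (assumed form). */
static double freq_gain(double prev_freq, double curr_freq)
{
    assert(prev_freq > 0.0);
    return (curr_freq - prev_freq) / prev_freq;
}

/* The step is justified when the gain reaches threshold * frequency increment. */
static bool step_is_justified(double prev_time, double curr_time,
                              double prev_freq, double curr_freq,
                              double threshold)
{
    return perf_gain(prev_time, curr_time)
           >= freq_gain(prev_freq, curr_freq) * threshold;
}
```

Under these assumptions, raising the frequency from 2.0 to 2.5 GHz with a threshold of 0.5 requires at least a 12.5% reduction in execution time.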
|
|
|
|
|
|
### EAR API
|
|
|
# EAR Loader
|
|
|
|
|
|
EAR offers a user API for applications. The current EAR version only offers two functions, one to read the accumulated energy and time and another to compute the difference between the two measurements.
|
|
|
The EAR Loader is responsible for loading the EAR Library.
|
|
|
It is a small and lightweight library loaded by the [EAR SLURM Plugin](#ear-slurm-plugin) (through the `LD_PRELOAD` environment variable) that identifies the user application and loads its corresponding EAR Library distribution.
|
|
|
|
|
|
The Loader detects the underlying application, identifying the MPI version (if used) and other minor details.
|
|
|
With this information, the loader opens the suitable EAR Library version.
|
|
|
|
|
|
As can be read in the [EARL](#the-ear-library) page, depending on the MPI vendor the MPI types can be different, preventing any compatibility between distributions.
|
|
|
For example, if the MPI distribution is OpenMPI, the EAR Loader will load the EAR Library compiled with the OpenMPI includes.
|
|
|
|
|
|
You can read the [installation guide](Admin-guide#quick-installation-guide) for more information about compiling and installing different EARL versions.
|
|
|
|
|
|
# EAR SLURM plugin
|
|
|
|
|
|
The EAR SLURM plugin dynamically loads and configures the EAR library for SLURM jobs (and steps) if the flag `--ear=on` is set or if it is enabled by default.
|
|
|
Additionally, it reports any jobs that start or end to the nodes' EARDs for accounting and monitoring purposes.
|
|
|
|
|
|
## Configuration
|
|
|
|
|
|
Visit the [SLURM SPANK plugin section](Configuration#slurm-spank-plugin-configuration-file) on the configuration page to properly set up the SLURM `/etc/slurm/plugstack.conf` file.
|
|
|
|
|
|
You can find the complete list of parameters accepted by the EAR SLURM plugin in the
|
|
|
[user guide](User-guide#ear-job-submission-flags).
|
|
|
|
|
|
|
|
|
# EAR application API
|
|
|
|
|
|
EAR offers a user API for applications. The current EAR version only offers two sets of functions:
|
|
|
|
|
|
- To measure the energy consumption
|
|
|
- To set the CPU and GPU frequencies.
|
|
|
|
|
|
- `int ear_connect()`
|
|
|
- `int ear_energy(unsigned long *energy_mj, unsigned long *time_ms)`
|
... | ... | @@ -273,28 +302,3 @@ int main(int argc,char *argv[]) |
|
|
ear_disconnect();
|
|
|
}
|
|
|
```
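
Since `ear_energy` reports energy in millijoules and time in milliseconds, the average power between two readings follows directly, because mJ / ms equals W. The helper below is a hypothetical addition and not part of the EAR API; it assumes the counters do not wrap between the two readings.

```c
#include <assert.h>

/* Average power in watts between two (energy_mj, time_ms) readings. */
static double average_power_w(unsigned long e0_mj, unsigned long t0_ms,
                              unsigned long e1_mj, unsigned long t1_ms)
{
    /* Assumes monotonically increasing counters (no wrap-around). */
    assert(t1_ms > t0_ms && e1_mj >= e0_mj);
    return (double)(e1_mj - e0_mj) / (double)(t1_ms - t0_ms);
}
```

For example, 300000 mJ consumed over 1000 ms corresponds to an average of 300 W.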
|
|
|
|
|
## EAR Loader
|
|
|
|
|
|
The EAR Loader is responsible for loading the EAR Library.
|
|
|
It is a small and lightweight library loaded by the [EAR SLURM Plugin](#ear-slurm-plugin) (through the `LD_PRELOAD` environment variable) that identifies the user application and loads its corresponding EAR Library distribution.
|
|
|
|
|
|
The Loader detects the underlying application, identifying the MPI version (if used) and other minor details.
|
|
|
With this information, the loader opens the suitable EAR Library version.
|
|
|
|
|
|
As can be read in the [EARL](#the-ear-library) page, depending on the MPI vendor the MPI types can be different, preventing any compatibility between distributions.
|
|
|
For example, if the MPI distribution is OpenMPI, the EAR Loader will load the EAR Library compiled with the OpenMPI includes.
|
|
|
|
|
|
You can read the [installation guide](Admin-guide#quick-installation-guide) for more information about compiling and installing different EARL versions.
|
|
|
|
|
|
## EAR SLURM plugin
|
|
|
|
|
|
The EAR SLURM plugin dynamically loads and configures the EAR library for SLURM jobs (and steps) if the flag `--ear=on` is set or if it is enabled by default.
|
|
|
Additionally, it reports any jobs that start or end to the nodes' EARDs for accounting and monitoring purposes.
|
|
|
|
|
|
### Configuration
|
|
|
|
|
|
Visit the [SLURM SPANK plugin section](Configuration#slurm-spank-plugin-configuration-file) on the configuration page to properly set up the SLURM `/etc/slurm/plugstack.conf` file.
|
|
|
|
|
|
You can find the complete list of parameters accepted by the EAR SLURM plugin in the
|
|
|
[user guide](User-guide#ear-job-submission-flags).