... | ... | @@ -4,6 +4,7 @@ EAR offers the following commands: |
|
|
- Commands to control and temporarily modify cluster settings: [econtrol](#energy-control-econtrol).
|
|
|
- Commands to create/update/clean the DB: [edb_create](#edb_create), [edb_clean_pm](#edb_clean_pm) and [edb_clean_apps](#edb_clean_apps).
|
|
|
- A command to run OpenMPI applications with EAR on SLURM systems through `mpirun` command: [erun](#erun).
|
|
|
- A command to show current EAR installation information: [ear-info](#ear-info).
|
|
|
|
|
|
Commands belonging to the first three categories read the EAR configuration file
(`ear.conf`) to determine whether the user is authorized, as some of them have some
|
... | ... | @@ -15,7 +16,7 @@ Some options are disabled when the user is not authorized. |
|
|
|
|
|
[[_TOC_]]
|
|
|
|
|
|
# EAR job Accounting (eacct)
|
|
|
## EAR job Accounting (eacct)
|
|
|
|
|
|
The `eacct` command shows accounting information stored in the EAR DB for jobs
|
|
|
(and step) IDs.
|
... | ... | @@ -165,7 +166,7 @@ at the CSV file. |
|
|
However, the iteration time (in seconds) is present on each loop as *ITER_TIME_SEC*,
|
|
|
as well as a timestamp (i.e., *TIMESTAMP*) with the elapsed time in seconds since the EPOCH.
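As a rough illustration of how these two columns might be consumed, the sketch below reads loop records from a CSV and converts *TIMESTAMP* into a UTC date. Only `ITER_TIME_SEC` and `TIMESTAMP` come from the text above; the `;` delimiter, the `JOBID` column and the sample rows are assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timezone


def load_loop_times(stream, delimiter=";"):
    """Return (UTC datetime, iteration time in seconds) for each loop record."""
    rows = []
    for record in csv.DictReader(stream, delimiter=delimiter):
        # TIMESTAMP holds seconds elapsed since the EPOCH.
        when = datetime.fromtimestamp(int(record["TIMESTAMP"]), tz=timezone.utc)
        rows.append((when, float(record["ITER_TIME_SEC"])))
    return rows


# Hypothetical sample data; a real file would have many more columns.
sample = (
    "JOBID;TIMESTAMP;ITER_TIME_SEC\n"
    "6878;1558518094;0.52\n"
    "6878;1558518104;0.48\n"
)
loops = load_loop_times(io.StringIO(sample))
for when, iter_time in loops:
    print(when.isoformat(), iter_time)
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper and adjust `delimiter` if needed.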
|
|
|
|
|
|
# EAR system energy Report (ereport)
|
|
|
## EAR system energy Report (ereport)
|
|
|
|
|
|
The `ereport` command creates reports from the node energy accounting data stored in the EAR DB. It is intended for energy consumption analysis over a set period of time, with some additional (optional) criteria such as node name or username.
|
|
|
|
... | ... | @@ -189,7 +190,7 @@ Options are as follows: |
|
|
-h shows this message.
|
|
|
```
|
|
|
|
|
|
## Examples
|
|
|
### Examples
|
|
|
|
|
|
The following example uses the 'all' nodes option to display information for each node, as well as a start_time so it will give the accumulated energy from that moment until the current time.
|
|
|
|
... | ... | @@ -231,7 +232,7 @@ Energy% Warning lvl Timestamp INC th p_state ENERGY T1 |
|
|
111.554 0 2019-05-22 09:41:34 0 0 837 1012019 907200 600 604800 EnergyBudget
|
|
|
```
|
|
|
|
|
|
## EAR Control (econtrol)
|
|
|
### EAR Control (econtrol)
|
|
|
|
|
|
The `econtrol` command temporarily modifies cluster settings related to power policies.
|
|
|
These options are sent to all the nodes in the cluster.
|
... | ... | @@ -291,9 +292,9 @@ Node id Job-Step M-Rank DC power CPI GBS Gflops Time Avg |
|
|
node3 6878-0 1 245.44 0.37 24.29 136.40 56.00 2.59
|
|
|
```
|
|
|
|
|
|
# Database commands
|
|
|
## Database commands
|
|
|
|
|
|
## edb_create
|
|
|
### edb_create
|
|
|
|
|
|
Creates the EAR DB used for accounting and for the global energy control. Requires root access to the MySQL server. It reads the `ear.conf` to get connection details (server IP and port), DB name (which may or may not have been previously created) and EAR's default users (which will be created or altered to have the necessary privileges on EAR's database).
|
|
|
|
... | ... | @@ -305,7 +306,7 @@ Usage:edb_create [options] |
|
|
-h Shows this message.
|
|
|
```
|
|
|
|
|
|
## edb_clean_pm
|
|
|
### edb_clean_pm
|
|
|
|
|
|
Cleans periodic metrics from the database. Used to reduce the size of EAR's database, it will remove every Periodic_metrics entry older than `num_days`:
|
|
|
|
... | ... | @@ -319,7 +320,7 @@ Usage:./src/commands/edb_clean_pm [options] |
|
|
-v Show current EAR version.
|
|
|
```
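The `num_days` threshold can be pictured as a cutoff date. The sketch below is not the tool's implementation: how `edb_clean_pm` actually compares timestamps (time zone, strict or inclusive comparison) is not described here, and `cutoff_for` and `is_stale` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def cutoff_for(num_days, now=None):
    """Return the point in time that lies num_days before now (UTC)."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=num_days)


def is_stale(entry_time, num_days, now=None):
    """True if an entry is older than num_days (assumed strict comparison)."""
    return entry_time < cutoff_for(num_days, now)


# Fixed reference time so the example is reproducible.
reference = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(cutoff_for(30, reference).isoformat())
print(is_stale(datetime(2023, 12, 31, tzinfo=timezone.utc), 30, reference))
print(is_stale(datetime(2024, 1, 15, tzinfo=timezone.utc), 30, reference))
```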
|
|
|
|
|
|
## edb_clean_apps
|
|
|
### edb_clean_apps
|
|
|
|
|
|
Removes applications from the database. It is intended to remove old applications to speed up queries and free up space. It can also be used to remove specific applications from the database. It removes ALL the information related to those jobs. The following tables are modified for each job: Loops, GPU_signatures and Signatures (if they exist), Power signatures, Applications, and Jobs.
|
|
|
|
... | ... | @@ -338,7 +339,7 @@ Usage:edb_clean_apps [-j/-d] [options] |
|
|
-h Displays this message
|
|
|
```
|
|
|
|
|
|
# erun
|
|
|
## erun
|
|
|
|
|
|
`erun` is a program that simulates the whole SLURM and EAR SLURM Plug-in pipeline.
|
|
|
It was designed to provide compatibility between MPI implementations not fully compatible with
|
... | ... | @@ -386,4 +387,78 @@ Also you have to load the EAR environment module or define its environment varia |
|
|
| EAR_INSTALL_PATH=\<path\> | prefix=\<path\> |
|
|
|
| EAR_TMP=\<path\> | localstatedir=\<path\> |
|
|
|
| EAR_ETC=\<path\> | sysconfdir=\<path\> |
|
|
|
| EAR_DEFAULT=\<on/off\> | default=<on/off\> | |
|
|
\ No newline at end of file |
|
|
| EAR_DEFAULT=\<on/off\> | default=\<on/off\> |
|
|
|
|
|
|
## ear-info
|
|
|
|
|
|
`ear-info` is a tool created to quickly view useful information about the current EAR installation of the system.
|
|
|
It shows relevant details for both users and administrators, such as configuration defaults, installation paths, etc.
|
|
|
|
|
|
```
[user@hostname ~]$ ear-info -h
Usage: ear-info [options]
--node-conf[=nodename]
--help
```
|
|
|
|
|
|
When run without arguments, the tool prints a summary of the EAR parameters set at compile time, as well as some installation-dependent configuration:
|
|
|
- The current EAR version.
- The maximum number of CPUs/processors supported.
- The maximum number of sockets supported.
- Whether the current installation provides support for GPUs.
- The default optimization policy.
- Whether the EAR Library is enabled by default on job submission.
- Information about EAR's Uncore Frequency Scaling policy (eUFS) configuration.
- EAR's dynamic load balancing policy.
- EAR's application phase classification.
- EAR's MPI stats collection feature.
- EAR data reporting mechanism configuration.
|
|
|
|
|
|
Below is an example of the output:
|
|
|
|
|
|
```
EAR version 4.3
Max CPUS supported set to 256
Max sockets supported set to 4
EAR installed with GPU support MAX_GPUS 8
Default cluster policy is monitoring
EAR optimization by default set to 0

Environment configuration section..............
eUFS 1
eUFS limit 0.02
Load balanced enabled 1
Load Balance th 0.80
Use turbo for critical path 1
Use turbo 0
Exclusive mode 0
Use EARL phases 1
Use energy models 1
Max IMC frequency (0 = not defined) 0
Min IMC frequency (0 = not defined) 0
GPU frequency/pstate (0 = max GPU freq) 0
MPI optimization 0
MPI statistics 0
App. Tracer no trace
App. Extra report plugins no extra plugins
App. reporting loops to EARD 1
............................................

HACK section............................
Install path /hpc/base/ctt/packages/EAR/ear
Energy optimization policy
GPU power policy /hpc/base/ctt/packages/EAR/ear/lib/plugins/policies/gpu_monitoring.so
CPU power model /hpc/base/ctt/packages/EAR/ear/lib/plugins/policies/gpu_monitoring.so
CPU shared power model /hpc/base/ctt/packages/EAR/ear/lib/plugins/models/cpu_power_model_default.so
............................................
```
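If this output needs to be inspected from a script, the numeric settings can be extracted with a small parser. The sketch below is only an illustration: it assumes the "label followed by a number" layout seen in the example above, which is not a documented or stable format, and `parse_numeric_settings` is a hypothetical helper name. Lines whose value is not a number (paths, `no trace`, section banners) are skipped.

```python
import re

# A label, some whitespace, then a trailing integer or decimal number.
NUMERIC_LINE = re.compile(r"^(.*\S)\s+(-?\d+(?:\.\d+)?)$")


def parse_numeric_settings(text):
    """Map each label to its trailing numeric value; ignore other lines."""
    settings = {}
    for line in text.splitlines():
        match = NUMERIC_LINE.match(line.strip())
        if match:
            settings[match.group(1)] = float(match.group(2))
    return settings


# A few lines copied from the example output above.
sample = """EAR version 4.3
Max CPUS supported set to 256
Environment configuration section..............
eUFS limit 0.02
App. Tracer no trace
"""
settings = parse_numeric_settings(sample)
for label, value in settings.items():
    print(f"{label}: {value}")
```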
|
|
|
|
|
|
EAR was designed to be installed on heterogeneous systems, so there are some configuration parameters that are applied to a set of nodes identified by different tags.
|
|
|
The `--node-conf` flag can be used to request additional information about a specific node.
|
|
|
Configuration related to EAR's power capping sub-system, default optimization policies configuration and other parameters associated with the node requested are retrieved.
|
|
|
You can read the [EAR configuration section](Configuration) for more details about how EAR uses tags to identify and configure different kinds of nodes on a given heterogeneous system.
|
|
|
|
|
|
Contact ear-support@bsc.es for more information about the nomenclature used in `ear-info`'s output.
|
|
\ No newline at end of file |