|
|
|
# Job monitoring and optimization with EAR
|
|
|
|
|
|
EAR was first designed to be usable 100% transparently by users, which means that
|
|
EAR was first designed to be usable 100% transparently by users, which means that
|
|
you can run your applications enabling/disabling/tuning EAR with the least effort
|
|
you can run your applications enabling/disabling/tuning EAR with the least effort
|
|
for changing your workflow, e.g., submission scripts.
|
|
for changing your workflow, e.g., submission scripts.
|
|
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
|
|
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
|
|
schedulers, which do all the effort to set-up EAR on job submission.
|
|
schedulers, which do all the work to set up EAR at job submission.
|
|
For now, **SLURM is the batch scheduler fully compatible with EAR** thanks to EAR's SLURM
|
|
For now, **SLURM is the batch scheduler fully compatible with EAR** thanks to EAR's SLURM
|
|
SPANK plug-in.
|
|
SPANK plug-in.
|
|
|
|
|
... | @@ -14,12 +16,20 @@ Check with the [`ear-info`](EAR-commands#ear-info) command if EARL is `on`/`off` |
... | @@ -14,12 +16,20 @@ Check with the [`ear-info`](EAR-commands#ear-info) command if EARL is `on`/`off` |
|
If it’s `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.
|
|
If it’s `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.
|
|
For other schedulers, a simple prolog/epilog command can be created to provide
|
|
For other schedulers, a simple prolog/epilog command can be created to provide
|
|
transparent job submission with EAR and default configuration.
|
|
transparent job submission with EAR and default configuration.
|
|
The EAR development team had worked also with OAR and PBSPro batch schedulers, but currently there is no any official stable nor supported feature.
|
|
The EAR development team has also worked with the OAR and PBSPro batch schedulers, but currently there is no official, stable, or supported feature for them.
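
As an illustration of enabling EAR at submission time, the following is a minimal sketch, not an official recipe. It writes a small SLURM submission script that passes `--ear=on` and submits it only when `sbatch` is present. The file name `ear_job.sh`, the job name, the node count, and the binary `./my_mpi_app` are hypothetical placeholders, and the use of `--ear=on` as an `#SBATCH` directive assumes the plug-in option is accepted by `sbatch` on your system.

```bash
#!/bin/bash
# Write a minimal submission script (all names below are hypothetical).
cat > ear_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=ear_demo
#SBATCH --nodes=2
# Assumption: the EAR SLURM plug-in option is accepted as an sbatch directive.
#SBATCH --ear=on

srun ./my_mpi_app
EOF

# Submit only if SLURM is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch ear_job.sh
else
    echo "sbatch not found; script written to ear_job.sh"
fi
```

If EARL is already `on` by default on your cluster, the `--ear=on` line is redundant and can be dropped.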
|
|
|
|
|
|
[[_TOC_]]
|
|
[[_TOC_]]
|
|
|
|
|
|
# Use cases
|
|
# Use cases
|
|
|
|
|
|
|
|
Since EAR targets computational applications, the EAR library is loaded automatically for some application types and not for others, to avoid running EAR with, for example, `sh` processes. The application types for which the EAR library is loaded automatically are:
|
|
|
|
|
|
|
|
- MPI applications (Intel, OpenMPI, Fujitsu, and Cray versions)
|
|
|
|
- Non-MPI: OpenMP, CUDA, MKL, oneAPI
|
|
|
|
- Python
|
|
|
|
|
|
|
|
For other use cases, loading the EAR library can be requested explicitly; see the section "Other application types or frameworks".
|
|
|
|
|
|
## MPI applications
|
|
## MPI applications
|
|
|
|
|
|
EARL is automatically loaded with MPI applications when EAR is enabled by
|
|
EARL is automatically loaded with MPI applications when EAR is enabled by
|
... | @@ -429,7 +439,7 @@ If using SLURM as a job manager, a *sb* (sbatch) job-step is created with the da |
... | @@ -429,7 +439,7 @@ If using SLURM as a job manager, a *sb* (sbatch) job-step is created with the da |
|
from the entire execution.
|
|
from the entire execution.
|
|
A specific job may be specified with the `-j` option.
|
|
A specific job may be specified with the `-j` option.
|
|
|
|
|
|
```
|
|
```bash
|
|
[user@host EAR]$ eacct -j 175966
|
|
[user@host EAR]$ eacct -j 175966
|
|
JOB-STEP USER APPLICATION POLICY NODES AVG/DEF/IMC(GHz) TIME(s) POWER(W) GBS CPI ENERGY(J) GFLOPS/W IO(MBs) MPI% G-POW (T/U) G-FREQ G-UTIL(G/MEM)
|
|
JOB-STEP USER APPLICATION POLICY NODES AVG/DEF/IMC(GHz) TIME(s) POWER(W) GBS CPI ENERGY(J) GFLOPS/W IO(MBs) MPI% G-POW (T/U) G-FREQ G-UTIL(G/MEM)
|
|
175966-sb user afid NP 2 2.97/3.00/--- 3660.00 381.51 --- --- 2792619 --- --- --- --- --- ---
|
|
175966-sb user afid NP 2 2.97/3.00/--- 3660.00 381.51 --- --- 2792619 --- --- --- --- --- ---
|
... | @@ -444,7 +454,7 @@ For node-specific information, the `-l` (i.e., long) option provides detailed ac |
... | @@ -444,7 +454,7 @@ For node-specific information, the `-l` (i.e., long) option provides detailed ac |
|
In addition, `eacct` shows an additional column: `VPI(%)` (See the example below).
|
|
In addition, `eacct` shows an additional column: `VPI(%)` (See the example below).
|
|
VPI is the percentage of AVX512 instructions over the total number of instructions.
|
|
VPI is the percentage of AVX512 instructions over the total number of instructions.
|
|
|
|
|
|
```
|
|
```bash
|
|
[user@host EAR]$ eacct -j 175966 -l
|
|
[user@host EAR]$ eacct -j 175966 -l
|
|
JOB-STEP NODE ID USER ID APPLICATION AVG-F/IMC-F TIME(s) POWER(s) GBS CPI ENERGY(J) IO(MBS) MPI% VPI(%) G-POW(T/U) G-FREQ G-UTIL(G/M)
|
|
JOB-STEP NODE ID USER ID APPLICATION AVG-F/IMC-F TIME(s) POWER(s) GBS CPI ENERGY(J) IO(MBS) MPI% VPI(%) G-POW(T/U) G-FREQ G-UTIL(G/M)
|
|
175966-sb cmp2506 user afid 2.97/--- 3660.00 388.79 --- --- 1422970 --- --- --- --- --- ---
|
|
175966-sb cmp2506 user afid 2.97/--- 3660.00 388.79 --- --- 1422970 --- --- --- --- --- ---
|
... | @@ -465,7 +475,7 @@ Both aggregated and detailed accountings are available, as well as filtering. |
... | @@ -465,7 +475,7 @@ Both aggregated and detailed accountings are available, as well as filtering. |
|
When used along with the `-l` or `-r` options, all metrics stored in the EAR Database are given.
|
|
When used along with the `-l` or `-r` options, all metrics stored in the EAR Database are given.
|
|
Please, read the [commands section page](EAR-commands) to see which of them are available.
|
|
Please, read the [commands section page](EAR-commands) to see which of them are available.
|
|
|
|
|
|
```
|
|
```bash
|
|
[user@host EAR]$ eacct -j 175966.1 -r
|
|
[user@host EAR]$ eacct -j 175966.1 -r
|
|
JOB-STEP NODE ID ITER. POWER(W) GBS CPI GFLOPS/W TIME(s) AVG_F IMC_F IO(MBS) MPI% G-POWER(T/U) G-FREQ G-UTIL(G/MEM)
|
|
JOB-STEP NODE ID ITER. POWER(W) GBS CPI GFLOPS/W TIME(s) AVG_F IMC_F IO(MBS) MPI% G-POWER(T/U) G-FREQ G-UTIL(G/MEM)
|
|
175966-1 cmp2506 21 360.6 115.8 0.838 0.086 1.001 2.58 2.30 0.0 11.6 0.0 / 0.0 0.00 0%/0%
|
|
175966-1 cmp2506 21 360.6 115.8 0.838 0.086 1.001 2.58 2.30 0.0 11.6 0.0 / 0.0 0.00 0%/0%
|
... | @@ -475,7 +485,7 @@ Please, read the [commands section page](EAR-commands) to see which of them are |
... | @@ -475,7 +485,7 @@ Please, read the [commands section page](EAR-commands) to see which of them are |
|
175966-1 cmp2506 41 383.3 143.2 1.034 0.124 1.114 2.58 2.38 0.0 19.6 0.0 / 0.0 0.00 0%/0%
|
|
175966-1 cmp2506 41 383.3 143.2 1.034 0.124 1.114 2.58 2.38 0.0 19.6 0.0 / 0.0 0.00 0%/0%
|
|
```
|
|
```
|
|
|
|
|
|
```
|
|
```bash
|
|
[user@host EAR]$ eacct -j 175966 -c test.csv
|
|
[user@host EAR]$ eacct -j 175966 -c test.csv
|
|
Successfully written applications to csv. Only applications with EARL will have its information properly written.
|
|
Successfully written applications to csv. Only applications with EARL will have its information properly written.
|
|
|
|
|