# Job monitoring and optimization with EAR

EAR was designed from the start to be usable 100% transparently by users, which means that
you can run your applications enabling, disabling, or tuning EAR with minimal changes
to your workflow, e.g., your submission scripts.
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
schedulers, which do all the work of setting up EAR at job submission.
Currently, **SLURM is the batch scheduler fully compatible with EAR**, thanks to EAR's SLURM
SPANK plug-in.

Check with the [`ear-info`](EAR-commands#ear-info) command if EARL is `on`/`off`…
If it's `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.

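For example, assuming the EAR SLURM plug-in is installed, you can launch a job step with EARL enabled by adding `--ear=on` to `srun`. The helper below is a hypothetical sketch: neither `ear_srun` nor `DRY_RUN` is part of EAR or SLURM. It adds the flag to any `srun` launch. With `DRY_RUN=1` it only prints the command it would run.

```bash
# Hypothetical helper (not part of EAR or SLURM): launch a job step with
# EARL explicitly enabled through the EAR SLURM plug-in option --ear=on.
# Set DRY_RUN=1 to print the command instead of running it.
ear_srun() {
  local cmd=(srun --ear=on "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For instance, `DRY_RUN=1 ear_srun ./my_app` prints `srun --ear=on ./my_app`. Here `./my_app` is a placeholder for your application.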
For other schedulers, a simple prolog/epilog command can be created to provide
transparent job submission with EAR and its default configuration.
The EAR development team has also worked with the OAR and PBSPro batch schedulers, but there is currently no official, stable, supported integration for them.

[[_TOC_]]

# Use cases

Since EAR targets computational applications, the EAR Library (EARL) is loaded automatically
only for some application types. It is not loaded for others, so that EAR does not run with,
for example, `sh` processes. EARL is loaded automatically with the following application types:

- MPI applications (Intel, OpenMPI, Fujitsu and Cray versions)
- Non-MPI applications: OpenMP, CUDA, MKL, and oneAPI
- Python applications

For other use cases, EARL can be requested explicitly; see the *Other application types or frameworks* section.

## MPI applications

EARL is automatically loaded with MPI applications when EAR is enabled by …

If using SLURM as a job manager, a *sb* (sbatch) job-step is created with the data
from the entire execution.
A specific job can be specified with the `-j` option.

```bash
[user@host EAR]$ eacct -j 175966
JOB-STEP USER APPLICATION POLICY NODES AVG/DEF/IMC(GHz) TIME(s) POWER(W) GBS CPI ENERGY(J) GFLOPS/W IO(MBs) MPI% G-POW (T/U) G-FREQ G-UTIL(G/MEM)
175966-sb user afid NP 2 2.97/3.00/--- 3660.00 381.51 --- --- 2792619 --- --- --- --- --- ---
```

For node-specific information, the `-l` (i.e., long) option provides detailed ac…

In addition, `eacct` shows an extra column, `VPI(%)` (see the example below).
VPI is the percentage of AVX-512 instructions over the total number of instructions.

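The definition above amounts to VPI(%) = 100 × (AVX-512 instructions) / (total instructions). The sketch below only illustrates that arithmetic, assuming both instruction counts are already known. `eacct` computes this column itself, and the function name `vpi_percent` is hypothetical.

```bash
# Hypothetical helper: VPI(%) = 100 * AVX-512 instructions / total instructions.
# Prints the percentage with two decimals; exits with status 1 if the
# total instruction count is zero.
vpi_percent() {
  awk -v a="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "total instruction count must be non-zero" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", 100 * a / t
  }'
}
```

For instance, `vpi_percent 2500000 10000000` prints `25.00`.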
```bash
[user@host EAR]$ eacct -j 175966 -l
JOB-STEP NODE ID USER ID APPLICATION AVG-F/IMC-F TIME(s) POWER(s) GBS CPI ENERGY(J) IO(MBS) MPI% VPI(%) G-POW(T/U) G-FREQ G-UTIL(G/M)
175966-sb cmp2506 user afid 2.97/--- 3660.00 388.79 --- --- 1422970 --- --- --- --- --- ---
```

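The sample figures above appear consistent with ENERGY(J) ≈ POWER × TIME(s) × NODES, with POWER being an average per-node power:

- 381.51 W × 3660 s × 2 nodes ≈ 2,792,653 J, versus the reported 2792619 J.
- For node cmp2506, 388.79 W × 3660 s ≈ 1,422,971 J, versus the reported 1422970 J.

This relation is inferred from these sample rows only, not taken from EAR's documentation. The helper below (hypothetical name `energy_check`) is a quick way to sanity-check your own accounting under that assumption.

```bash
# Hypothetical sanity check (not an EAR tool). Assumption inferred from the
# sample rows: ENERGY(J) ~= average power per node (W) * TIME(s) * nodes.
# Prints the estimated energy (J) and its relative difference (%) from the
# ENERGY(J) value reported by eacct.
energy_check() {
  local power_w=$1 time_s=$2 nodes=$3 reported_j=$4
  awk -v p="$power_w" -v t="$time_s" -v n="$nodes" -v r="$reported_j" 'BEGIN {
    if (r == 0) exit 1            # avoid dividing by a zero reported energy
    est = p * t * n
    printf "%.0f %.3f\n", est, 100 * (est - r) / r
  }'
}
```

With the sample values, `energy_check 381.51 3660 2 2792619` prints `2792653 0.001`, and `energy_check 388.79 3660 1 1422970` prints `1422971 0.000`. Both differences are within the rounding of the displayed values.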
Both aggregated and detailed accounting are available, as is filtering.

When used with the `-l` or `-r` options, `eacct` shows all the metrics stored in the EAR Database.
Please read the [commands section page](EAR-commands) to see which metrics are available.

```bash
[user@host EAR]$ eacct -j 175966.1 -r
JOB-STEP NODE ID ITER. POWER(W) GBS CPI GFLOPS/W TIME(s) AVG_F IMC_F IO(MBS) MPI% G-POWER(T/U) G-FREQ G-UTIL(G/MEM)
175966-1 cmp2506 21 360.6 115.8 0.838 0.086 1.001 2.58 2.30 0.0 11.6 0.0 / 0.0 0.00 0%/0%
...
175966-1 cmp2506 41 383.3 143.2 1.034 0.124 1.114 2.58 2.38 0.0 19.6 0.0 / 0.0 0.00 0%/0%
```

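If the `-r` output is saved to a file, simple post-processing is possible. The sketch below relies on assumptions taken from the sample above:

- Data rows start with `<job>-<step>`.
- `NODE ID` is a single field in the rows, so `POWER(W)` is the fourth whitespace-separated field.

It computes a plain mean of that field over the listed rows (not weighted by each row's TIME). The file name `loops.txt` and the function `mean_loop_power` are hypothetical.

```bash
# Hypothetical post-processing of saved `eacct -r` output, e.g.:
#   eacct -j 175966.1 -r > loops.txt
# Assumes the sample's layout: data rows start with "<job>-<step>" and the
# 4th whitespace-separated field is POWER(W). Header and "..." lines are
# skipped because they do not match the job-step pattern.
mean_loop_power() {
  awk '$1 ~ /^[0-9]+-[0-9a-z]+$/ { sum += $4; n++ }
       END {
         if (n == 0) exit 1
         printf "%.2f W over %d rows\n", sum / n, n
       }' "$1"
}
```

For the two rows shown above, the mean is (360.6 + 383.3) / 2, and the helper prints `371.95 W over 2 rows`.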
```bash
[user@host EAR]$ eacct -j 175966 -c test.csv
Successfully written applications to csv. Only applications with EARL will have its information properly written.
```
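You can process the CSV file written by `-c` with standard tools. This excerpt does not show its column names or field separator, so the sketch below takes both as parameters. The helper `sum_csv_column`, the separator, and any column name you pass are assumptions to check against the header of your own `test.csv` (e.g., with `head -n 1 test.csv`). The helper locates the column by its header name and sums it over all data rows.

```bash
# Hypothetical helper: sum a numeric column of a delimited file, selecting
# the column by its header name. Usage: sum_csv_column FILE SEPARATOR COLUMN
# The separator and column names of eacct's CSV are NOT assumed here;
# inspect the first line of your file to choose them.
sum_csv_column() {
  local file=$1 sep=$2 column=$3
  awk -F "$sep" -v col="$column" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { sum += $idx }
    END { if (idx) printf "%.2f\n", sum }
  ' "$file"
}
```

If the requested column is missing, the helper prints an error to standard error and returns a non-zero status.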