# Job monitoring and optimization with EAR

EAR was designed from the start to be 100% transparent to users, which means that
you can run your applications enabling/disabling/tuning EAR with minimal effort to
change your workflow, e.g., submission scripts.
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
schedulers, which do all the work to set up EAR at job submission.
**[SLURM](https://slurm.schedmd.com/documentation.html) is the main batch scheduler fully compatible with EAR** thanks to EAR's SLURM SPANK plug-in.

With EAR's SLURM plug-in, running an application with EAR is as easy as submitting a job with either `srun`, `sbatch` or `mpirun`.

EAR is also compatible with **[PBSPro](https://2025.help.altair.com/2025.2.0/PBS%20Professional/PBSReference2025.2.0.pdf)**, through the EAR PBSPro Hook.

The EAR Library (EARL) is automatically loaded with some applications when EAR is enabled by default.

Check with the [`ear-info`](EAR-commands#ear-info) command whether EARL is `on` or `off` by default.
If it is `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.
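
For instance, assuming EARL is disabled by default on your system, a minimal submission sketch could look like the following (the application name and resource sizes are placeholders):

```
# Enable EARL explicitly for this run through EAR's SLURM SPANK plug-in.
srun --ear=on -N 1 -n 24 ./my_mpi_app
```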
# Use cases

Since EAR targets computational applications, EARL is automatically loaded with some kinds of applications and not with others, avoiding, for example, running EAR with bash processes.

The following list summarizes the application use cases where EARL can be loaded transparently:

- MPI applications: IntelMPI, OpenMPI, Fujitsu and CRAY versions.
- Non-MPI applications: OpenMP, CUDA, MKL and OneAPI.
- Python applications.

Other use cases not listed here might still be supported.
See the [dedicated section](#other-application-types-or-frameworks).

## MPI applications

When using specific MPI flavour commands to start applications (e.g., `mpirun`, `mpiexec.hydra`), there are some key points you must take into account.
See the [next sections](#using-mpirunmpiexec-command) for examples and more details.

Review SLURM's [MPI Users Guide](https://slurm.schedmd.com/mpi_guide.html), read your cluster documentation or ask your system administrator to see how SLURM is integrated with the MPI Library in your system.

### Hybrid MPI + (OpenMP, CUDA, MKL) applications

EARL automatically supports this use case.
However, EARL cannot automatically detect MPI symbols when some of these languages are used.
In that case, an environment variable is provided to give EARL a hint of the MPI flavour being used.

Export the [`EAR_LOAD_MPI_VERSION`](EAR-environment-variables#ear_load_mpi_version) environment variable with the value from the following table, depending on the MPI implementation you are loading:

| MPI flavour | Value |
| ----------- | ----- |
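
For instance, a hedged sketch of this workflow (the exported value below is only illustrative and must be taken from the table above; the application name is a placeholder):

```
# Hint EARL about the MPI flavour before launching the job.
# "intel" is an assumed example value; use the one listed in the table above.
export EAR_LOAD_MPI_VERSION="intel"
srun ./my_hybrid_mpi_app
```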

For joining OpenMPI and EAR it is highly recommended to use SLURM's `srun` command.
When using `mpirun`, as OpenMPI is not fully coordinated with the scheduler, EARL
is not automatically loaded on all nodes.
Therefore EARL will be disabled and only basic energy metrics will be reported.
To provide support for this workflow, EAR provides the [`erun`](EAR-commands#erun) command.
Read the corresponding [examples section](User-guide.md#openmpi-1) for more information about how to use this command.
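
As a sketch of that workflow (the process count and application path are placeholders), `erun` wraps the application so that EARL is set up on every node spawned by `mpirun`:

```
# erun sets up EARL on each node; --program points to the actual application binary.
mpirun -np 64 $EAR_INSTALL_PATH/bin/erun --program=./my_openmpi_app
```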
### OpenMP, CUDA, Intel MKL and OneAPI

To load EARL automatically with non-MPI applications, the application must be compiled with dynamic symbols and executed with the `srun` command.
For example, for CUDA applications the `--cudart=shared` option must be used at compile time.
EARL is loaded for OpenMP, MKL and CUDA programming models when symbols are dynamically detected.
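
For instance, a CUDA build-and-run sketch might look like the following (source and binary names are placeholders):

```
# Link the CUDA runtime dynamically so its symbols can be detected by EARL.
nvcc --cudart=shared -o my_cuda_app my_cuda_app.cu

# Launch through srun so the EAR SLURM plug-in can load EARL automatically.
srun -N 1 -n 1 ./my_cuda_app
```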
## Other application types or frameworks

```
mpirun -np 64 singularity exec $IMAGE $EAR_INSTALL_PATH/bin/erun \
--program=$BENCH_PATH/bt-mz.D.64
```

Note that the example exports `APPTAINERENV_EAR_REPORT_ADD` to set the environment variable [`EAR_REPORT_ADD`](EAR-environment-variables#report-plug-ins) to load the [`sysfs`](https://gitlab.bsc.es/ear_team/ear_private/-/wikis/Report#sysfs-report-plugin) report plug-in.
See the [next section](#runtime-report-plug-ins) about report plug-ins.
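
As a hedged sketch, the export could look like the line below; the plug-in file name `sysfs.so` is an assumption here, so take the exact name from the report plug-ins documentation:

```
# Variables prefixed with APPTAINERENV_ are injected into the container by Apptainer,
# so EAR_REPORT_ADD becomes visible to EARL running inside the image.
export APPTAINERENV_EAR_REPORT_ADD=sysfs.so   # plug-in file name assumed
```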
## Using EARL through the COMPSs Framework

COMP Superscalar ([COMPSs](https://compss-doc.readthedocs.io/en/latest/index.html)) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container-managed clusters.
COMPSs provides a programming interface for the development of the applications and a runtime system that exploits the inherent parallelism of applications at execution time.
**Since version 5.0 EAR supports monitoring and optimization of workflows**, and the COMPSs Framework includes integration with EAR.
Check out the [dedicated section](https://compss-doc.readthedocs.io/en/latest/Sections/05_Tools/05_EAR.html#) from the official COMPSs documentation for more information about how to measure the energy consumption of your workflows.

EARL loading is **only available** using `enqueue_compss` and with Python applications.

The Library is equipped with several modules and options to provide different kinds of information.

As a very simple hint of your application's workload, you can enable EARL verbosity (i.e., `--ear-verbose=1`) to get loop data at runtime.
**The information is shown at _stderr_ by default.**
Read how to set up verbosity at [submission time](#ear-job-submission-flags) and
the [verbosity environment variables](EAR-environment-variables#verbosity) provided
for a more advanced tuning of this EAR feature.
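
For example, a submission with runtime loop reporting enabled might look like the following sketch (application name and resource sizes are placeholders); redirecting _stderr_ keeps EARL's loop lines apart from the application output:

```
# --ear-verbose=1 makes EARL print per-loop metrics to stderr at runtime.
srun --ear=on --ear-verbose=1 -N 2 -n 48 ./my_mpi_app 2> earl_loops.log
```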
## Post-mortem application data

To get offline EAR job data you can use the [**eacct**](EAR-commands#ear-job-accounting-eacct) command, a tool that provides the monitored job data stored in the EAR Database.
You can request information in different ways.
Thus, you can read either per-node or aggregated job data averaged along the execution time, or get metrics collected at runtime.
See the [eacct usage examples](User-guide#ear-job-accounting-eacct) for a better overview of what `eacct` provides.
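
As a quick sketch (the job ID is a placeholder and the exact options may differ on your installation, so check the usage examples linked above):

```
# Aggregated accounting data for a finished job.
eacct -j 123456

# Per-node breakdown of the same job; the -l option is assumed here from typical eacct usage.
eacct -j 123456 -l
```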
## Runtime report plug-ins

You can load the `csv_ts` report plug-in in two ways:

1. By setting the [`--ear-user-db`](#ear-job-submission-flags) flag at submission time.
2. By [loading directly the report plug-in](EAR-environment-variables#ear_report_add) through an environment variable: `export EAR_REPORT_ADD=csv_ts.so`.
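
A minimal sketch of both options follows (the metrics file name and the application are placeholders):

```
# Option 1: ask the EAR SLURM plug-in to write the runtime metrics to CSV files.
srun --ear=on --ear-user-db=my_app_metrics ./my_app

# Option 2: load the report plug-in directly through the environment.
export EAR_REPORT_ADD=csv_ts.so
srun --ear=on ./my_app
```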

Read the [report plug-ins](https://gitlab.bsc.es/ear_team/ear_private/-/wikis/Report) dedicated section for more information.
## Other EARL events

The environment variable must be set with the path (a directory) where you want
the output files to be generated; it will be created automatically if needed.

> You can always check the available EAR submission flags provided by EAR's SLURM SPANK
> plug-in by typing `srun --help`.
## CPU frequency selection

```
#SBATCH -N 1
#SBATCH -e test.%j.err
#SBATCH -o test.%j.out
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24
#SBATCH --cpus-per-task=1
#SBATCH --ear-verbose=1
```