You can run your applications enabling/disabling/tuning EAR with the least effort needed to change your workflow, e.g., submission scripts.

This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch schedulers, which do all the work to set up EAR at job submission.

Currently, **SLURM is the only batch scheduler fully compatible with EAR**, thanks to EAR's SLURM SPANK plug-in.

With EAR's SLURM plug-in, running an application with EAR is as easy as submitting a job with either `srun`, `sbatch` or `mpirun`. The EAR Library (EARL) is automatically loaded with some applications when EAR is enabled by default.

Check with the [`ear-info`](EAR-commands#ear-info) command whether EARL is `on` or `off` by default. If it is `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.

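For example, a minimal sketch (the application name and resource sizes are placeholders):

```
srun --ear=on -N 1 -n 24 ./my_app
```

If EARL is enabled by default on your system, `--ear=off` can be used in the same way to disable it for a particular run.
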
For other schedulers, a simple prolog/epilog command can be created to provide transparent job submission with EAR and its default configuration.

The EAR development team has also worked with the OAR and PBSPro batch schedulers, but currently there is no official, stable or supported integration for them.

[[_TOC_]]

## Use cases

### MPI applications

EARL is automatically loaded with MPI applications when EAR is enabled by default (check `ear-info`). EAR supports the utilization of both `mpirun`/`mpiexec` and `srun` commands.

When using specific MPI flavour commands to start applications (e.g., `mpirun`, `mpiexec.hydra`), there are some key points you must take into account. See the [next sections](#using-mpirunmpiexec-command) for examples and more details.

#### Hybrid MPI + (OpenMP, CUDA, MKL) applications

EARL automatically supports this use case. `mpirun`/`mpiexec` and `srun` are supported in the same manner as explained above.

#### Python MPI applications

EARL cannot automatically detect MPI symbols when Python is used. In that case, an environment variable is provided to specify which MPI flavour is in use. Export the [`SLURM_EAR_LOAD_MPI_VERSION`](EAR-environment-variables#ear_load_mpi_version) environment variable with either _intel_ or _open mpi_, e.g., `export SLURM_EAR_LOAD_MPI_VERSION="open mpi"`, which are the two MPI implementations 100% supported by EAR.

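A minimal sketch for an `mpi4py` application (the script name and resources are placeholders; _intel_ is used as the example flavour):

```
export SLURM_EAR_LOAD_MPI_VERSION="intel"
srun -N 1 -n 24 python my_mpi_script.py
```
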
#### Running MPI applications on SLURM systems

##### Using `srun` command

Running MPI applications with EARL on SLURM systems using the `srun` command is the most straightforward way to start using EAR. The EAR submission options are provided by EAR's SLURM SPANK plug-in.

When using SLURM commands for job submission, both Intel and OpenMPI implementations are supported.

##### Using `mpirun`/`mpiexec` command

To provide automatic loading of the EAR Library, the only requirement from the MPI library is to be coordinated with the scheduler.

###### Intel MPI

Recent versions of Intel MPI offer two environment variables that can be used to guarantee the correct scheduler integration:

- `I_MPI_HYDRA_BOOTSTRAP` sets the bootstrap server, which must be set to `slurm`.
- `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS` sets extra arguments for the bootstrap server. These arguments are passed to SLURM, and they can be any of the options offered by EAR's SPANK plug-in.

You can read the Intel Hydra environment variables guide [here](https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/hydra-environment-variables.html).

###### OpenMPI

For joining OpenMPI and EAR it is highly recommended to use SLURM's `srun` command. When using `mpirun`, as OpenMPI is not fully coordinated with the scheduler, EARL is not loaded automatically and only basic monitoring data will be reported.

To provide support for this workflow, EAR provides the [`erun`](EAR-commands#erun) command. Read the corresponding [examples section](User-guide.md#openmpi-1) for more information about how to use this command.

###### MPI4PY

To use MPI with Python applications, the EAR Loader cannot automatically detect symbols to classify the application as Intel or OpenMPI. In order to specify it, the user has to define the `SLURM_EAR_LOAD_MPI_VERSION` environment variable with the value _intel_ or _open mpi_. It is recommended to add this setting to the Python modules to make it easy for final users.

### Non-MPI applications

#### Python

Since version 4.1, EAR automatically loads the Library with Python applications, so no action is needed. You must run the application with the `srun` command in order to pass through EAR's SLURM SPANK plug-in and be able to enable/disable/tune EAR. See the [EAR submission flags](#ear-job-submission-flags) provided by the EAR SLURM integration.

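For example (the script name is a placeholder):

```
srun --ear-verbose=1 python my_script.py
```
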
#### OpenMP, CUDA and Intel MKL

To load EARL automatically with non-MPI applications, the application must be compiled with dynamic symbols and must be executed with the `srun` command. For example, for CUDA applications the `--cudart=shared` option must be used at compile time. EARL is loaded for the OpenMP, MKL and CUDA programming models when their symbols are dynamically detected.

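A minimal sketch of the CUDA case (file and binary names are placeholders):

```
# Compile with the CUDA runtime as a shared library so EARL can detect the symbols
nvcc --cudart=shared -o my_cuda_app my_cuda_app.cu

# Run through srun so the EAR SLURM plug-in is applied
srun -N 1 -n 1 ./my_cuda_app
```
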
### Other application types or frameworks

For other programming models or sequential apps not supported by default, EARL can be forced to be loaded by setting the [`SLURM_EAR_LOADER_APPLICATION`](EAR-environment-variables#ear_loader_application) environment variable, defined with the application name. For example:

```
export SLURM_EAR_LOADER_APPLICATION=my_app
srun my_app
```

## Retrieving EAR data

As a job accounting and monitoring tool, EARL collects metrics that you can retrieve to see and understand your applications' behaviour.

EARL can also generate Paraver traces.

> Contact [ear-support@bsc.es](mailto:ear-support@bsc.es) if you want more details about how to deal with EAR data with Paraver.

## EAR job submission flags

EAR options can be specified when running `srun` and/or `sbatch`, and are supported with `srun`/`sbatch`/`salloc`. The ones used throughout this guide are `--ear`, `--ear-policy`, `--ear-cpufreq`, `--ear-verbose` and `--ear-user-db`. When an option specifies a path where output files are to be generated, the path will be created automatically if needed.

> You can always check the available EAR submission flags provided by EAR's SLURM SPANK plug-in by typing `srun --help`.

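For instance, a quick way to filter them (a simple shell sketch):

```
srun --help | grep -- "--ear"
```
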
### CPU frequency selection

The [EAR configuration file](www.example.org) supports the specification of *EAR authorized users*, who can ask for more privileged submission options. The most relevant ones are the possibility to select a specific power policy and a specific CPU frequency. Check with your sysadmin or helpdesk team to become an authorized user.

- The `--ear-policy=policy_name` flag asks for the _policy_name_ policy. Type `srun --help` to see the policies currently installed on your system.
- The `--ear-cpufreq=value` flag (_value_ must be given in kHz) asks for a specific CPU frequency, as shown in the sketch after this list.

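A minimal sketch combining both flags (the values are examples only):

```
srun --ear-policy=monitoring --ear-cpufreq=2000000 -N 1 -n 24 application
```
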
### GPU frequency selection

EAR version 3.4 and upwards supports GPU monitoring for NVIDIA devices from the point of view of the application and node monitoring. GPU frequency selection is also supported. To see the list of available frequencies of the GPU you will work on, you can type:

```
nvidia-smi -q -d SUPPORTED_CLOCKS
```

## Examples

### `srun` examples

Having an MPI application asking for one node and 24 tasks, the following is a simple case of job submission.

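A minimal sketch of such a submission (assuming the binary is called `application`):

```
srun -J test -N 1 -n 24 application
```
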
To additionally select a specific CPU frequency and the `monitoring` policy, with EAR verbosity enabled:

```
srun --ear-cpufreq=2000000 --ear-policy=monitoring --ear-verbose=1 -J test -N 1 -n 24 application
```

For `--ear-cpufreq` to have any effect, you must specify the `--ear-policy` option even if you want to run your application with the default policy.

### `sbatch` + EARL + srun

When using `sbatch`, EAR options can be specified in the same way. If more than one `srun` is included in the job submission, EAR options can be inherited from `sbatch` by the different `srun` instances, or they can be specifically modified on each individual `srun`. For example, within the job script:

```
mkdir ear_metrics
srun --ear-user-db=ear_metrics/app_metrics application
```

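A fuller sketch of a complete job script (the `#SBATCH` values and the application name are placeholders, and it is assumed that the EAR SPANK options are also accepted as `#SBATCH` directives, as stated above):

```
#!/bin/bash
#SBATCH -J test
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --ear-verbose=1

# The EAR options given to sbatch are inherited by the srun below,
# while --ear-user-db is set only for this particular step
mkdir -p ear_metrics
srun --ear-user-db=ear_metrics/app_metrics application
```
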
### EARL + `mpirun`

#### Intel MPI

When running EAR with `mpirun` rather than `srun`, we have to specify the utilization of `srun` as the bootstrap server. Versions 2019 and newer offer two environment variables for bootstrap server specification and arguments:

```
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-policy=monitoring --ear-verbose=1"
mpiexec.hydra -n 10 application
```

#### OpenMPI

Bootstrap is an Intel(R) MPI option but not an OpenMPI option. For OpenMPI, `srun` must be used for automatic EAR support. When `mpirun` has to be used, the [`erun`](EAR-commands#erun) command performs the EAR set-up one time per job or node, like SLURM does with its plugins.

**IMPORTANT NOTE**: If you are going to launch `n` applications with the `erun` command through an sbatch job, you must set the environment variable `SLURM_STEP_ID` to values from `0` to `n-1` before each `mpirun` call. This way, `erun` will inform the EARD of the correct step ID, which is then stored in the Database.

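A minimal sketch of this pattern inside a job script (the `--program` form of the `erun` invocation and the application names are assumptions; check the [`erun`](EAR-commands#erun) documentation for the exact syntax):

```
# Two applications launched from the same sbatch job

export SLURM_STEP_ID=0   # first application -> step 0
mpirun -n 24 erun --program=application_1

export SLURM_STEP_ID=1   # second application -> step 1
mpirun -n 24 erun --program=application_2
```
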
## EAR job Accounting (`eacct`)

The [`eacct`](EAR-commands#ear-job-accounting-eacct) command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged or not, as **non-privileged users can only access their own information**. It provides several options.

### Usage examples

The basic usage of `eacct` retrieves the last 20 applications (by default) of the user executing it.

Please read the [commands section page](EAR-commands) to see which options are available. For example, when exporting application data to a CSV file, `eacct` reports the following:

```
Successfully written applications to csv. Only applications with EARL will have its information properly written.
```

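A minimal sketch of typical invocations (the job ID and file name are placeholders, and the `-j`/`-c` options are assumptions here; check the commands page for the exact option list):

```
# Last applications (by default 20) of the current user
eacct

# Information for a specific job, exported to a CSV file
eacct -j 123456 -c my_job_metrics.csv
```
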
## Job energy optimization: EARL policies

The core component of EAR at the user's job level is the EAR Library (EARL). The Library deals with job monitoring and is the component which implements and applies optimization policies based on the monitored workload.

We highly recommend reading the [EARL](EARL) documentation and how energy policies work, in order to better understand what the Library does internally. This will let you easily explore all the features EAR offers to the end-user (e.g., tuning variables, collecting data), so you gain more knowledge about how many resources your application consumes and how that correlates with its computational characteristics.