Running applications with EAR
------------------------------

With EAR's SLURM plugin, running an application with EAR is as easy as submitting a job with `srun`, `sbatch` or `mpirun` under SLURM. EAR's behaviour can be customized through several configuration settings, which are explained below together with examples of how to run applications with each method.

For other schedulers, a simple prolog/epilog command can be created to provide transparent job submission with EAR and the default configuration.
## Job submission with EAR and SLURM

The following EAR options can be specified when running `srun`, `sbatch` or `salloc`:

| Options | Description |
| -------------------------- | -------------------------------------------------------------------- |
| --ear=on/off(**) | Enables/disables EAR library loading with this job. |
| --ear-policy=policy | Selects an energy policy for EAR. See the [Policies page](EAR-policies) for more info. |
| --ear-cpufreq=frequency(*) | Specifies the starting frequency to be used by the chosen EAR policy (in kHz). |
| --ear-policy-th=value(*) | Specifies the ear_threshold to be used by the chosen EAR policy {`value=[0...1]`}. |
| --ear-user-db=file | Specifies the files where the user applications' metrics summary will be stored {'file.nodename.csv'}. If not defined, these files will not be created. |
| --ear-verbose=value | Specifies the level of verbosity {value=[0...1]}; the default is 0. |
| --ear-tag=tag | Selects an energy tag. |
| --ear-learning=p_state(*) | Enables the learning phase for a given P_STATE {`p_state=[1...n]`}. |

(*) Option requires _ear privileges_ to be used.

(**) Does not require _ear privileges_, but values might be limited by the EAR configuration.
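Because EAR filters invalid option values back to the defaults instead of rejecting them, a wrong value can go unnoticed. The following is a minimal POSIX `sh` sketch, not part of EAR, of a wrapper that checks two of the options above against the ranges listed in the table before calling `srun`. The function name `ear_srun` and the `DRY_RUN` switch are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper (not shipped with EAR): checks two EAR options against
# the ranges given in the options table, then calls srun. EAR itself filters
# invalid values back to defaults instead of rejecting them.
# If DRY_RUN is set, the srun command line is printed instead of executed.
ear_srun() {
  local arg value
  for arg in "$@"; do
    case $arg in
      --ear-policy-th=*)
        value=${arg#*=}
        # Threshold: a plain decimal number in [0, 1].
        if ! awk -v v="$value" \
            'BEGIN { n = v + 0; ok = (v ~ /^[0-9]*\.?[0-9]+$/) && n >= 0 && n <= 1; exit(ok ? 0 : 1) }'; then
          echo "ear_srun: --ear-policy-th must be in [0,1], got '$value'" >&2
          return 2
        fi
        ;;
      --ear-verbose=*)
        value=${arg#*=}
        # Verbosity: 0 or 1.
        case $value in
          0|1) ;;
          *) echo "ear_srun: --ear-verbose must be 0 or 1, got '$value'" >&2
             return 2 ;;
        esac
        ;;
    esac
  done

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "srun $*"
  else
    srun "$@"
  fi
}
```

For example, `DRY_RUN=1 ear_srun --ear-policy=monitoring --ear-verbose=1 -J test -N 1 application` only prints the `srun` command line; without `DRY_RUN`, the job step is launched.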
## GPU support
EAR 3.4 and later support GPU monitoring for NVIDIA devices, both from the point of view of the application and of node monitoring. GPU frequency optimization is not yet supported. Authorized users can request a specific GPU frequency by setting the `SLURM_EAR_GPU_DEF_FREQ` environment variable. Currently, only one frequency for all GPUs is supported.
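A minimal sketch of requesting a GPU frequency for a single job step, assuming you are an authorized user. The paragraph above does not say which unit `SLURM_EAR_GPU_DEF_FREQ` expects, so the wrapper passes the caller's value through unchanged; check your site's EAR configuration before choosing one. The function name `srun_gpu_freq` and the `DRY_RUN` switch are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: run one job step with SLURM_EAR_GPU_DEF_FREQ set in
# srun's environment. One value applies to all GPUs of the job.
# The unit of FREQ is NOT assumed here; use the one your site documents.
# If DRY_RUN is set, the command line is printed instead of executed.
srun_gpu_freq() {
  if [ "$#" -lt 2 ]; then
    echo "usage: srun_gpu_freq FREQ srun-arguments..." >&2
    return 2
  fi
  local freq="$1"
  shift
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "SLURM_EAR_GPU_DEF_FREQ=$freq srun $*"
  else
    SLURM_EAR_GPU_DEF_FREQ="$freq" srun "$@"
  fi
}
```

Setting the variable only in `srun`'s environment keeps the request scoped to that job step rather than to the whole shell session.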
## EAR library loading

EAR uses the EAR loader to automatically select the EAR optimization library version. This optimization library is automatically loaded when an MPI, OpenMP, MKL or CUDA application is detected. Applications are identified by detecting their symbols; this detection does not work for static symbols.
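Since identification relies on symbols, it can be useful to check which symbols a binary imports dynamically before running it. The sketch below is an illustrative heuristic only, not the EAR loader's actual logic: the symbol prefixes it looks for (`MPI_`/`PMPI_`, `omp_`/`GOMP_`/`__kmpc_`, `mkl_`/`MKL_`, `cuda`) are assumptions chosen for the example, and the function name `detect_runtimes` is hypothetical. It reads one symbol name per line on standard input.

```sh
#!/bin/sh
# Illustrative heuristic (NOT the EAR loader's logic): report which runtime
# families appear among the symbol names read from stdin, one per line.
# Symbols from statically linked libraries normally do not appear in a
# binary's dynamic symbol table, matching the limitation described above.
detect_runtimes() {
  local syms found
  syms=$(cat)
  found=""
  if printf '%s\n' "$syms" | grep -qE '^P?MPI_'; then found="$found MPI"; fi
  if printf '%s\n' "$syms" | grep -qE '^(omp_|GOMP_|__kmpc_)'; then found="$found OpenMP"; fi
  if printf '%s\n' "$syms" | grep -qE '^(mkl_|MKL_)'; then found="$found MKL"; fi
  if printf '%s\n' "$syms" | grep -qE '^cuda'; then found="$found CUDA"; fi
  if [ -z "$found" ]; then
    echo "none"
  else
    echo "${found# }"   # drop the leading space
  fi
}
```

On a system with GNU binutils, for example: `nm -D --undefined-only ./application | awk '{print $NF}' | detect_runtimes`.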
## MPI versions supported

When using `sbatch`, `srun` or `salloc`, Intel MPI and OpenMPI are fully supported. When using MPI commands to start applications (`mpirun`, `mpiexec.hydra`, etc.), there are minor differences, explained in the examples below.

## Examples

### `srun` examples

The EAR plugin reads the `srun` options and contacts the EARD (EAR daemon). Invalid options are filtered to default values, so the behaviour will depend on the system configuration.
- Executes an application with EAR on/off (depending on the configuration) with default values:

```
srun -J test -N 1 -n 24 --tasks-per-node=24 application
```
- Executes an application with EAR on with default values (policy, default frequency, etc.) and verbose set to 1:

```
srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
```
EARL verbose messages are written to stderr. For jobs using more than two or three nodes, messages can be overwritten. To have EARL messages written to files instead, set the `SLURM_EARL_VERBOSE_PATH` environment variable to a folder name; one file per node will be generated with EARL messages.
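A minimal sketch of collecting EARL messages in per-node files for one job step. The function name `srun_earl_logs`, the `DRY_RUN` switch and creating the folder beforehand are assumptions (the text above does not say whether EARL creates the folder itself); a folder on a filesystem shared by all nodes is assumed to be the safest choice. The wrapper also adds `--ear-verbose=1`.

```sh
#!/bin/sh
# Hypothetical wrapper: run a job step with EARL verbose messages written to
# one file per node inside LOG_DIR, via SLURM_EARL_VERBOSE_PATH.
# If DRY_RUN is set, the command line is printed instead of executed.
srun_earl_logs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: srun_earl_logs LOG_DIR srun-arguments..." >&2
    return 2
  fi
  local log_dir="$1"
  shift
  # Create the folder up front; whether EARL would create it is not assumed.
  mkdir -p "$log_dir" || return 1
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "SLURM_EARL_VERBOSE_PATH=$log_dir srun --ear-verbose=1 $*"
  else
    SLURM_EARL_VERBOSE_PATH="$log_dir" srun --ear-verbose=1 "$@"
  fi
}
```

For example, `srun_earl_logs "$PWD/earl_logs" -J test -N 4 application` would leave one EARL message file per node under `earl_logs`.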
- Executes an application with EAR on and verbose set to 1. If the user is authorised, the job will be executed at 2.0 GHz as the default frequency and with the power policy set to monitoring. Otherwise, default values will be applied:

```
srun --ear-cpufreq=2000000 --ear-policy=monitoring --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
```
- Executes an application with EAR. If the user is authorised to select the “memory-intensive” tag, the application will be executed according to the definition of the tag in the EAR configuration:

```
srun --ear-tag=memory-intensive --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
```
#### Intel MPI

When running EAR with `mpirun` rather than `srun`, `srun` has to be specified as the bootstrap. Otherwise, jobs will not go through the SLURM plugin and EAR options will not be recognised. The interface depends on the Intel MPI version: versions 2018 and older use two `mpirun` arguments, one to specify the bootstrap and another to pass extra flags to SLURM.
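A minimal sketch of both ways of pointing Intel MPI at `srun`. The `-bootstrap` and `-bootstrap-exec-args` options and the `I_MPI_HYDRA_BOOTSTRAP` / `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS` environment variables belong to Intel MPI itself; which of them your version accepts should be checked against its reference manual. The wrapper `impi_ear_run`, its `legacy`/`env` styles and the `DRY_RUN` switch are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: launch an Intel MPI job with srun as the bootstrap,
# so that the EAR SLURM plugin sees the EAR options.
#   impi_ear_run STYLE "EAR_OPTIONS" NPROCS application [args...]
#   STYLE=legacy: -bootstrap/-bootstrap-exec-args (Intel MPI 2018 and older)
#   STYLE=env:    I_MPI_HYDRA_BOOTSTRAP* variables (newer Intel MPI)
# If DRY_RUN is set, the command line is printed instead of executed.
impi_ear_run() {
  if [ "$#" -lt 4 ]; then
    echo "usage: impi_ear_run legacy|env EAR_OPTIONS NPROCS application [args...]" >&2
    return 2
  fi
  local style="$1" ear_opts="$2" nprocs="$3"
  shift 3
  case $style in
    legacy)
      if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "mpirun -bootstrap slurm -bootstrap-exec-args \"$ear_opts\" -n $nprocs $*"
      else
        mpirun -bootstrap slurm -bootstrap-exec-args "$ear_opts" -n "$nprocs" "$@"
      fi
      ;;
    env)
      if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "I_MPI_HYDRA_BOOTSTRAP=slurm I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=\"$ear_opts\" mpirun -n $nprocs $*"
      else
        I_MPI_HYDRA_BOOTSTRAP=slurm I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="$ear_opts" \
          mpirun -n "$nprocs" "$@"
      fi
      ;;
    *)
      echo "impi_ear_run: STYLE must be 'legacy' or 'env'" >&2
      return 2
      ;;
  esac
}
```

Both styles hand the EAR options to `srun` unchanged, so the same option string works regardless of the Intel MPI version in use.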
The following example will run the application with the `min_time_to_solution` policy:

ERUN
----
ERUN is a program that simulates the whole SLURM and EAR SLURM plugin pipeline. It comes with the EAR package and is compiled automatically; you can find it in the `bin` folder of your installation path. It must be used when a set of nodes does not have SLURM installed, or when using OpenMPI's `mpirun`, which does not contact SLURM. You can launch ERUN instead of running your application directly:

```
mpirun -n 4 /path/to/erun --program="hostname --alias"
...
```
The `--job-id` and `--nodes` parameters create the environment variables that SLURM would have created automatically, since your application may make use of them. The `--clean` option removes the temporary files created to synchronize all ERUN processes.
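A sketch of an ERUN launch that also sets the SLURM-like variables and cleans up. The option names come from the paragraph above, but passing the `--job-id` and `--nodes` values in `--option=value` form (by analogy with `--program=...`) and combining `--clean` with a normal launch are assumptions; check ERUN's help output for the exact syntax. The wrapper name `erun_launch`, the `DRY_RUN` switch, the job ID and the node count are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: launch PROGRAM through ERUN under mpirun.
# ASSUMPTIONS: --job-id/--nodes take --opt=value like --program, and
# --clean can be combined with a normal launch. Extra arguments (e.g.
# EAR options such as --ear-policy=monitoring) are appended as-is.
# If DRY_RUN is set, the command line is printed instead of executed.
erun_launch() {
  if [ "$#" -lt 5 ]; then
    echo "usage: erun_launch ERUN NPROCS JOB_ID NODES PROGRAM [options...]" >&2
    return 2
  fi
  local erun="$1" nprocs="$2" job_id="$3" nodes="$4" program="$5"
  shift 5
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "mpirun -n $nprocs $erun --program=\"$program\" --job-id=$job_id --nodes=$nodes --clean $*"
  else
    mpirun -n "$nprocs" "$erun" --program="$program" \
      --job-id="$job_id" --nodes="$nodes" --clean "$@"
  fi
}
```

For example, `DRY_RUN=1 erun_launch /path/to/erun 4 100 1 myapp --ear-policy=monitoring` prints the `mpirun` line it would run.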
You also have to load the EAR environment module or define its environment variables in your environment or script:

User commands
-------------
The only command available to users is `eacct`. With `eacct`, users can see their previously executed jobs along with the information that EAR monitors (time, average power, number of nodes and average frequency, among others), and can use several options to manipulate this output. Some data will not be available if a job was not executed with EARL.

Note that users can only see their own applications/jobs unless they are privileged users and are specified as such in the `ear.conf` configuration file.

For more information, see the [Commands section](Commands#energy-account-eacct).
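A small sketch of wrapping `eacct` for day-to-day use. It relies on the `-j` (job ID), `-n` (number of recent jobs) and `-c` (CSV output file) options, which are assumed here from the `eacct` help; the Commands section linked above is the authoritative reference. The wrapper name `eacct_report` and the `DRY_RUN` switch are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper around eacct: show recent jobs, or one job's records,
# optionally storing the output in a CSV file.
#   eacct_report [JOB_ID] [CSV_FILE]
# ASSUMPTION: eacct accepts -j JOB_ID, -n COUNT and -c FILE.
# If DRY_RUN is set, the command line is printed instead of executed.
eacct_report() {
  local job_id="${1:-}" csv="${2:-}"
  if [ -n "$job_id" ]; then
    set -- -j "$job_id"        # one job (or job.step)
  else
    set -- -n 10               # the 10 most recent jobs
  fi
  if [ -n "$csv" ]; then
    set -- "$@" -c "$csv"      # also store the output in CSV format
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "eacct $*"
  else
    eacct "$@"
  fi
}
```

Non-privileged users will still only see their own jobs through this wrapper, as described above.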