[[_TOC_]]

# Running jobs with EAR

EAR was first designed to be usable 100% transparently by users, which means that
you can run your applications enabling/disabling/tuning EAR with minimal changes
to your workflow, e.g., submission scripts.
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
schedulers, which do all the effort to set up EAR on job submission.
For now, SLURM is the only batch scheduler 100% compatible with EAR, thanks to EAR's SLURM
SPANK plug-in.

With EAR's SLURM plug-in, running an application with EAR is as easy as submitting
a job with either `srun`, `sbatch` or `mpirun`. The EAR Library (EARL) is automatically
loaded with some applications when EAR is enabled by default.
Check with the `ear-info` command whether EARL is `on`/`off` by default.
If it’s `off`, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it.
For other schedulers, a simple prolog/epilog command can be created to provide
transparent job submission with EAR and a default configuration.

# Use cases

## MPI applications

EARL is automatically loaded with MPI applications when EAR is enabled by
default (check `ear-info`). EAR supports the utilization of both
`mpirun`/`mpiexec` and `srun` commands.

When using `sbatch`/`srun` or `salloc`, [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.mufipm)
and [OpenMPI](https://www.open-mpi.org/) are fully supported.
When using specific MPI flavour commands to start applications (e.g., `mpirun`,
`mpiexec.hydra`), there are some key points which you must take into account.
See [next sections](#using-mpirunmpiexec-command) for examples and more details.

### Hybrid MPI + (OpenMP, CUDA, MKL) applications

EARL automatically supports this use case.
`mpirun`/`mpiexec` and `srun` are supported in the same manner as explained above.

### Python MPI applications

EARL cannot automatically detect MPI symbols when Python is used.
In that case, an environment variable is provided to specify which MPI flavour is used.
Export the [`SLURM_EAR_LOAD_MPI_VERSION`](EAR-environment-variables#ear_load_mpi_version) environment variable with either _intel_ or _open mpi_
values, e.g., `export SLURM_EAR_LOAD_MPI_VERSION="open mpi"`, which are the two MPI
implementations 100% supported by EAR.
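
As a minimal job script sketch (the script name `my_mpi_app.py` and the requested resources are illustrative):

```
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=24

# Tell the EAR Loader which MPI implementation the Python application uses
export SLURM_EAR_LOAD_MPI_VERSION="open mpi"

srun python my_mpi_app.py
```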

### Running MPI applications on SLURM systems

#### Using `srun` command

Running MPI applications with EARL on SLURM systems using the `srun` command is the most
straightforward way to start using EAR.
All jobs are monitored by EAR and the Library is loaded by default depending on
the cluster configuration.
To run a job with `srun` and EARL **there is no need to load the EAR module**.

Even though it is automatic, there are a few [flags](#ear-job-submission-flags) that can be selected
at job submission.
They are provided by EAR's SLURM SPANK plug-in.
When using SLURM commands for job submission, both Intel and OpenMPI implementations are
supported.

#### Using `mpirun`/`mpiexec` command

To provide an automatic loading of the EAR Library, the only requirement from
the MPI library is to be coordinated with the scheduler.

##### Intel MPI

Recent versions of Intel MPI offer two environment variables that can be used
to guarantee the correct scheduler integration:

- `I_MPI_HYDRA_BOOTSTRAP` sets the bootstrap server. It must be set to *slurm*.
- `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS` sets additional arguments for the bootstrap server.
These arguments are passed to SLURM, and they can be all the same as EAR's SPANK plug-in provides.

You can read [here](https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/hydra-environment-variables.html) the Intel environment variables guide.
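
A minimal sketch of a job script using both variables (forwarding the `--ear=on` SPANK flag through the extra arguments is just an illustration):

```
#!/bin/bash

# Bootstrap Hydra through SLURM so EAR's SPANK plug-in is engaged on every node
export I_MPI_HYDRA_BOOTSTRAP=slurm
# Forward an EAR submission flag through the bootstrap server (SLURM)
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear=on"

mpiexec.hydra -n 10 application
```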

##### OpenMPI

For joining OpenMPI and EAR it is highly recommended to use SLURM's `srun` command.
When using `mpirun`, as OpenMPI is not fully coordinated with the scheduler, EARL
is not automatically loaded on all nodes.
Therefore EARL will be disabled and only [basic energy metrics](#jobs-executed-without-the-ear-library-basic-job-accounting)
will be reported.
To support this workflow, EAR provides the [`erun`](EAR-commands#erun) command.
Read the corresponding [examples section](User-guide.md#openmpi-1) for more information about how to use this command.

##### MPI4PY

To use MPI with Python applications, the EAR Loader cannot automatically detect symbols to classify
the application as Intel or OpenMPI. In order to specify it, the user has
to define the `SLURM_EAR_LOAD_MPI_VERSION` environment variable with the values _intel_ or
_open mpi_. It is recommended to set it in Python modules to make it easy for
final users.

### Using additional MPI profiling libraries/tools

EAR uses the `LD_PRELOAD` mechanism to be loaded and the PMPI API for
a transparent loading. In order to be compatible with other profiling libraries,
EAR does not replace the MPI symbols; it just calls the next symbol in the list.
So it is compatible with other tools or profiling libraries. In case of conflict,
EARL can be disabled by setting the `--ear=off` flag at submission time.

## Non-MPI applications

### Python

Since version 4.1 EAR automatically executes the Library with Python applications,
so no action is needed.
You must run the application with the `srun` command to pass through EAR's SLURM
SPANK plug-in in order to enable/disable/tune EAR.
See the [EAR submission flags](#ear-job-submission-flags) provided by the EAR SLURM integration.
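
For example (a minimal sketch; `my_script.py` is a placeholder name):

```
srun -N 1 -n 1 --ear=on python my_script.py
```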

### OpenMP, CUDA and Intel MKL

To load EARL automatically with non-MPI applications it is required to have them compiled
with dynamic symbols, and they must be executed with the `srun` command.
For example, for CUDA applications the `--cudart=shared` option must be used at compile time.
EARL is loaded for OpenMP, MKL and CUDA programming models when symbols are dynamically detected.
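
A minimal sketch of the CUDA case (source and binary names are illustrative):

```
# Link the CUDA runtime dynamically so EARL can detect the symbols at load time
nvcc --cudart=shared -o my_cuda_app my_cuda_app.cu

srun -N 1 -n 1 ./my_cuda_app
```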

## Other application types or frameworks

For other programming models or sequential apps not supported by default, EARL can
be forced to be loaded by setting the [`SLURM_EAR_LOADER_APPLICATION`](EAR-environment-variables#ear_loader_application)
environment variable, which must be defined with the application name.
For example:

```
#!/bin/bash

export SLURM_EAR_LOADER_APPLICATION=my_app
srun my_app
```

# Retrieving EAR data

As a job accounting and monitoring tool, EARL collects some metrics that you can
retrieve to see and understand your application's behaviour.
The Library comes with several modules and options so it is able to provide different
kinds of information.

As a very simple hint of your application workload, you can enable EARL verbosity
to get loop data at runtime.
The information is shown at _stderr_ by default.
Read how to set up verbosity at [submission time](#ear-job-submission-flags) and the
[verbosity environment variables](EAR-environment-variables#verbosity) provided
for a more advanced tuning of this EAR feature.

To get offline job data EAR provides [`eacct`](EAR-commands#ear-job-accounting-eacct),
a tool that provides the monitored job data stored in the Database.
You can request information in different ways, so you can read aggregated job data,
per-node or per-loop information among other things.
See [eacct usage examples](User-guide#ear-job-accounting-eacct) for a better overview of which kind of data `eacct` provides.

There is another way to get runtime and aggregated data during runtime, without the
need of calling `eacct` after the job completion.
EAR implements a reporting mechanism which lets developers add new report
plug-ins, so there is an infinite set of ways to report EAR collected data.

EAR releases come with a fully supported report plug-in (called *csv_ts*)
which basically provides the same runtime and aggregated data reported to the Database, in CSV files,
directly while the job is running.
You can load this plug-in in two ways:

1. By setting the [`--ear-user-db`](#ear-job-submission-flags) flag at submission time.
2. [Loading directly the report plug-in](EAR-environment-variables#ear_report_add) through an environment variable:
`export SLURM_EAR_REPORT_ADD=csv_ts.so`.
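
A minimal sketch of the second approach (the application name is illustrative):

```
#!/bin/bash

# Load the csv_ts report plug-in so data is written to CSV files at runtime
export SLURM_EAR_REPORT_ADD=csv_ts.so

srun application
```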

> Contact [ear-support@bsc.es](mailto:ear-support@bsc.es) for more information
about report plug-ins.

You can also request EAR to report **events** to the [Database](EAR-Databse).
They show more details about the EARL internal state and can be retrieved with the `eacct`
command.
See how to enable [EAR events reporting](EAR-environment-variables#report_earl_events)
and which kind of events EAR is reporting.

If it applies to your application, you can request EAR to report at the
end of the execution a [summary about its MPI behaviour](EAR-environment-variables#ear_get_mpi_stats).
The information is provided in two files and is the aggregated data of each process of the application.

Finally, EARL can provide runtime data in the [Paraver](https://tools.bsc.es/paraver) trace format.
Paraver is a flexible performance analysis tool maintained by the [*Barcelona Supercomputing Center*](https://bsc.es/)'s tools team.
This tool provides an easy way to visualize runtime data, compute derived metrics
and provide histograms for a better understanding of your application behaviour.
See on the [environment variables page](EAR-environment-variables#ear_trace_plugin)
how to generate Paraver traces.
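
A minimal sketch, assuming the tracer plug-in is selected through `SLURM_EAR_TRACE_PLUGIN` and the output directory through `SLURM_EAR_TRACE_PATH` (check the environment variables page for the exact names and values):

```
#!/bin/bash

# Assumption: ask EARL to emit runtime traces in Paraver format
export SLURM_EAR_TRACE_PLUGIN=tracer_paraver.so
export SLURM_EAR_TRACE_PATH=$PWD/ear_traces

mkdir -p $PWD/ear_traces
srun application
```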

> Contact [ear-support@bsc.es](mailto:ear-support@bsc.es) if you want to get more details
about how to deal with EAR data with Paraver.

# EAR job submission flags

The following EAR options can be specified when running `srun` and/or `sbatch`,
and are supported with `srun`/`sbatch`/`salloc`:

| Option                             | Description                                                            |
| ---------------------------------- | ---------------------------------------------------------------------- |
| **\-\-ear**=\[on\|off\]            | Enables/disables EAR library loading with this job.                    |
| **\-\-ear-user-db**=_\<filename\>_ | Asks the EAR Library to generate a set of CSV files with EARL metrics. |
| **\-\-ear-verbose**=\[0\|1\]       | Specifies the level of verbosity; the default is 0.                    |

When using the `ear-user-db` flag, one file per node is generated with the average
node metrics (node signature) and one file with multiple lines per node is generated
with the runtime collected metrics (loops node signatures).
Read [`eacct`'s section](EAR-commands#ear-job-accounting-eacct) on the commands page to know which metrics are reported,
as the data generated by this flag is the same as that reported to the Database (and retrieved later by the command).

Verbose messages are placed by default in _stderr_.
For jobs with multiple nodes, the `ear-verbose` option can result in lots of messages
mixed at _stderr_.
We recommend splitting up SLURM's output (or error) file per node.
You can read SLURM's [filename pattern specification](https://slurm.schedmd.com/sbatch.html#lbAH) for more information.

If you still need to have job output and EAR output separated, you can set the
[`SLURM_EARL_VERBOSE_PATH`](EAR-environment-variables#earl_verbose_path) environment
variable and one file per node will be generated with only the EAR output.
The environment variable must be set with the path (a directory) where you want
the output files to be generated; it will be created automatically if needed.

> You can always check the available EAR submission flags provided by EAR's SLURM SPANK
plug-in by typing `srun --help`.

## CPU frequency selection

The [EAR configuration file](www.example.org) supports the specification of *EAR authorized users*,
who can ask for more privileged submission options. The most relevant ones are the possibility
to ask for a specific optimisation policy and a specific CPU frequency. Contact
your sysadmin or helpdesk team to become an authorized user.

## GPU frequency selection

To see the list of available frequencies of the GPU you will work on, you can type:

```
nvidia-smi -q -d SUPPORTED_CLOCKS
```

# Examples

## `srun` examples

Having an MPI application asking for one node and 24 tasks, the following is a
simple case of job submission.
If EARL is turned on by default, no extra options are needed to load it.
To check if it is on by default, load the EAR module and
execute the `ear-info` command.
EAR verbose is set to 0 by default, i.e., no EAR messages.

```
srun -J test -N 1 -n 24 --tasks-per-node=24 application
```

The following executes the application showing EAR messages, including EAR configuration
and node signature in _stderr_.

```
srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
```

The following sets a directory so that EAR verbose messages are written to one file
per node instead of being mixed at _stderr_:

```
export SLURM_EARL_VERBOSE_PATH=logs
srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
```

The following asks for EARL metrics to be stored in a CSV file after
the application execution. Two files per node will be generated: one with the average/global signature and another with loop signatures. The format of the output files is _\<filename\>.\<nodename\>_.time.csv
for the global signature and _\<filename\>.\<nodename\>_.time.loops.csv for loop signatures.

```
mkdir ear_metrics
srun --ear-user-db=ear_metrics/app_metrics application
```

## EARL + `mpirun`

### Intel MPI

With the bootstrap environment variables set as shown above, EARL is loaded
automatically when launching the application with Hydra:

```
mpiexec.hydra -n 10 application
```

### OpenMPI

Bootstrap is an Intel® MPI option but not an OpenMPI option. For OpenMPI,
`srun` must be used for automatic EAR support.
In case OpenMPI with `mpirun` is needed, EAR offers the `erun` command, which
is a program that simulates the whole SLURM and EAR SLURM plug-in pipeline.
You can launch `erun` with the `--program` option to specify the application name
and arguments.

```
mpirun -n 4 /path/to/erun --program="hostname --alias"
```

In this example, `mpirun` would run 4 `erun` processes.
Then, `erun` will launch the application `hostname` with its alias parameter.
You can use as many parameters as you want, but the quotes have to cover all
of them in case there is more than just the program name.

`erun` will simulate on the remote node both the local and remote pipelines for
all created processes.
It has an internal system to avoid repeating functions that are executed just one
time per job or node, like SLURM does with its plug-ins.

```
> erun --help

This is the list of ERUN parameters:
Usage: ./erun [OPTIONS]

Options:
    --job-id=<arg>      Set the JOB_ID.
    --nodes=<arg>       Sets the number of nodes.
    --program=<arg>     Sets the program to run.
    --clean             Removes the internal files.

SLURM options:
...
```

The `--job-id` and `--nodes` parameters create the environment variables that SLURM
would have created automatically, because it is possible that your application makes
use of them. The `--clean` option removes the temporal files created to synchronize
all ERUN processes.

Also you have to load the EAR environment module or define its environment variables
in your environment or script:

| Variable                  | Parameter              |
| ------------------------- | ---------------------- |
| EAR_INSTALL_PATH=\<path\> | prefix=\<path\>        |
| EAR_TMP=\<path\>          | localstatedir=\<path\> |
| EAR_ETC=\<path\>          | sysconfdir=\<path\>    |
| EAR_DEFAULT=\<on/off\>    | default=\<on/off\>     |

**IMPORTANT NOTE** If you are going to launch `n` applications with the `erun` command through an sbatch job, you must set the environment variable `SLURM_STEP_ID` to values from `0` to `n-1` before each `mpirun` call.
This way `erun` will inform the EARD of the correct step ID to be stored in the Database.
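
A minimal sketch of that pattern for two applications inside one sbatch job (application names are illustrative):

```
#!/bin/bash
#SBATCH -N 2

# First application: step 0
export SLURM_STEP_ID=0
mpirun -n 8 /path/to/erun --program="app_one"

# Second application: step 1
export SLURM_STEP_ID=1
mpirun -n 8 /path/to/erun --program="app_two"
```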

# EAR job Accounting (`eacct`)

The [`eacct`](EAR-commands#ear-job-accounting-eacct) command shows accounting information stored in the EAR DB for
job (and step) IDs.
The command uses EAR's configuration file to determine if the user running it is
privileged or not, as **non-privileged users can only access their own information**.

## Usage examples

The basic usage of `eacct` retrieves the last 20 applications (by default) of the
user executing it.
If a user is **privileged**, they may see all users' applications.
The default behaviour shows data from each job-step, aggregating the values from
each node in said job-step.
If using SLURM as a job manager, an *sb* (sbatch) job-step is created with the data
from the entire execution.
A specific job may be specified with the `-j` option.

```
[user@host EAR]$ eacct -j 175966
JOB-STEP   USER  APPLICATION  POLICY  NODES  AVG/DEF/IMC(GHz)  TIME(s)  POWER(W)  GBS     CPI   ENERGY(J)  GFLOPS/W  IO(MBs)  MPI%  G-POW (T/U)  G-FREQ  G-UTIL(G/MEM)
175966-sb  user  afid         NP      2      2.97/3.00/---     3660.00  381.51    ---     ---   2792619    ---       ---      ---   ---          ---     ---
175966-2   user  afid         MO      2      2.97/3.00/2.39    1205.26  413.02    146.21  1.04  995590     0.1164    0.0      21.0  ---          ---     ---
175966-1   user  afid         MT      2      2.62/2.60/2.37    1234.41  369.90    142.63  1.02  913221     0.1265    0.0      19.7  ---          ---     ---
175966-0   user  afid         ME      2      2.71/3.00/2.19    1203.33  364.60    146.23  1.07  877479     0.1310    0.0      17.9  ---          ---     ---
```

The command shows a pre-selected set of columns; read `eacct`'s section on the [EAR commands page](EAR-commands).

For node-specific information, the `-l` (i.e., long) option provides detailed accounting of each individual node.
In addition, `eacct` shows an additional column: `VPI(%)` (see the example below).
The VPI is the percentage of AVX512 instructions over the total number of instructions.

```
[user@host EAR]$ eacct -j 175966 -l
JOB-STEP   NODE ID  USER ID  APPLICATION  AVG-F/IMC-F  TIME(s)  POWER(s)  GBS     CPI   ENERGY(J)  IO(MBS)  MPI%  VPI(%)  G-POW(T/U)  G-FREQ  G-UTIL(G/M)
175966-sb  cmp2506  user     afid         2.97/---     3660.00  388.79    ---     ---   1422970    ---      ---   ---     ---         ---     ---
175966-sb  cmp2507  user     afid         2.97/---     3660.00  374.22    ---     ---   1369649    ---      ---   ---     ---         ---     ---
175966-2   cmp2506  user     afid         2.97/2.39    1205.27  423.81    146.06  1.03  510807     0.0      21.2  0.23    ---         ---     ---
175966-2   cmp2507  user     afid         2.97/2.39    1205.26  402.22    146.35  1.05  484783     0.0      20.7  0.01    ---         ---     ---
175966-1   cmp2506  user     afid         2.58/2.38    1234.46  374.14    142.51  1.02  461859     0.0      19.4  0.00    ---         ---     ---
175966-1   cmp2507  user     afid         2.67/2.37    1234.35  365.67    142.75  1.03  451362     0.0      20.0  0.01    ---         ---     ---
175966-0   cmp2506  user     afid         2.71/2.19    1203.32  371.76    146.25  1.08  447351     0.0      17.9  0.01    ---         ---     ---
175966-0   cmp2507  user     afid         2.71/2.19    1203.35  357.44    146.21  1.05  430128     0.0      17.9  0.01    ---         ---     ---
```

If EARL was loaded during an application execution, runtime data (i.e., EAR loops) may be retrieved by using the `-r` flag.
You can still filter the output by Job (and Step) ID.

```
[user@host EAR]$ eacct -j 175966.1 -r
JOB-STEP  NODE ID  ITER.  POWER(W)  GBS    CPI    GFLOPS/W  TIME(s)  AVG_F  IMC_F  IO(MBS)  MPI%  G-POWER(T/U)  G-FREQ  G-UTIL(G/MEM)
175966-1  cmp2506  21     360.6     115.8  0.838  0.086     1.001    2.58   2.30   0.0      11.6  0.0 / 0.0     0.00    0%/0%
175966-1  cmp2507  21     333.7     118.4  0.849  0.081     1.001    2.58   2.32   0.0      12.0  0.0 / 0.0     0.00    0%/0%
175966-1  cmp2506  31     388.6     142.3  1.010  0.121     1.113    2.58   2.38   0.0      19.7  0.0 / 0.0     0.00    0%/0%
175966-1  cmp2507  31     362.8     142.8  1.035  0.130     1.113    2.59   2.37   0.0      19.5  0.0 / 0.0     0.00    0%/0%
175966-1  cmp2506  41     383.3     143.2  1.034  0.124     1.114    2.58   2.38   0.0      19.6  0.0 / 0.0     0.00    0%/0%
```

Finally, to easily transfer `eacct`’s output, the `-c` option saves the requested data in CSV format.
Both aggregated and detailed accountings are available, as well as filtering.
When used along with the `-l` or `-r` options, all metrics stored in the EAR Database are given.
Please, read the [commands section page](EAR-commands) to see which of them are available.

```
[user@host EAR]$ eacct -j 175966 -c test.csv
Successfully written applications to csv. Only applications with EARL will have its information properly written.

[user@host EAR]$ eacct -j 175966.1 -c -l test.csv
Successfully written applications to csv. Only applications with EARL will have its information properly written.
```

# Jobs executed without the EAR library: Basic Job accounting

The EAR Library is automatically loaded with some programming models (MPI,
MKL, OpenMP and CUDA). For applications executed without EARL loaded
(for example, when `srun` is not used, or for programming models or applications not loaded by default by the EAR Library),
EAR provides a default monitoring. In this case a subset of metrics will be reported. In particular:

- Accumulated DC energy (J)
- Accumulated DRAM energy (J)
- Accumulated CPU PCK energy (J)
- EDP
- Maximum DC power detected (W)
- Minimum DC power detected (W)
- Execution time (in seconds)
- CPU average frequency (kHz)
- CPU default frequency (kHz)

DC node energy includes the CPU and the GPU energy, if present.
These metrics are reported per node, job ID and step ID, so they can be seen per job and per job-step when using `eacct`.

# Job energy optimization: EARL policies

The core component of EAR at the user's job level is the EAR Library (EARL).
The Library deals with job monitoring and is the component which implements and applies
optimization policies based on the monitored workload.

We highly recommend you to read the [EARL](EARL) documentation and also how energy policies work,
in order to better understand what the Library does internally. This way you will be able to easily explore all the features (e.g., tuning variables, collecting data) EAR offers to the end user, and you will have more knowledge about how many resources your application consumes and how to correlate that with its computational characteristics.