... | @@ -5,8 +5,7 @@ you can run your applications enabling/disabling/tuning EAR with the less effort |
|
for changing your workflow, e.g., submission scripts.
|
|
for changing your workflow, e.g., submission scripts.
|
|
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
|
|
This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch
|
|
schedulers, which do all the work of setting up EAR at job submission.
|
|
schedulers, which do all the work of setting up EAR at job submission.
|
|
By now, **SLURM is the batch scheduler full compatible with EAR** thanks to EAR's SLURM
|
|
Currently, **[SLURM](https://slurm.schedmd.com/documentation.html) is the batch scheduler fully compatible with EAR** thanks to EAR's SLURM SPANK plug-in.
|
|
SPANK plug-in.
|
|
|
|
|
|
|
|
With EAR's SLURM plug-in, running an application with EAR is as easy as submitting
|
|
With EAR's SLURM plug-in, running an application with EAR is as easy as submitting
|
|
a job with either `srun`, `sbatch` or `mpirun`. The EAR Library (EARL) is automatically
|
|
a job with either `srun`, `sbatch` or `mpirun`. The EAR Library (EARL) is automatically
|
... | @@ -22,39 +21,48 @@ The EAR development team had worked also with OAR and PBSPro batch schedulers, b |
|
|
|
|
|
# Use cases
|
|
# Use cases
|
|
|
|
|
|
Since EAR was targetting computational applications, some applications are automatically loaded and others are not to avoid running EAR with, por exampl, sh processes. Types of applications automatically loaded with EAR library are:
|
|
Since EAR targets computational applications, EARL is loaded automatically with some applications and not with others, which avoids running EAR with, for example, bash processes.
|
|
|
|
The following list summarizes the application use cases where EARL can be loaded transparently:
|
|
|
|
|
|
- MPI applications (intel, OpenMPI Fujitsu and CRAY versions)
|
|
- MPI applications: Intel MPI, OpenMPI, Fujitsu and Cray versions.
|
|
- Not MPI: OpenMP, CUDA, MKL, OneAPI
|
|
- Non-MPI applications: OpenMP, CUDA, MKL and OneAPI.
|
|
- Python
|
|
- Python applications.
|
|
|
|
|
|
For other use cases it can explicitly requested, see (Other application types or frameworks)
|
|
Other use cases not listed here might still be supported.
|
|
|
|
See the [dedicated section](#other-application-types-or-frameworks).
|
|
|
|
|
|
## MPI applications
|
|
## MPI applications
|
|
|
|
|
|
EARL is automatically loaded with MPI applications when EAR is enabled by
|
|
EARL is automatically loaded with MPI applications when it is enabled by default (check `ear-info`).
|
|
default (check `ear-info`). EAR supports the utilization of both
|
|
EAR supports the utilization of both `mpirun`/`mpiexec` and `srun` commands.
|
|
`mpirun`/`mpiexec` and `srun` commands.
|
|
|
|
|
|
|
|
When using `sbacth`/`srun` or `salloc`, [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.mufipm)
|
|
When using `sbatch`/`srun` or `salloc`, [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.mufipm)
|
|
and [OpenMPI](https://www.open-mpi.org/) are fully supported.
|
|
and [OpenMPI](https://www.open-mpi.org/) are fully supported.
|
|
When using specific MPI flavour commands to start applications (e.g., `mpirun`,
|
|
When using specific MPI flavour commands to start applications (e.g., `mpirun`,
|
|
`mpiexec.hydra`), there are some key points which you must take into account.
|
|
`mpiexec.hydra`), there are some key points which you must take into account.
|
|
See [next sections](#using-mpirunmpiexec-command) for examples and more details.
|
|
See [next sections](#using-mpirunmpiexec-command) for examples and more details.
|
|
|
|
|
|
|
|
Review SLURM's [MPI Users Guide](https://slurm.schedmd.com/mpi_guide.html), read your cluster documentation or ask your system administrator to see how SLURM is integrated with the MPI Library in your system.
|
|
|
|
|
|
### Hybrid MPI + (OpenMP, CUDA, MKL) applications
|
|
### Hybrid MPI + (OpenMP, CUDA, MKL) applications
|
|
|
|
|
|
EARL automatically supports this use case.
|
|
EARL automatically supports this use case.
|
|
`mpirun`/`mpiexec` and `srun` are supported in the same manner as explained above.
|
|
`mpirun`/`mpiexec` and `srun` are supported in the same manner as explained above.
|
|
|
|
|
|
### Python MPI applications
|
|
### Python and Julia MPI applications
|
|
|
|
|
|
|
|
EARL cannot automatically detect MPI symbols when one of these languages is used.
|
|
|
|
In that case, an environment variable is provided to give EARL a hint of the MPI flavour being used.
|
|
|
|
|
|
EARL cannot detect automatically MPI symbols when Python is used.
|
|
Export the [`EAR_LOAD_MPI_VERSION`](EAR-environment-variables#ear_load_mpi_version) environment variable with the value from the following table, depending on the MPI implementation you are loading:
|
|
On that case, an environment variable used to specify which MPI flavour is provided.
|
|
|
|
|
|
|
|
Export [`SLURM_EAR_LOAD_MPI_VERSION`](EAR-environment-variables#ear_load_mpi_version) environment variable with either _intel_ or _open mpi_
|
|
| MPI flavour | Value |
|
|
values, e.g., `export SLURM_EAR_LOAD_MPI_VERSION="open mpi"`, whose are the two MPI
|
|
| ----------- | ----- |
|
|
implementations 100% supported by EAR.
|
|
| [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) | _intel_ |
|
|
|
|
| [Open MPI](https://www.open-mpi.org/) | _open mpi_ or _ompi_ |
|
|
|
|
| [MVAPICH](https://mvapich.cse.ohio-state.edu/) | _mvapich_ |
|
|
|
|
| Fujitsu MPI | _fujitsu mpi_ |
|
|
|
|
| Cray MPICH | _cray mpich_ |
|
|
|
|
|
|
### Running MPI applications on SLURM systems
|
|
### Running MPI applications on SLURM systems
|
|
|
|
|
... | @@ -64,18 +72,17 @@ Running MPI applications with EARL on SLURM systems using `srun` command is the |
|
straightforward way to start using EAR.
|
|
straightforward way to start using EAR.
|
|
All jobs are monitored by EAR and the Library is loaded by default depending on
|
|
All jobs are monitored by EAR and the Library is loaded by default depending on
|
|
the cluster configuration.
|
|
the cluster configuration.
|
|
To run a job with `srun` and EARL **there is no need to load the EAR module**.
|
|
|
|
|
|
|
|
Even though it is automatic, there are few [flags](#ear-job-submission-flags) than can be selected
|
|
Even though it is automatic, there are a few [flags](#ear-job-submission-flags) that can be selected at job submission.
|
|
at job submission.
|
|
They are provided by EAR's SLURM SPANK plug-in. When using SLURM commands for job submission, both Intel and OpenMPI implementations are supported.
|
|
They are provided by EAR's SLURM SPANK plug-in.
|
|
|
|
When using SLURM commands for job submission, both Intel and OpenMPI implementations are
|
|
**There is no need to load the EAR module** to run a job with `srun` and get EARL loaded.
|
|
supported.
|
|
Review SLURM's [MPI Users Guide](https://slurm.schedmd.com/mpi_guide.html), read your cluster documentation or ask your system administrator to see how SLURM is integrated with the MPI Library in your system.
|
|
|
|
|
|
#### Using `mpirun`/`mpiexec` command
|
|
#### Using `mpirun`/`mpiexec` command
|
|
|
|
|
|
To provide an automatic loading of the EAR library, the only requirement from
|
|
For EARL to be loaded automatically, the only requirement is that the MPI library is coordinated with the scheduler.
|
|
the MPI library is to be coordinated with the scheduler.
|
|
Review SLURM's [MPI Users Guide](https://slurm.schedmd.com/mpi_guide.html), read your cluster documentation or ask your system administrator to see how SLURM is integrated with the MPI Library in your system.
|
|
|
|
|
|
##### Intel MPI
|
|
##### Intel MPI
|
|
|
|
|
... | @@ -101,44 +108,46 @@ Read the corresponding [examples section](User-guide.md#openmpi-1) for more info |
|
|
|
|
|
##### MPI4PY
|
|
##### MPI4PY
|
|
|
|
|
|
To use MPI with Python applications, the EAR Loader cannot automatically detect symbols to classify
|
|
With Python MPI applications, the EAR Loader cannot automatically detect the symbols needed to classify the application as Intel MPI or OpenMPI.
|
|
the application as Intel or OpenMPI. In order to specify it, the user has
|
|
To specify it, define the `EAR_LOAD_MPI_VERSION` environment variable with one of the values listed in the [table](#python-and-julia-mpi-applications) above.
|
|
to define the `SLURM_LOAD_MPI_VERSION` environment variable with the values _intel_ or
|
|
|
|
_open mpi_. It is recommended to add in Python modules to make it easy for
|
|
It is recommended to define this variable in the Python modules to make it easy for end users.
|
|
final users.
|
|
Ask your system administrator or check your cluster documentation.
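
As a minimal sketch of the step described above, a submission script for an mpi4py application could export the variable before launching. The script name `my_mpi4py_app.py` is hypothetical, and `open mpi` is only one of the documented values; use the one that matches the MPI library on your system.

```shell
#!/bin/bash
# Hint the MPI flavour to the EAR Loader, since it cannot be detected
# automatically for Python MPI applications.
# "open mpi" is an assumption here: pick the value matching your MPI library.
export EAR_LOAD_MPI_VERSION="open mpi"

# Hypothetical script name, used only for illustration.
PYTHON_APP=my_mpi4py_app.py

if command -v srun >/dev/null 2>&1; then
    # Launch through srun so the job passes through EAR's SLURM plug-in.
    srun python "$PYTHON_APP"
else
    echo "srun not found: run this script on a SLURM system" >&2
fi
```

If your site provides a Python module that already sets this variable, the `export` line is not needed.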
|
|
|
|
|
|
|
|
##### MPI.jl
|
|
|
|
|
|
|
|
According to the [documentation](https://juliaparallel.org/MPI.jl/stable/), the basic Julia wrapper for MPI is inspired by mpi4py.
|
|
|
|
See the [MPI4PY section](#mpi4py) to run this kind of use case.
|
|
|
|
|
|
## Non-MPI applications
|
|
## Non-MPI applications
|
|
|
|
|
|
### Python
|
|
### Python
|
|
|
|
|
|
Since version 4.1 EAR automatically executes the Library with Python applications,
|
|
Since version 4.1 EAR automatically executes the Library with Python applications, so no action is needed.
|
|
so no action is needed.
|
|
You must run the application with the `srun` command so that it passes through EAR's SLURM SPANK plug-in, which lets you enable, disable or tune EAR.
|
|
You must run the application with `srun` command to pass through the EAR's SLURM
|
|
|
|
SPANK plug-in in order to enable/disable/tuning EAR.
|
|
|
|
See [EAR submission flags](#ear-job-submission-flags) provided by EAR SLURM integration.
|
|
See [EAR submission flags](#ear-job-submission-flags) provided by EAR SLURM integration.
|
|
|
|
|
|
### OpenMP, CUDA and Intel MKL
|
|
### OpenMP, CUDA, Intel MKL and OneAPI
|
|
|
|
|
|
To load EARL automatically with non-MPI applications it is required to have it compiled
|
|
To load EARL automatically with a non-MPI application, the application must be compiled with dynamic symbols and executed with the `srun` command.
|
|
with dynamic symbols and also it must be executed with `srun` command.
|
|
|
|
For example, for CUDA applications the `--cudart=shared` option must be used at compile time.
|
|
For example, for CUDA applications the `--cudart=shared` option must be used at compile time.
|
|
EARL is loaded for OpenMP, MKL and CUDA programming models when symbols are dynamically detected.
|
|
EARL is loaded for OpenMP, MKL and CUDA programming models when symbols are dynamically detected.
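
The CUDA case above can be sketched as follows. The source and binary names are hypothetical, and the `ldd` check is just one way to confirm that the CUDA runtime was linked dynamically; it is not something EAR requires.

```shell
#!/bin/bash
# Hypothetical file names, used only for illustration.
SRC=my_cuda_app.cu
BIN=my_cuda_app
# Link the CUDA runtime as a shared library so its symbols can be
# detected dynamically.
NVCC_FLAGS="--cudart=shared"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    nvcc $NVCC_FLAGS -o "$BIN" "$SRC"
    # Optional check: libcudart should be listed among the shared libraries.
    ldd "$BIN" | grep libcudart
else
    echo "nvcc or $SRC not found: skipping compilation" >&2
fi

if command -v srun >/dev/null 2>&1 && [ -x "$BIN" ]; then
    # Non-MPI applications must be started with srun for EARL to be loaded.
    srun "./$BIN"
fi
```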
|
|
|
|
|
|
## Other application types or frameworks
|
|
## Other application types or frameworks
|
|
|
|
|
|
For other programming models or sequential apps not supported by default, EARL can
|
|
For other programming models or sequential apps not supported by default, EARL can
|
|
be forced to be loaded by setting [`SLURM_EAR_LOADER_APPLICATION`](EAR-environment-variables#ear_loader_application)
|
|
be forced to be loaded by setting [`EAR_LOADER_APPLICATION`](EAR-environment-variables#ear_loader_application)
|
|
enviroment variable, which must be defined with the application name.
|
|
environment variable, which must be defined with the executable name.
|
|
For example:
|
|
For example:
|
|
|
|
|
|
```
|
|
```
|
|
#!/bin/bash
|
|
#!/bin/bash
|
|
|
|
|
|
export SLURM_EAR_LOADER_APPLICATION=my_app
|
|
export EAR_LOADER_APPLICATION=my_app
|
|
srun my_app
|
|
srun my_app
|
|
```
|
|
```
|
|
|
|
|
|
## Using EAR inside Singularity containers
|
|
## Using EARL inside Singularity containers
|
|
|
|
|
|
[Apptainer](https://apptainer.org/) (formerly Singularity) is an open source technology for containerization.
|
|
[Apptainer](https://apptainer.org/) (formerly Singularity) is an open source technology for containerization.
|
|
It is widely used in HPC contexts because the level of virtualization it offers enables the access to local services.
|
|
It is widely used in HPC contexts because the level of virtualization it offers enables the access to local services.
|
... | @@ -200,55 +209,53 @@ get to see or know your applications behaviour. |
|
The Library is equipped with several modules and options to provide different
|
|
The Library is equipped with several modules and options to provide different
|
|
kinds of information.
|
|
kinds of information.
|
|
|
|
|
|
As a very simple hint of your application workload, you can enable EARL verbosity
|
|
As a very simple hint of your application workload, you can enable EARL verbosity (e.g., `--ear-verbose=1`) to get loop data at runtime.
|
|
to get loop data at runtime.
|
|
|
|
The information is shown at _stderr_ by default.
|
|
The information is shown at _stderr_ by default.
|
|
Read how to set up verbosity at [submission time](ear-job-submission-flags) and
|
|
Read how to set up verbosity at [submission time](ear-job-submission-flags) and
|
|
[verbosity environment variables](EAR-environment-variables#verbosity) provided
|
|
[verbosity environment variables](EAR-environment-variables#verbosity) provided
|
|
for more advanced tuning of this EAR feature.
|
|
for more advanced tuning of this EAR feature.
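
A minimal sketch of enabling verbosity at submission time, assuming a hypothetical application binary `./my_app` and an arbitrary log file name. Because the information goes to _stderr_ by default, redirecting _stderr_ keeps it apart from the application's standard output.

```shell
#!/bin/bash
# Hypothetical application and log file names, used only for illustration.
APP=./my_app
EAR_LOG=ear_verbose.err

if command -v srun >/dev/null 2>&1; then
    # --ear-verbose=1 enables EARL verbosity. The redirected file receives
    # everything written to stderr, including the application's own messages.
    srun --ear-verbose=1 "$APP" 2> "$EAR_LOG"
else
    echo "srun not found: run this script on a SLURM system" >&2
fi
```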
|
|
|
|
|
|
To get offline job data EAR provides [`eacct`](EAR-commands#ear-job-accounting-eacct),
|
|
## Post-mortem application data
|
|
a tool to provide the monitored job data stored in the Database.
|
|
|
|
You can request information in different ways, so you can read aggregated job data,
|
|
To get offline job data, EAR provides the [**eacct**](EAR-commands#ear-job-accounting-eacct) command, a tool that reports the monitored job data stored in the Database.
|
|
per-node or per-loop information among other things.
|
|
You can request information in different ways, so you can read aggregated job data, per-node or per-loop information among other things.
|
|
See [eacct usage examples](User-guide#ear-job-accounting-eacct) for a better overview of which kind of data `eacct` provides.
|
|
See [eacct usage examples](User-guide#ear-job-accounting-eacct) for a better overview of what `eacct` provides.
|
|
|
|
|
|
There is another way to get runtime and aggregated data during runtime without the
|
|
## Runtime report plug-ins
|
|
need of calling `eacct` after the job completion.
|
|
|
|
EAR implements a reporting system mechanism which let developers to add new report
|
|
|
|
plug-ins, so there is an infinit set of ways to report EAR collected data.
|
|
|
|
|
|
|
|
Therefore EAR releases come with a fully supported report plug-in (called *csv_ts*)
|
|
There is another way to get runtime and aggregated data while the job is running, without needing to call `eacct` after job completion.
|
|
which basically provides the same runtime and aggregated data reported to the Database in CSV files,
|
|
EAR implements a reporting mechanism which lets developers add new report plug-ins, so there is an unlimited set of ways to report EAR collected data.
|
|
directly while the job is running.
|
|
EAR releases come with a fully supported report plug-in (i.e., *csv_ts.so*) which provides the same runtime and aggregated data reported to the Database in CSV files, directly while the job is running.
|
|
You can load this plug-in in two ways:
|
|
You can load this plug-in in two ways:
|
|
|
|
|
|
1. By setting [`--ear-user-db`](#ear-job-submission-flags) flag at submission time.
|
|
1. By setting [`--ear-user-db`](#ear-job-submission-flags) flag at submission time.
|
|
2. [Loading directly the report plug-in](EAR-environment-variables#ear_report_add) through an environment variable:
|
|
2. [Loading directly the report plug-in](EAR-environment-variables#ear_report_add) through an environment variable: `export EAR_REPORT_ADD=csv_ts.so`.
|
|
`export SLURM_EAR_REPORT_ADD=csv_ts.so`.
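
The second option in the list above can be sketched as a short submission script. It assumes the `EAR_REPORT_ADD` variable name and a hypothetical application binary `./my_app`; where the CSV files are written depends on your EAR configuration.

```shell
#!/bin/bash
# Ask EARL to load the csv_ts report plug-in in addition to the
# report plug-ins configured by default.
export EAR_REPORT_ADD=csv_ts.so

# Hypothetical application binary, used only for illustration.
APP=./my_app

if command -v srun >/dev/null 2>&1; then
    srun "$APP"
else
    echo "srun not found: run this script on a SLURM system" >&2
fi
```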
|
|
|
|
|
|
|
|
> Contact [ear-support@bsc.es](mailto:ear-support@bsc.es) for more information
|
|
> Contact [ear-support@bsc.es](mailto:ear-support@bsc.es) for more information
|
|
about report plug-ins.
|
|
about report plug-ins.
|
|
|
|
|
|
|
|
## Other EARL events
|
|
|
|
|
|
You can also request EAR to report **events** to the [Database](EAR-Databse).
|
|
You can also request EAR to report **events** to the [Database](EAR-Databse).
|
|
They show more details about EARL internal state and can be provided with `eacct`
|
|
They show more details about EARL's internal state and can be retrieved with the `eacct` command.
|
|
command.
|
|
See how to enable [EAR events reporting](EAR-environment-variables#report_earl_events) and which kind of events EAR is reporting.
|
|
See how to enable [EAR events reporting](EAR-environment-variables#report_earl_events)
|
|
|
|
and which kind of events EAR is reporting.
|
|
|
|
|
|
|
|
If your application applies, you can request EAR to report at the
|
|
## MPI stats
|
|
end of the execution a [summary about its MPI behaviour](EAR-environment-variables#ear_get_mpi_stats).
|
|
|
|
|
|
If your application uses MPI, you can request EAR to report, at the end of the execution, a [summary about its MPI behaviour](EAR-environment-variables#ear_get_mpi_stats).
|
|
The information is provided in two files and contains the aggregated data of each process of the application.
|
|
The information is provided in two files and contains the aggregated data of each process of the application.
|
|
|
|
|
|
|
|
## Paraver traces
|
|
|
|
|
|
Finally, EARL can provide runtime data in the [Paraver](https://tools.bsc.es/paraver) trace format.
|
|
Finally, EARL can provide runtime data in the [Paraver](https://tools.bsc.es/paraver) trace format.
|
|
Paraver is a flexible performance analysis tool maintained by the [*Barcelona Supercomputing Center*](https://bsc.es/)'s tools team.
|
|
Paraver is a flexible performance analysis tool maintained by the [*Barcelona Supercomputing Center*](https://www.bsc.es/)'s tools team.
|
|
This tool provides an easy way to visualize runtime data, computing derived metrics
|
|
This tool provides an easy way to visualize runtime data, compute derived metrics and produce histograms for a better understanding of your application behaviour.
|
|
and to provide histograms for better of your application behaviour.
|
|
See the [environment variables page](EAR-environment-variables#ear_trace_plugin) for how to generate Paraver traces.
|
|
See on the [environment variables page](EAR-environment-variables#ear_trace_plugin)
|
|
|
|
how to generate Paraver traces.
|
|
|
|
|
|
|
|
> Contact with [ear-support@bsc.es](mailto:ear-support@bsc.es) if you want to get more details
|
|
Another way to see runtime information with Paraver is to use the open source tool [**ear-job-visualization**](https://github.com/eas4dc/ear-job-visualization), a CLI program written in Python which takes the CSV files generated by the `--ear-user-db` flag and converts their data to the Paraver trace format.
|
|
about how to deal with EAR data with Paraver.
|
|
EAR metrics are reported as trace events.
|
|
|
|
Node information is stored as Paraver task information.
|
|
|
|
Node GPU data is stored as Paraver thread information.
|
|
|
|
|
|
# EAR job submission flags
|
|
# EAR job submission flags
|
|
|
|
|
... | @@ -274,7 +281,7 @@ We recommend to split up SLURM's output (or error) file per-node. |
|
You can read SLURM's [filename pattern specification](https://slurm.schedmd.com/sbatch.html#lbAH) for more information.
|
|
You can read SLURM's [filename pattern specification](https://slurm.schedmd.com/sbatch.html#lbAH) for more information.
|
|
|
|
|
|
If you still need to have job output and EAR output separated, you can set
|
|
If you still need to have job output and EAR output separated, you can set
|
|
[`SLURM_EARL_VERBOSE_PATH`](EAR-environment-variables#earl_verbose_path) environment
|
|
[`EARL_VERBOSE_PATH`](EAR-environment-variables#earl_verbose_path) environment
|
|
variable and one file per node will be generated only with EAR output.
|
|
variable and one file per node will be generated only with EAR output.
|
|
The environment variable must be set to the path (a directory) where you want
|
|
The environment variable must be set to the path (a directory) where you want
|
|
the output files to be generated; it will be automatically created if needed.
|
|
the output files to be generated; it will be automatically created if needed.
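
As a minimal sketch, assuming the `EARL_VERBOSE_PATH` variable name, a hypothetical application binary `./my_app`, and that verbosity is enabled in the same job, the EAR output can be separated like this. The directory name `ear_logs` is arbitrary.

```shell
#!/bin/bash
# Directory where the per-node EAR output files should be generated.
# It does not need to exist beforehand.
export EARL_VERBOSE_PATH="$PWD/ear_logs"

# Hypothetical application binary, used only for illustration.
APP=./my_app

if command -v srun >/dev/null 2>&1; then
    # Verbosity is enabled here on the assumption that there is
    # otherwise no EAR output to write to the files.
    srun --ear-verbose=1 "$APP"
else
    echo "srun not found: run this script on a SLURM system" >&2
fi
```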
|
... | | ... | |