[[_TOC_]]

# Introduction

EAR offers some environment variables that give users the opportunity to tune or request some of EAR's features.

All examples showing the usage of the environment variables below assume a system using SLURM.

# Loading EAR Library

## EAR\_LOADER\_APPLICATION

Tells the EAR Loader to load the EAR Library for a specific application that does not follow any of the programming models currently supported by EAR (for example, a sequential application).
Your system must have the non-MPI version of the Library installed (ask your system administrator).

If you don't provide it, the EAR Loader will compare it against the executable name.

```
#!/bin/bash

export EAR_LOADER_APPLICATION=my_job_name

srun --ntasks 1 --job-name=my_job_name ./my_exec_file
```

See the [Use cases](User-guide#use-cases) section for more information about how to run jobs with EAR.

## EAR\_LOAD\_MPI\_VERSION

Forces the Loader to load a specific MPI version of the EAR Library.
This is needed, for example, when you want to load the EAR Library for Python + MPI applications, where the Loader is not able to detect the MPI implementation the application is going to use.

The example below runs the Tensorflow `tf_cnn_benchmarks`, which can be downloaded from the Tensorflow benchmarks repository.

```
#!/bin/bash
...

# Specific modules here
# ...

export EAR_LOAD_MPI_VERSION="open mpi"

srun --ear-policy=min_time \
    python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    ...
```

See the [Use cases](User-guide#use-cases) section for more information about how to run jobs with EAR.

# Report plug-ins

## EAR\_REPORT\_ADD

Specifies a report plug-in to be loaded. The value must be a shared object file, and it must be located at `$EAR_INSTALL_PATH/lib/plugins/report` or at the path from where the job was launched.
Alternatively, you can provide the full (absolute or relative) path of the report plug-in. Several plug-ins can be provided, separated by colons.

```
#!/bin/bash

export EAR_REPORT_ADD=my_report_plugin.so:my_report_plugin2.so

srun -n 10 my_mpi_app
```

# Verbosity

## EARL\_VERBOSE\_PATH

Specifies a path where a file (one per node involved in a job) will be created to print messages from the EAR Library.
This is useful when you run a job on multiple nodes, as the EAR verbose information of each of them can result in lots of messages mixed at stderr (the default channel for EAR messages).

Finally, *job_step* and *job_id* are fields showing information about the job.

```
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 96

export EARL_VERBOSE_PATH=ear_logs_dir_name
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-verbose=1"

mpirun -np 96 -ppn 48 my_app
```

After the above job example completes, in the same directory where the application was submitted there will be a directory called *ear_logs_dir_name* with two files (one per node) called *earl_logs.0.0.<job_step>.<job_id>* and *earl_logs.1.0.<job_step>.<job_id>*, respectively.

# Frequency management

## EAR\_GPU\_DEF\_FREQ

Sets a GPU frequency (in kHz) to be fixed while your job is running.
The same frequency is set for all GPUs used by the job.

```
#!/bin/bash
...

input_path=/hpc/appl/biology/GROMACS/examples
input_file=ion_channel.tpr
GROMACS_INPUT=$input_path/$input_file

export EAR_GPU_DEF_FREQ=1440000

srun --cpu-bind=core --ear-policy=min_energy gmx_mpi mdrun \
    -s $GROMACS_INPUT -noconfout -ntomp 1
```

## EAR\_JOB\_EXCLUSIVE\_MODE

Indicates whether the job will run in a node exclusively (non-zero value).
EAR will reduce the CPU frequency of those cores not used by the job.

This feature exploits a very easy vector of power saving.

```
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --exclusive

export EAR_JOB_EXCLUSIVE_MODE=1

srun -n 10 --ear=on ./my_mpi_app
```

## Controlling Uncore/Infinity Fabric frequency

EARL offers the possibility to control the Integrated Memory Controller (IMC) for Intel(R) architectures and the Infinity Fabric (IF) for AMD architectures.
On this page we use the term *uncore* to refer to both of them.
Environment variables related to uncore control cover [policy specific settings](#ear_set_imcfreq) and the chance for a user to [fix it](#ear_max_imcfreq-and-ear_min_imcfreq) during an entire job.

### EAR\_SET\_IMCFREQ

Enables/disables EAR's [eUFS](Home#publications) feature.
Type `ear-info` to see whether eUFS is enabled by default.

```
#!/bin/bash
...

export EAR_SET_IMCFREQ=1
export EAR_POLICY_IMC_TH=0.035

srun [...] my_app
```

### EAR\_MAX\_IMCFREQ and EAR\_MIN\_IMCFREQ

Set the maximum and minimum values (in kHz) at which the *uncore* frequency should be.
Two variables were designed because Intel(R) architectures allow setting a range of *uncore* frequencies.

The example below shows a job execution fixing the uncore frequency at 2.0 GHz:

```
#!/bin/bash
...

export EAR_MAX_IMCFREQ=2000000
export EAR_MIN_IMCFREQ=2000000

...

srun [...] my_app
```

## Load Balancing

By default, EAR policies try to set the best CPU (and uncore, if [enabled](#controlling-uncore-infinity-fabric-frequency)) frequency according to node grain metrics.
This behaviour can be changed by telling EAR to detect and deal with unbalanced workloads, i.e., workloads where there is no equity between processes regarding their MPI/computational activity.

Please, contact [ear-support@bsc.es](mailto:ear-support@bsc.es) if you want more information.

> A correct CPU binding is required to get the most benefit from this feature. Check the documentation of your application's programming model/vendor/flavour or your system's batch scheduler.

### EAR\_LOAD\_BALANCE

Enables/disables EAR's load balance strategy in energy policies.
Type `ear-info` to see whether this feature is enabled by default.

To tune when per-process CPU frequency selection is applied, you can increase the load balance threshold:

```
#!/bin/bash
...

export EAR_LOAD_BALANCE=1
export EAR_LOAD_BALANCE_TH=0.89

...

srun [...] my_app
```

## Support for Intel(R) Speed Select Technology

Since version 4.2, EAR supports the interaction with [Intel(R) Speed Select Technology (Intel(R) SST)](https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html), which lets the user have more fine-grained control over per-CPU Turbo frequency.

If you enable [EARL verbosity](User-guide#ear-job-submission-flags) you will see the mapping of the CLOS set for each CPU in the node.
Note that a `-1` value means that no change was made on the specific CPU.

### EAR\_PRIO\_TASKS

A list specifying the CLOS that must be set for the CPUs assigned to each task.
This variable is useful because you can configure your application transparently.

A (simplified) batch script submitting this example could be:

```
#!/bin/bash

export EAR_PRIO_TASKS=0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3

srun --ntasks=16 --cpu-bind=core,verbose --ear-policy=monitoring --ear-cpufreq=2201000 --ear-verbose=1 bin/bt.C.x
```

```
PRIO3: MAX GHZ - 0.0 GHz (low)
...
[120,-1] [121,-1] [122,-1] [123,-1] [124,-1] [125,-1] [126,-1] [127,-1]
```

### EAR\_PRIO\_CPUS

A list of priorities that should have the same length as the number of CPUs your job is using.

But it becomes more flexible when the user has more control over the affinity set to its application, because you can discriminate between different CPUs assigned to the same task.
Moreover, this is the only way to set different priorities over different threads in non-MPI applications.
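
As an illustrative sketch (the CPU count, priority values and binding below are hypothetical, not taken from the original examples), a job using eight CPUs could assign the highest-priority CLOS 0 to its first four CPUs and a lower CLOS 1 to the rest:

```
#!/bin/bash

# Hypothetical sketch: a job using 8 CPUs, one list entry per CPU.
# CLOS 0 is the highest Turbo priority; CLOS 1 is a lower one.
export EAR_PRIO_CPUS=0,0,0,0,1,1,1,1

srun --ntasks=2 --cpus-per-task=4 --cpu-bind=core --ear-policy=monitoring --ear-verbose=1 ./my_app
```

With `--ear-verbose=1` you can check the resulting per-CPU CLOS mapping.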

## EAR\_MIN\_CPUFREQ

This variable can only be set by **authorized users**, and it modifies the minimum CPU frequency the EAR Library can set.
The [EAR configuration](Configuration) file has a field called `cpu_max_pstate` which sets this limit on the tag where it is configured.
Authorized users can modify this limit at submission time by using this environment variable to test, for example, the best value for the `ear.conf` field.
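
For instance, an authorized user testing a candidate limit could submit something like the sketch below (the value is illustrative, and assumed to be in kHz like the other frequency-related variables on this page):

```
#!/bin/bash

# Illustrative: let the EAR Library lower the CPU frequency down to 1.5 GHz.
export EAR_MIN_CPUFREQ=1500000

srun -n 10 --ear=on ./my_mpi_app
```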

## Disabling EAR's affinity masks usage

For both [Load Balancing](#load-balancing) and [Intel(R) SST](#support-for-intel-r-speed-select-technology) support, EAR uses each process's affinity mask, read at the beginning of the job.
If you are working on an application that changes (or may change) the affinity mask of its tasks dynamically, this can lead to misconfigurations not detected by EAR.
To avoid any unexpected problem, **we highly recommend** exporting the `EARL_NO_AFFINITY_MASK` environment variable **even if you are not planning to work with some of the mentioned features**.

Note: since EAR version 5.0, EAR updates the process mask periodically (approx. every second) and always before applying the optimization policy.
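
A minimal sketch of its usage (the value `1` is an assumption; as with other EAR toggle variables, what matters is that the variable is defined, and the application name is hypothetical):

```
#!/bin/bash

# Tell EAR not to rely on the affinity masks read at job start,
# e.g. for an application that re-pins its threads at runtime.
export EARL_NO_AFFINITY_MASK=1

srun -n 10 --ear=on ./my_repinning_app
```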

# Workflow support

## EAR\_DISABLE\_NODE\_METRICS

By defining this environment variable, the user or workflow manager indicates to EAR that the current process must not be considered a power consumer, so it does not affect the CPU power models used to estimate the amount of power corresponding to each application sharing a node.
This environment variable targets master-worker (or map-reduce) scenarios where one process is not doing computational work, just acting as a master that creates and waits for processes.
By specifying this variable, the EARL ignores the affinity mask for this process and assumes its activity is not relevant for the whole power consumption.
The value is not relevant; the variable only has to be defined.
In a fork-join program (or similar), it has to be unset before the creation of the workers.
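
One possible shape for such a workflow is sketched below (program names and task counts are hypothetical): the master step is flagged so EAR's power models ignore it, while the worker step is accounted normally.

```
#!/bin/bash

# Master: creates and waits for workers, does no computational work,
# so it must not be modeled as a power consumer (any value works).
EAR_DISABLE_NODE_METRICS=1 srun -n 1 ./master_app &

# Workers: normal EAR accounting applies.
srun -n 47 ./worker_app &

wait
```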

## EAR\_NTASK\_WORK\_SHARING

By defining this environment variable, the user indicates to the library that the set of processes sharing the node are in fact a single (non-MPI) application.
This enables a synchronization at the beginning, and all the processes with the same job id and step id (or the equivalent for schedulers other than SLURM) work together.
Only one of them will be selected as the master and will apply the energy policy. For GPU applications it is mandatory that this process can access all the GPUs.
Otherwise, this feature is not recommended and each process will apply its own optimization.
The value is not relevant; the variable only has to be defined.
**This feature is only supported on systems using SLURM**.
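
A sketch of how this could look on a SLURM system (the application name is hypothetical):

```
#!/bin/bash

# The processes of this step form a single logical (non-MPI) application;
# EAR selects one of them as master to apply the energy policy.
# The value is irrelevant; the variable only has to be defined.
export EAR_NTASK_WORK_SHARING=1

srun --ntasks=8 --ear=on ./my_worksharing_app
```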

# Data gathering/reporting

## EARL\_REPORT\_LOOPS

Since **version 4.3**, EAR can be configured to not report application loop signatures by default.
This configuration satisfies a constraint of many HPC data centers where hundreds of jobs are launched daily, leading to too many reported loops and a quick increase of the EAR database size.

For those users who still want to get application loop data, this variable can be set to one (i.e., `export EARL_REPORT_LOOPS=1`) to force EAR to report their application loop signatures.
Then, users can get their loop data by calling [`eacct -j <job_id> -r`](EAR-commands#ear-job-accounting-eacct).
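
Putting both steps together, a minimal sketch could be:

```
#!/bin/bash

# Force EAR to report this job's loop signatures to the database.
export EARL_REPORT_LOOPS=1

srun -n 48 --ear=on ./my_mpi_app
```

Once the job has finished, `eacct -j <job_id> -r` retrieves the reported loops.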

## EAR\_GET\_MPI\_STATS

Use this variable to generate two files at the end of the job execution containing global, per-process MPI information.
You must specify the prefix (optionally with a path) of the file name. One file (*[path/]prefix.ear_mpi_stats.full_nodename.csv*) will contain a summary of MPI throughput (per process), while the other one (*[path/]prefix.ear_mpi_calls_stats.full_nodename.csv*) will contain more fine-grained information about the different MPI call types.

Here is an example:

```
#!/bin/bash

MPI_INFO_DST=$SLURM_JOBID-mpi_stats
mkdir $MPI_INFO_DST

export EAR_GET_MPI_STATS=$MPI_INFO_DST/$SLURM_JOB_NAME

srun -n 48 --ear=on ./mpi_app
```

The table below shows the fields available in the **ear_mpi_calls_stats** file:

| t_SendRecv | Time (in microseconds) spent in **blocking** SendRecv calls. |
| t_Wait | Time (in microseconds) spent in **blocking** Wait calls. |

## EAR_TRACE_PLUGIN

EAR offers the chance to generate Paraver traces to visualize runtime metrics with the [Paraver tool](https://tools.bsc.es/paraver).
Paraver is a visualization tool developed by the CEPBA-Tools team and currently maintained by the Barcelona Supercomputing Center's tools team.

You must set the value of this variable to `tracer_paraver.so` to load the tracer plug-in.
This shared object comes with the official EAR distribution and is located at `$EAR_INSTALL_PATH/lib/plugins/tracer`.
Then you need to set the `EAR_TRACE_PATH` variable (see below) to specify the destination path of the generated Paraver traces.

## EAR_TRACE_PATH

Specify the path where you want to store the trace files generated by the EAR Library. The path must be fully created beforehand; otherwise, the Paraver tracer plug-in won't be loaded.

Here is an example of the usage of the environment variables explained above:

```
#!/bin/bash
...

export EAR_TRACE_PLUGIN=tracer_paraver.so
export EAR_TRACE_PATH=$(pwd)/traces
mkdir -p $EAR_TRACE_PATH

srun -n 10 --ear=on ./mpi_app
```

## REPORT_EARL_EVENTS

Use this variable (i.e., `export REPORT_EARL_EVENTS=1`) to make EARL send internal events to the [Database](EAR-Database).
These events are useful to get more information about the Library's behaviour, like when DynAIS **(REFERENCE DYNAIS)** is turned off, the computational phase EAR guesses the application is in, or the status of the applied policy **(REF POLICIES)**.
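
For example, a minimal job sketch enabling event reporting could be:

```
#!/bin/bash

# Make EARL send its internal events to the EAR database.
export REPORT_EARL_EVENTS=1

srun -n 10 --ear=on ./my_mpi_app
```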

The table of all reported events includes, among other fields:

| Value | The value stored with the event. Categorical events are explained below. |
| node_id | The node from which the event was reported. |

### Event types

Below are listed all the event types you can get when requesting job events.
For categorical event values, the (value, category) mapping is explained.