EAR environment variables · Changes

Wiki EAR4.3 authored Jul 04, 2023 by Oriol Vidal
EAR-environment-variables.md
View page @ 7ff3e622
[[_TOC_]]
-# Introduction
+## Introduction
EAR offers some environment variables that give users the opportunity to
tune or request some of EAR's features.
@@ -18,9 +18,9 @@ This design may only have a real effect on SLURM systems, but it makes it easier
All examples showing the usage of the environment variables below assume a system using SLURM.
-# Loading EAR Library
+## Loading EAR Library
-## EAR_LOADER_APPLICATION
+### EAR_LOADER_APPLICATION
Tells the EAR Loader to load the EAR Library for a specific application that does not follow any of the programming models currently supported by EAR (for example, a sequential application).
The non-MPI version of the Library must be installed on your system (ask your system administrator).
@@ -38,7 +38,7 @@ srun --ntasks 1 --job-name=my_job_name ./my_exec_file
See the [Use cases](User-guide#use-cases) section for more information about how to run jobs with EAR.
-## EAR_LOAD_MPI_VERSION
+### EAR_LOAD_MPI_VERSION
Forces the Loader to load a specific MPI version of the EAR Library.
This is needed, for example, when you want to load the EAR Library for Python + MPI applications, where the Loader is not able to detect the MPI implementation the application is going to use.
@@ -66,9 +66,9 @@ srun --ear-policy=min_time \
See the [Use cases](User-guide#use-cases) section for more information about how to run jobs with EAR.
-# Report plug-ins
+## Report plug-ins
-## EAR_REPORT_ADD
+### EAR_REPORT_ADD
Specify a report plug-in to be loaded. The value must be a shared object file located at `$EAR_INSTALL_PATH/lib/plugins/report` or in the directory from which the job was launched.
Alternatively, you can provide the full path (absolute or relative) of the report plug-in.
@@ -81,9 +81,9 @@ export SLURM_EAR_REPORT_ADD=my_report_plugin.so
srun -n 10 my_mpi_app
```
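The lookup rule described above can be checked before submitting a job. The following is a minimal sketch, not part of EAR: `find_report_plugin` is a hypothetical helper, it assumes the plug-in is given by file name only (the full-path form is not handled), and it assumes `EAR_INSTALL_PATH` is set in your environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of EAR): print the first location where the
# report plug-in file exists, checking the two locations named above.
find_report_plugin() {
    plugin="$1"
    for dir in "${EAR_INSTALL_PATH:-}/lib/plugins/report" "$PWD"; do
        if [ -f "$dir/$plugin" ]; then
            printf '%s\n' "$dir/$plugin"
            return 0
        fi
    done
    return 1
}

# Export the variable only when the plug-in can be found.
if plugin_path=$(find_report_plugin my_report_plugin.so); then
    export SLURM_EAR_REPORT_ADD=my_report_plugin.so
    echo "plug-in found at $plugin_path"
else
    echo "my_report_plugin.so not found; SLURM_EAR_REPORT_ADD left unset" >&2
fi
```

Running the check first avoids submitting a job whose report plug-in cannot be located.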
-# Verbosity
+## Verbosity
-## EARL_VERBOSE_PATH
+### EARL_VERBOSE_PATH
Specify a path where a file (one per node involved in the job) is created to hold the messages printed by the EAR Library.
This is useful when you run a job on multiple nodes, because the EAR verbose output of all of them would otherwise be mixed together on stderr (the default channel for EAR messages).
@@ -111,9 +111,9 @@ mpirun -np 96 -ppn 48 my_app
After the job in the example above completes, the directory from which the application was submitted will contain a directory called *ear_logs_dir_name* with two files, one for each node, called *earl_logs.0.0.<job_step>.<job_id>* and *earl_logs.1.0.<job_step>.<job_id>*, respectively.
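Once the job has finished, the per-node files can be inspected with ordinary shell tools. The snippet below is a small sketch: `list_earl_logs` is a hypothetical helper (not an EAR command), and the default directory name is simply the one used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: list the per-node EARL log files in a directory,
# together with the number of lines each one contains.
list_earl_logs() {
    log_dir="${1:-ear_logs_dir_name}"
    for f in "$log_dir"/earl_logs.*; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        printf '%s (%s lines)\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
    done
}

# Prints nothing if the directory does not exist yet.
list_earl_logs ear_logs_dir_name
```

A node whose file is missing or empty is a quick hint that the Library did not print anything there.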
-# Frequency management
+## Frequency management
-## EAR_GPU_DEF_FREQ
+### EAR_GPU_DEF_FREQ
Set a GPU frequency (in kHz) to be fixed while your job is running.
The same frequency is set for all GPUs used by the job.
@@ -137,7 +137,7 @@ srun --cpu-bind=core --ear-policy=min_energy gmx_mpi mdrun \
    -s $GROMACS_INPUT -noconfout -ntomp 1
```
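Because the value is expected in kHz, a frequency quoted in MHz has to be converted first. This is a minimal sketch under two assumptions: `mhz_to_khz` is a hypothetical helper, and the variable is exported with the `SLURM_` prefix like the other examples on this page. The 1410 MHz value is only illustrative; use a frequency your GPUs actually support.

```shell
#!/bin/sh
# Hypothetical helper: convert a frequency in MHz to kHz.
mhz_to_khz() {
    echo $(( $1 * 1000 ))
}

# Assumption: the SLURM_ prefix applies to this variable as well.
export SLURM_EAR_GPU_DEF_FREQ="$(mhz_to_khz 1410)"
echo "SLURM_EAR_GPU_DEF_FREQ=$SLURM_EAR_GPU_DEF_FREQ"
```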
-## EAR_JOB_EXCLUSIVE_MODE
+### EAR_JOB_EXCLUSIVE_MODE
Indicate whether the job will run on a node exclusively (set a non-zero value).
EAR will reduce the CPU frequency of the cores not used by the job.
@@ -155,7 +155,7 @@ export SLURM_EAR_JOB_EXCLUSIVE_MODE=1
srun -n 10 --ear=on ./mpi_mpi_app
```
-## Controlling Uncore/Infinity Fabric frequency
+### Controlling Uncore/Infinity Fabric frequency
EARL can control the Integrated Memory Controller (IMC) frequency on Intel(R)
architectures and the Infinity Fabric (IF) frequency on AMD architectures.
@@ -163,7 +163,7 @@ On this page we will use the term *uncore* to refer to both of them.
Environment variables related to uncore control cover [policy-specific settings](#ear_set_imcfreq) and
the option for a user to [fix the uncore frequency](#ear_max_imcfreq-and-ear_min_imcfreq) for an entire job.
-### EAR_SET_IMCFREQ
+#### EAR_SET_IMCFREQ
Enables/disables EAR's [eUFS](Home#publications) feature.
Type `ear-info` to see whether eUFS is enabled by default.
@@ -185,7 +185,7 @@ export SLURM_EAR_POLICY_IMC_TH=0.035
srun [...] my_app
```
-### EAR_MAX_IMCFREQ and EAR_MIN_IMCFREQ
+#### EAR_MAX_IMCFREQ and EAR_MIN_IMCFREQ
Set the maximum and minimum values (in kHz) between which the *uncore* frequency should stay.
Two variables were designed because Intel(R) architectures allow setting a range of
@@ -205,7 +205,7 @@ export SLURM_EAR_MIN_IMCFREQ=2000000
srun [...] my_app
```
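One way to keep the two bounds consistent is to set them from a single value. This sketch assumes that giving both variables the same value is how the uncore frequency is fixed for the whole job. `fix_uncore_khz` is a hypothetical helper, and `SLURM_EAR_MAX_IMCFREQ` is assumed to mirror the `SLURM_EAR_MIN_IMCFREQ` name used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: export both uncore bounds with the same value (kHz).
fix_uncore_khz() {
    khz="$1"
    # Reject anything that is not a plain non-negative integer.
    case "$khz" in
        ''|*[!0-9]*) echo "frequency must be a whole number of kHz" >&2; return 1 ;;
    esac
    export SLURM_EAR_MAX_IMCFREQ="$khz"
    export SLURM_EAR_MIN_IMCFREQ="$khz"
}

fix_uncore_khz 2000000
echo "uncore range: $SLURM_EAR_MIN_IMCFREQ-$SLURM_EAR_MAX_IMCFREQ kHz"
```

The integer check catches the common mistake of passing a value in GHz or with a unit suffix.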
-## Load Balancing
+### Load Balancing
By default, EAR policies try to set the best CPU (and uncore, if [enabled](#controlling-uncore-infinity-fabric-frequency)) frequency according to node-granularity metrics.
This behaviour can be changed by telling EAR to detect and deal with unbalanced workloads, i.e., workloads whose processes differ in their MPI/computational activity.
@@ -216,7 +216,7 @@ Please contact [ear-support@bsc.es](mailto:ear-support@bsc.es) if you want
> A correct CPU binding is required to get the most benefit from this feature. Check the documentation of your application's programming model/vendor/flavour or your system's batch scheduler.
-### EAR_LOAD_BALANCE
+#### EAR_LOAD_BALANCE
Enables/disables EAR's Load Balance strategy in energy policies.
Type `ear-info` to see whether this feature is enabled by default.
@@ -238,7 +238,7 @@ export SLURM_EAR_LOAD_BALANCE_TH=0.89
srun [...] my_app
```
-## Support for Intel(R) Speed Select Technology
+### Support for Intel(R) Speed Select Technology
Since version 4.2, EAR supports interaction with [Intel(R) Speed Select Technology (Intel(R) SST)](https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html),
which gives the user finer-grained control over per-CPU Turbo frequency.
@@ -263,7 +263,7 @@ If you enable [EARL verbosity](User-guide#ear-job-submission-flags) you will see
the mapping of the CLOS set for each CPU in the node.
Note that a `-1` value means that no change was made to that CPU.
-### EAR_PRIO_TASKS
+#### EAR_PRIO_TASKS
A list that specifies the CLOS to which the CPUs assigned to tasks must be set.
This variable is useful because you can configure your application transparently
@@ -341,7 +341,7 @@ PRIO3: MAX GHZ - 0.0 GHz (low)
[120,-1] [121,-1] [122,-1] [123,-1] [124,-1] [125,-1] [126,-1] [127,-1]
```
-### EAR_PRIO_CPUS
+#### EAR_PRIO_CPUS
A list of priorities that should have the same length as the number of CPUs your
job is using.
@@ -355,16 +355,16 @@ But it becomes more flexible when the user has more control over the affinity se
to its application, because you can discriminate between different CPUs assigned to the same task.
Moreover, this is the only way to set different priorities for different threads in non-MPI applications.
-## Disabling EAR's affinity masks usage
+### Disabling EAR's affinity masks usage
For both [Load Balancing](#load-balancing) and [Intel(R) SST](#support-for-intel-r-speed-select-technology)
support, EAR uses the processes' affinity masks read at the beginning of the job.
If you are working on an application that changes (or may change) the affinity mask of its tasks, this can lead to a misconfiguration that EAR does not detect.
-To avoid any unexpected problem, **we highly recommend you** to export `EAR_NO_AFFINITY_MASK` environment variable (**even your are not planning to work with some of the mentioned features**).
+To avoid any unexpected problem, **we highly recommend** that you export the `EAR_NO_AFFINITY_MASK` environment variable **even if you are not planning to work with any of the mentioned features**.
-# Data gathering
+## Data gathering
-## EAR_GET_MPI_STATS
+### EAR_GET_MPI_STATS
Use this variable to generate two files at the end of the job execution that will contain global, per-process MPI information.
You must specify the prefix (optionally with a path) of the filename. One file (*[path/]prefix.ear_mpi_stats.full_nodename.csv*) will contain a summary of MPI throughput (per process), while the other one (*[path/]prefix.ear_mpi_calls_stats.full_nodename.csv*) will contain finer-grained information about the different MPI call types.
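To see which file names a given prefix leads to, the two patterns above can be expanded by hand. The snippet below is only an illustration: `mpi_stats_files` is a hypothetical helper, and using the output of `hostname` as *full_nodename* is an assumption that may not match what EAR uses on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the two file names expected for a prefix,
# following the patterns quoted above.
mpi_stats_files() {
    prefix="$1"
    # Assumption: full_nodename corresponds to the output of `hostname`.
    node="${2:-$(hostname)}"
    printf '%s.ear_mpi_stats.%s.csv\n' "$prefix" "$node"
    printf '%s.ear_mpi_calls_stats.%s.csv\n' "$prefix" "$node"
}

mpi_stats_files "$PWD/my_run"
```

Passing the node name as a second argument is handy when collecting the files from several nodes after the job.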
@@ -441,7 +441,7 @@ The table below shows the fields available in the **ear_mpi_calls_stats** file:
| t_SendRecv | Time (in microseconds) spent in **blocking** SendRecv calls.
| t_Wait | Time (in microseconds) spent in **blocking** Wait calls.
-## EAR_TRACE_PLUGIN
+### EAR_TRACE_PLUGIN
EAR offers the chance to generate Paraver traces to visualize runtime metrics with the [Paraver tool](https://tools.bsc.es/paraver).
Paraver is a visualization tool developed by the CEPBA-Tools team and currently maintained by the Barcelona Supercomputing Center’s tools team.
@@ -451,7 +451,7 @@ You must set the value of this variable to `tracer_paraver.so` to load the trace
This shared object comes with the official EAR distribution and it is located at `$EAR_INSTALL_PATH/lib/plugins/tracer`.
Then you need to set the `EAR_TRACE_PATH` variable (see below) to specify the destination path of the generated Paraver traces.
-## EAR_TRACE_PATH
+### EAR_TRACE_PATH
Specify the path where you want to store the trace files generated by the EAR Library. The path must already exist; otherwise, the Paraver tracer plug-in won’t be loaded.
@@ -469,7 +469,7 @@ srun -n 10 --ear=on ./mpi_app
``` ```
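Since the tracer plug-in is not loaded when the destination path is missing, it is safest to create the directory in the job script itself. This is a minimal sketch: the directory name is illustrative, and both variables are assumed to take the `SLURM_` prefix like the other examples on this page.

```shell
#!/bin/sh
# Create the trace directory (and any missing parents) before pointing
# EAR at it; the directory name is an illustrative choice.
trace_dir="$PWD/ear_traces"
mkdir -p "$trace_dir"

# Assumption: both variables use the SLURM_ prefix.
export SLURM_EAR_TRACE_PLUGIN=tracer_paraver.so
export SLURM_EAR_TRACE_PATH="$trace_dir"
echo "traces will be written to $SLURM_EAR_TRACE_PATH"
```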
-## REPORT_EARL_EVENTS
+### REPORT_EARL_EVENTS
Use this variable (i.e., `export SLURM_REPORT_EARL_EVENTS=1`) to make EARL send internal events to the [Database](EAR-Database).
These events are useful to have more information about the Library's behaviour, like
@@ -487,7 +487,7 @@ a table of all reported events:
| Value | The value stored with the event. Categorical events explained below. |
| node_id | The node from where the event was reported. |
-### Event types
+#### Event types
All the event types you can get when requesting job events are listed below.
For categorical event values, the (value, category) mapping is explained.
@@ -513,4 +513,4 @@ For categorical event values, the (value, category) mapping is explained.
- **energy_saving** Energy (in %) EAR estimates the policy is saving.
- **power_saving** Power (in %) EAR estimates the policy is saving.
- **performance_penalty** Execution time increase (in %) EAR estimates the policy is causing.
\ No newline at end of file