... | ... | @@ -23,13 +23,13 @@ EAR SLURM plug-in can be enabled by adding an additional line at the `/etc/slurm |
|
|
|
|
|
Another way to enable it is to create the directory `/etc/slurm/plugstack.conf.d` and copy the `ear_etc_path/slurm/ear.plugstack.conf` file into it. In that case, the content of `/etc/slurm/plugstack.conf` must be `include /etc/slurm/plugstack.conf.d/*`.
|
|
|
|
|
|
# EAR configuration file
|
|
|
## EAR configuration file
|
|
|
|
|
|
The **ear.conf** file is a text file describing the behaviour of the EAR package in the cluster. It must be readable by all compute nodes and by the nodes where commands are executed. Two `ear.conf` templates are generated with default values and are installed as a reference when executing `make etc.install`.
|
|
|
|
|
|
Usually the first word in the configuration file expresses the component related to the option. Lines starting with `#` are comments. A test for the `ear.conf` file can be found in the path `src/test/functionals/ear_conf`. Running it is recommended, since the `ear.conf` parser is very sensitive to errors in the `ear.conf` syntax, such as stray spaces or newlines.
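Before running the functional test, a quick pre-check can catch the most obvious formatting slips. The following is a minimal sketch, not EAR's parser: it assumes every option line starts with `Key=value` with no leading blanks and no blanks around the first `=`, and the function name `find_suspicious_lines` is hypothetical.

```python
import re

# Assumed shape of an option line: "Key=value ..." with no leading blanks and
# no blanks around the first "=". This is an illustration, not EAR's grammar.
OPTION_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*=\S")


def find_suspicious_lines(text):
    """Return (line_number, line) pairs that are neither blank, comments, nor Key=value."""
    suspicious = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Blank lines and lines starting with "#" (comments) are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        if not OPTION_RE.match(line):
            suspicious.append((number, line))
    return suspicious


sample = """# Verbosity
EARGMVerbose=1
EARGMPort = 50000
Island=1 Nodes=node[1-3],node4
"""
print(find_suspicious_lines(sample))
```

With the sample above, only the third line is reported, because of the blanks around `=`. A clean result here does not replace the test in `src/test/functionals/ear_conf`, which exercises the real parser.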
|
|
|
|
|
|
## Database configuration
|
|
|
### Database configuration
|
|
|
|
|
|
```
|
|
|
# The IP of the node where the MariaDB (MySQL) or PostgreSQL server process is running. The current version uses the same names for both DB servers.
|
... | ... | @@ -61,7 +61,7 @@ DBReportSigDetail=1 |
|
|
DBReportLoops=1
|
|
|
```
|
|
|
|
|
|
## EARD configuration
|
|
|
### EARD configuration
|
|
|
|
|
|
```
|
|
|
# The port where the EARD will be listening.
|
... | ... | @@ -91,7 +91,7 @@ NodeUseLog=1 |
|
|
EARDReportPlugins=eardbd.so
|
|
|
```
|
|
|
|
|
|
## EARDBD configuration
|
|
|
### EARDBD configuration
|
|
|
|
|
|
```
|
|
|
# Port where the EARDBD server is listening.
|
... | ... | @@ -118,7 +118,7 @@ DBDaemonUseLog=1 |
|
|
EARDBDReportPlugins=mysql.so
|
|
|
```
|
|
|
|
|
|
## EARL configuration
|
|
|
### EARL configuration
|
|
|
|
|
|
```
|
|
|
# Path where coefficients are installed, usually $EAR_ETC/ear/coeffs.
|
... | ... | @@ -140,38 +140,35 @@ CheckEARModeEvery=1000 |
|
|
EARLReportPlugins=eard.so
|
|
|
```
|
|
|
|
|
|
## EARGM configuration
|
|
|
### EARGM configuration
|
|
|
|
|
|
You can skip this section if EARGM is not used in your installation.
|
|
|
|
|
|
```
|
|
|
# Use aggregated periodic metrics or periodic power metrics.
|
|
|
# Aggregated metrics are only available when EARDBD is running.
|
|
|
EARGMUseAggregated=1
|
|
|
# Period T1 and T2 are specified in seconds. T1 must be less than T2, e.g., 10min and 1 month.
|
|
|
EARGMPeriodT1=90
|
|
|
EARGMPeriodT2=259200
|
|
|
# Verbosity
|
|
|
EARGMVerbose=1
|
|
|
# When set to 1, the output is saved in 'TmpDir'/eargmd.log (common configuration) as a log file.
|
|
|
EARGMUseLog=1
|
|
|
EARGMPort=50000
|
|
|
# Email address to report the warning level (and the action taken in automatic mode).
|
|
|
EARGMMail=nomail
|
|
|
# Period T1 and T2 are specified in seconds (T1 must be less than T2, e.g., 10min and 1 month).
|
|
|
EARGMEnergyPeriodT1=90
|
|
|
EARGMEnergyPeriodT2=259200
|
|
|
# '-' are Joules, 'K' KiloJoules and 'M' MegaJoules.
|
|
|
EARGMUnits=K
|
|
|
|
|
|
EARGMEnergyUnits=K
|
|
|
# Energy limit applies to EARGMPeriodT2.
|
|
|
EARGMEnergyLimit=550000
|
|
|
EARGMPort=50000
|
|
|
|
|
|
# Use aggregated periodic metrics or periodic power metrics.
|
|
|
# Aggregated metrics are only available when EARDBD is running.
|
|
|
EARGMEnergyUseAggregated=1
|
|
|
# Two modes are supported '0=manual' and '1=automatic'.
|
|
|
# manual means no actions are taken, only monitoring.
|
|
|
EARGMMode=0
|
|
|
# Email address to report the warning level (and the action taken in automatic mode).
|
|
|
EARGMMail=nomail
|
|
|
EARGMEnergyMode=0
|
|
|
# Percentage of accumulated energy to start the warning DEFCON level L4, L3 and L2.
|
|
|
EARGMWarningsPerc=85,90,95
|
|
|
EARGMEnergyWarningsPerc=85,90,95
|
|
|
# Number of T1 "grace" periods between DEFCON levels before re-evaluating.
|
|
|
EARGMGracePeriods=3
|
|
|
# Verbosity
|
|
|
EARGMVerbose=1
|
|
|
# When set to 1, the output is saved at 'TmpDir'/eargmd.log (common configuration) as a log file.
|
|
|
EARGMUseLog=1
|
|
|
# Format for action is: "command_name energy_T1 energy_T2 energy_limit T2 T1 units"
|
|
|
EARGMEnergyGracePeriods=3
|
|
|
# Format for action is: "command_name energy_T1 energy_T2 energy_limit T2 T1 units"
|
|
|
# This action is automatically executed at each warning level (only once per grace periods).
|
|
|
EARGMEnergyAction=no_action
|
|
|
|
... | ... | @@ -208,7 +205,7 @@ EARGMId=2 energy=0 power=500 node=node1 port=50101 |
|
|
EARGMId=3 energy=0 power=500 node=node2 port=50100
|
|
|
```
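The interaction between `EARGMEnergyLimit` and `EARGMEnergyWarningsPerc` can be illustrated with a short sketch. It is not EARGM's implementation: the function name `warning_level` is hypothetical, and it assumes a level is entered as soon as the accumulated energy in T2 reaches the corresponding percentage of the limit.

```python
def warning_level(energy_t2, energy_limit, warnings_perc=(85, 90, 95)):
    """Return 'L4', 'L3', 'L2', or None for the energy accumulated in T2.

    Assumption: thresholds are inclusive and given in the same order as in
    EARGMEnergyWarningsPerc (L4 first, then L3, then L2).
    """
    percentage = 100.0 * energy_t2 / energy_limit
    level = None
    for name, threshold in zip(("L4", "L3", "L2"), warnings_perc):
        if percentage >= threshold:
            level = name  # Keep the most severe level reached so far.
    return level


# Values use the same units as EARGMEnergyLimit (e.g. 'K' for KiloJoules).
for consumed in (440000, 478500, 500000, 530000):
    print(consumed, warning_level(consumed, 550000))
```

With `EARGMEnergyLimit=550000`, a consumption of 478500 is 87% of the limit and falls in L4 under these assumptions, while 440000 (80%) triggers no warning.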
|
|
|
|
|
|
## Common configuration
|
|
|
### Common configuration
|
|
|
|
|
|
```
|
|
|
# Default verbose level
|
... | ... | @@ -225,7 +222,7 @@ InstDir=/path/to/inst |
|
|
#NetworkExtension=
|
|
|
```
|
|
|
|
|
|
## EAR Authorized users/groups/accounts
|
|
|
### EAR Authorized users/groups/accounts
|
|
|
|
|
|
Authorized users that are allowed to change policies, thresholds and frequencies are supposed to be administrators. A list of users, Linux groups, and/or SLURM accounts can be provided to allow normal users to perform those actions. Only normal Authorized users can execute the learning phase.
|
|
|
|
... | ... | @@ -235,7 +232,7 @@ AuthorizedAccounts=acc1,acc2,acc3 |
|
|
AuthorizedGroups=xx,yy
|
|
|
```
|
|
|
|
|
|
## Energy tags
|
|
|
### Energy tags
|
|
|
|
|
|
Energy tags are pre-defined configurations for some applications (the EAR Library is not loaded). These energy tags accept the user IDs, groups and SLURM accounts of the users allowed to use that tag.
|
|
|
|
... | ... | @@ -246,7 +243,7 @@ EnergyTag=cpu-intensive pstate=1 |
|
|
EnergyTag=memory-intensive pstate=4 users=user1,user2 groups=group1,group2 accounts=acc1,acc2
|
|
|
```
|
|
|
|
|
|
## Tags
|
|
|
### Tags
|
|
|
|
|
|
Tags are used for architectural descriptions. Max. AVX frequencies are used in predictor models and are SKU-specific. At least one default tag must be included for the cluster to work properly.
|
|
|
|
... | ... | @@ -281,7 +278,7 @@ Tag=6148 default=yes max_avx512=2.2 max_avx2=2.6 max_power=500 powercap=1 max_po |
|
|
Tag=6126 max_avx512=2.3 max_avx2=2.9 coeffs=coeffs.6126.default max_power=600 error_power=700 idle_governor=ondemand
|
|
|
```
|
|
|
|
|
|
## Power policies plug-ins
|
|
|
### Power policies plug-ins
|
|
|
|
|
|
```
|
|
|
# Policy names must exactly match the file names of the policies installed in the system.
|
... | ... | @@ -303,7 +300,7 @@ Policy=min_energy Settings=0.05 DefaultFreq=2.4 Privileged=1 |
|
|
#Policy=monitoring Settings=0 DefaultFreq=2.6 Privileged=0 tag=6126
|
|
|
```
|
|
|
|
|
|
## Island description
|
|
|
### Island description
|
|
|
|
|
|
This section is mandatory since it is used for cluster description. Normally nodes are grouped in islands that share the same hardware characteristics as well as their database managers (EARDBDS). Each entry describes part of an island, and every node must be in an island.
|
|
|
|
... | ... | @@ -351,7 +348,7 @@ Detailed island accepted values: |
|
|
- Island=1 Nodes=`node[1,2],node3`
|
|
|
- Island=1 Nodes=`node[1-3],node4`
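To make the node list notation concrete, the sketch below expands the two forms shown in the list above (comma-separated values and a numeric range inside brackets). It is an illustration only, not the parser EAR uses; the function name `expand_nodes` is hypothetical, and details such as zero-padded numbers are not handled.

```python
import re


def expand_nodes(spec):
    """Expand a node list such as 'node[1-3],node4' into individual node names."""
    nodes = []
    # Split on commas that are not inside square brackets.
    for item in re.split(r",(?![^\[]*\])", spec):
        match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", item)
        if match is None:
            nodes.append(item)  # Plain node name, e.g. "node3".
            continue
        prefix, body = match.groups()
        for part in body.split(","):
            if "-" in part:
                first, last = part.split("-")
                nodes.extend(f"{prefix}{i}" for i in range(int(first), int(last) + 1))
            else:
                nodes.append(f"{prefix}{part}")
    return nodes


print(expand_nodes("node[1,2],node3"))
print(expand_nodes("node[1-3],node4"))
```

Under these assumptions, the first entry covers `node1`, `node2` and `node3`, and the second covers `node1` through `node4`.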
|
|
|
|
|
|
# SLURM SPANK plug-in configuration file
|
|
|
## SLURM SPANK plug-in configuration file
|
|
|
|
|
|
SLURM loads the plug-in through a file called `plugstack.conf`, which consists of a list of plug-ins. The file `etc/slurm/ear.plugstack.conf` contains an example entry with the plug-in, temporary and configuration paths already set.
|
|
|
|
... | ... | @@ -371,7 +368,7 @@ Also, there are two additional arguments. The first one, `nodes_allowed=` follow |
|
|
required ear_install_path/lib/earplug.so prefix=ear_install_path sysconfdir=etc_ear_path localstatedir=tmp_ear_path earlib_default=off nodes_excluded=node01,node02
|
|
|
```
|
|
|
|
|
|
# MySQL/PostgreSQL
|
|
|
## MySQL/PostgreSQL
|
|
|
|
|
|
**WARNING**: If any EAR component is running on the same machine as the MySQL server, some connection problems might occur. This will not happen with PostgreSQL. To solve those issues, enter into MySQL's CLI client the `CREATE USER` and `GRANT PRIVILEGES` queries from `edb_create -o`, changing the portion `'user_name'@'%'` to `'user_name'@'localhost'` so that EAR's users have access to the server from the local machine. There are two ways to configure a database server for EAR's usage.
|
|
|
|
... | ... | @@ -380,7 +377,7 @@ required ear_install_path/lib/earplug.so prefix=ear_install_path sysconfdir=etc |
|
|
|
|
|
For more information about how each `ear.conf` flag changes the database creation, see our [Database section](EAR-Database). For further information about EAR's database management tools, see the [Commands section](EAR-commands#database-commands).
|
|
|
|
|
|
# MSR Safe
|
|
|
## MSR Safe
|
|
|
|
|
|
MSR Safe is a kernel module that allows reading and writing MSRs without root permission. EAR opens the MSR Safe files if opening the ordinary MSR files fails. MSR Safe requires a configuration file that lists the registers allowed to be read and written. You can find configuration files in `etc/msr_safe` for Intel Skylake and later and for AMD Zen and later.
|
|
|
|
... | ... | @@ -392,6 +389,6 @@ cat intel63 > /dev/cpu/msr_allowlist |
|
|
|
|
|
You can find more information in the [official repository](https://github.com/LLNL/msr-safe).
|
|
|
|
|
|
# Next step
|
|
|
## Next step
|
|
|
|
|
|
Visit the [execution page](Starting%20services) to run EAR's different components.