Configuration requirements
-------------

The following requirements must be met for EAR to work properly:

- EAR folders: EAR uses two paths for its configuration.

  - EAR_TMP=tmp_ear_path must be a private folder per compute node. It must have read/write permissions for normal users. Communication files are created here. `tmp_ear_path` must be created by the admin.

    For instance: `mkdir /var/ear; chmod ugo+rwx /var/ear`

  - EAR_ETC=etc_ear_path must be readable by normal users on all compute nodes. It can be a shared folder in “GPFS” (simple to manage) or replicated data, because it holds very little data that is modified at a very low frequency (`ear.conf` and coefficients). Coefficients can be installed in a different path, specified at configure time with the COEFFS flag. Both `ear.conf` and the coefficients must be readable on all nodes (compute and “service” nodes).

- Configure `ear.conf`: `ear.conf` is an ASCII file setting default values and cluster descriptions. An `ear.conf` is automatically generated based on an `ear.conf.in` template. However, the sysadmin must include installation details such as hostname details for EAR services, ports, default values, and the list of nodes. For more details, check [EAR configuration file](#ear-configuration-file) below.

- MySQL DB or PostgreSQL DB: EAR saves data in a MySQL/PostgreSQL DB server. The EAR DB can be created using the provided `edb_create` command (the MySQL/PostgreSQL server must be running and root access to the DB is needed).

- Set EAR SLURM plugin

  - The EAR SLURM plugin must be set in `/etc/slurm/plugstack.conf`. EAR generates an example at `ear_etc_path/slurm/ear.plugstack.conf`. For more information see our [Plugin section](Configuration#slurm-spank-plugin-configuration-file) below.

EAR configuration file
----------------------

`ear.conf` is a text file describing the EAR package behaviour in the cluster. It must be readable by all compute nodes and by nodes where commands are executed.

Usually the first word of an option name indicates the component the option relates to. Lines starting with `#` are comments.

A test for the `ear.conf` file can be found in the path `src/test/functionals/ear_conf`.
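The line format described above (one `Key=value` option per line, `#` for comments) can be illustrated with a minimal Python sketch. This is not EAR's own parser: the function name `parse_ear_conf` and the choice to collect repeated keys such as `Policy` into lists are assumptions made for illustration only.

```python
def parse_ear_conf(text):
    """Collect 'Key=value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first '=' only, so multi-field lines keep their tail.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a Key=value line; ignored in this sketch
        # Keys such as Policy can appear several times, so keep a list.
        options.setdefault(key.strip(), []).append(value.strip())
    return options


sample = """\
# Port in which the server accepts the connections.
DBPort=3306
# NetworkExtension=netext
Verbose=0
Policy=monitoring Settings=0 DefaultFreq=2.4 Privileged=0
Policy=min_time Settings=0.7 DefaultFreq=2.0 Privileged=0
"""

conf = parse_ear_conf(sample)
print(conf["DBPort"])       # ['3306']
print(len(conf["Policy"]))  # 2
```

Commented-out options such as `NetworkExtension` do not appear in the result, which matches the rule that lines starting with `#` are ignored.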

###### In-depth EAR configuration file options

### Database configuration

```INI
# The IP of the node where the MariaDB (MySQL) or PostgreSQL server process is running. Current version uses same names for both DB servers
DBIp=172.30.2.101

# Port in which the server accepts the connections.
DBPort=3306

# MariaDB user that the services will use. Needs INSERT/SELECT privileges. Used by EARDBD
DBUser=eardbd_user

# Password for the previous user. If left blank or commented it will assume the user has no password.
DBPassw=eardbd_pass

# Database user that the commands (eacct, ereport) will use. Only uses SELECT privileges.
DBCommandsUser=ear_commands

# Password for the previous user. If left blank or commented it will assume the user has no password.
DBCommandsPassw=commandspass

# Name of EAR's database in the server.
DBDatabase=EAR

# Maximum number of connections of the commands user to prevent server saturation/malicious actuation. Applies to DBCommandsUser
DBMaxConnections=20

# The following specify the granularity of data reported to database.

# Extended node information reported to database (added: temperature, avg_freq, DRAM and PCK energy in power monitoring).
DBReportNodeDetail=1

# Extended signature hardware counters reported to database.
DBReportSigDetail=1

# Set to 1 if you want Loop signatures to be reported to database.
DBReportLoops=0
```

### EARD configuration. EARD are executed in compute nodes

```INI
# The port where the EARD will be listening.
NodeDaemonPort=50001

# Frequency used by power monitoring service, in seconds.
NodeDaemonPowermonFreq=60

# Maximum supported frequency (1 means nominal, no turbo).
NodeDaemonMaxPstate=1

# Enable (1) or disable (0) the turbo frequency.
NodeDaemonTurbo=0

# Enables the use of the database.
NodeUseDB=1

# Inserts data to MySQL by sending that data to the EARDBD (1) or directly (0).
NodeUseEARDBD=1

# '1' means EAR is controlling frequencies at all times (targeted to production systems) and '0' means EAR will not change the frequencies when users are not using EAR library (targeted to benchmarking systems).
NodeDaemonForceFrequencies=1

# The verbosity level [0..4]
NodeDaemonVerbose=1

# When set to 1, the output is saved in '$EAR_TMP'/eard.log (common configuration) as a log file. Otherwise, stderr is used.
NodeUseLog=1

# Minimum time between two energy readings for performance accuracy
MinTimePerformanceAccuracy=10000000
```

### EARDBD configuration

```INI
# Port where the EARDBD server is listening
DBDaemonPortTCP=50002

# Port where the EARDBD mirror is listening
DBDaemonPortSecTCP=50003

# Port used to synchronize the server and mirror
DBDaemonSyncPort=50004

# In seconds, interval of time of accumulating data to generate an energy aggregation
DBDaemonAggregationTime=60

# In seconds, time between inserts of the buffered data
DBDaemonInsertionTime=30

# Memory allocated per process. This allocation is used for buffering the data sent to the database by EARD or other components. If a node runs both a server and a mirror, double that value will be allocated. It is expressed in megabytes.
DBDaemonMemorySize=120

# The percentage of the memory buffer defined in the previous field that is used by each data type. These types are: mpi, non-mpi and learning applications, loops, energy metrics, aggregations and events, in that order. If a type gets 0% of space, this metric is discarded and not saved into the database.
DBDaemonMemorySizePerType=40,20,5,24,5,1,5

# When set to 1, eardbd uses a '$EAR_TMP'/eardbd.log file as a log file
DBDaemonUseLog=1
```
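As a rough illustration of how `DBDaemonMemorySize` and `DBDaemonMemorySizePerType` combine, the Python sketch below splits the buffer by the configured percentages. The type names in `TYPES` are this sketch's reading of the comment in the configuration block (seven types for seven values), and `buffer_sizes` is a hypothetical helper, not part of EAR.

```python
# Names below are this sketch's reading of the ear.conf comment, in order.
TYPES = [
    "mpi applications",
    "non-mpi applications",
    "learning applications",
    "loops",
    "energy metrics",
    "aggregations",
    "events",
]


def buffer_sizes(memory_mb, per_type):
    """Return the megabytes assigned to each data type."""
    shares = [int(p) for p in per_type.split(",")]
    if len(shares) != len(TYPES):
        raise ValueError(f"expected {len(TYPES)} percentages, got {len(shares)}")
    return {name: memory_mb * share / 100 for name, share in zip(TYPES, shares)}


sizes = buffer_sizes(120, "40,20,5,24,5,1,5")
for name, megabytes in sizes.items():
    print(f"{name}: {megabytes:.1f} MB")
```

With the example values, MPI applications get 48 MB of the 120 MB buffer and events get 6 MB. A type configured with 0 would get no space, which is the case the comment describes as discarding that metric.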

### EARL configuration

```INI
# Path where coefficients are installed, usually $EAR_ETC/ear/coeffs
CoefficientsDir=/path/to/coeffs

# Number of levels used by DynAIS algorithm.
DynAISLevels=10

# Window size used by DynAIS. The larger the size, the higher the overhead.
DynAISWindowSize=200

# Maximum time in seconds that EAR will wait until a signature is computed. After this value, if no signature is computed, EAR will go to periodic mode.
DynaisTimeout=15

# Time in seconds to compute every application signature when the EAR goes to periodic mode.
LibraryPeriod=10

# Number of MPI calls after which EAR checks whether it must go to periodic mode or not.
CheckEARModeEvery=1000
```

### EARGM configuration

```INI
# The IP or hostname of the node where the EARGMD daemon is running.
EARGMHost=hostname

# Port where EARGMD will be listening.
EARGMPort=50000

# Use (1) or do not use (0) aggregated metrics to compute total energy.
EARGMUseAggregated=1

# Period T1 and period T2 are specified in seconds. T1 must be less than T2. Global manager updates the information every T1 seconds and uses the energy/power in T2 period to estimate energy/power constraints
EARGMPeriodT1=90
EARGMPeriodT2=259200

# Units field. Can be '-' (Joules), 'K' (KiloJoules) or 'M' (MegaJoules)
EARGMUnits=K

# Maximum energy allowed in period T2. With the values above: 550000 KJoules in 259200 seconds
EARGMEnergyLimit=550000

#
# Global manager modes. Two modes are supported: '0' (manual) or '1' (automatic). Manual means Global Manager is only monitoring energy and power and reporting to the DB. Automatic means it takes actions to guarantee energy limits.
EARGMMode=0

# A mail can be sent reporting the warning level (and the action taken in automatic mode). 'nomail' means no mail is sent. This option is independent of the node.
EARGMMail=nomail

# Percentage of accumulated energy to start the warning DEFCON level L4, L3 and L2.
EARGMWarningsPerc=85,90,95

# Number of "grace" T1 periods before doing a new re-evaluation. After a warning, EARGM will wait T1 x EARGMGracePeriods seconds until it raises a new warning.
EARGMGracePeriods=6

# Verbose level
EARGMVerbose=1

# When set to 1, the output is saved in '$EAR_TMP'/eargmd.log (common configuration) as a log file.
EARGMUseLog=1

# Format for action is: command_name energy_T1 energy_T2 energy_limit T2 T1 units
# This action is automatically executed at each warning level (only once per grace period)
EARGMEnergyAction=no_action
```
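The arithmetic behind the example EARGM values can be worked through in a few lines of Python. This is only a sketch of the numbers implied by the comments (percentages applied to `EARGMEnergyLimit`, grace time as T1 times `EARGMGracePeriods`); the helper `warning_thresholds` is hypothetical and the way EARGM evaluates these values internally is not shown here.

```python
def warning_thresholds(limit, percentages):
    """Energy values at which each warning level would start."""
    return [limit * int(p) / 100 for p in percentages.split(",")]


limit_kj = 550000       # EARGMEnergyLimit with EARGMUnits=K
t1_seconds = 90         # EARGMPeriodT1
t2_seconds = 259200     # EARGMPeriodT2
grace_periods = 6       # EARGMGracePeriods

thresholds = warning_thresholds(limit_kj, "85,90,95")
grace_seconds = t1_seconds * grace_periods
t2_days = t2_seconds / 86400
average_kw = limit_kj / t2_seconds  # kJ per second is kW

print(thresholds)             # [467500.0, 495000.0, 522500.0]
print(grace_seconds)          # 540
print(t2_days)                # 3.0
print(round(average_kw, 2))   # 2.12
```

So the example configuration allows 550000 KJoules over three days, which corresponds to an average cluster power of about 2.12 kW, and a new warning is raised at most every 540 seconds.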

### Common configuration

```INI
# Network extension (using another network instead of the local one). If compute nodes must be accessed from login nodes with a network different than default, and can be accessed using an extension, uncomment next line and define 'netext' accordingly.
# NetworkExtension=netext

# Default verbose level
Verbose=0

# Path used for communication files, shared memory, etc. It must be PRIVATE per compute node and with read/write permissions. $EAR_TMP
TmpDir=/tmp/ear

# Path where coefficients and configuration are stored. It must be readable in all compute nodes. $EAR_ETC
EtcDir=/path/to/etc

InstDir=/path/to/inst

# Path where metrics are generated in text files when no database is installed. A suffix is included.
DataBasePathName=/etc/ear/dbs/dbs.

# Energy reading plugin (without the extension). Allows to use different system components to read the energy of the node. In this case, this plugin reads the energy of the system using Intel Node Manager.
# Look at /path/to/inst/lib/plugins/energy folder to see the list of installed energy plugins
Energy_plugin=energy_nm.so

# Power model plugin (without the extension). The power model plugin is used to predict the power and energy consumption of the next iteration of the executing application.
Energy_model=avx512_model.so
```

### EAR-Authorized users/groups/accounts

Authorized users that are allowed to change policies, thresholds and frequencies are supposed to be administrators. A list of users, Linux groups, and/or SLURM accounts can be provided to allow normal users to perform those actions. Only authorized users can execute the learning phase.

```INI
AuthorizedUsers=user1,user2
AuthorizedAccounts=acc1,acc2,acc3
AuthorizedGroups=xx,yy
```

### Energy tags

Energy tags are pre-defined configurations for some applications (EAR library is not loaded). Each energy tag accepts a list of user IDs, groups and SLURM accounts of the users allowed to use that tag.

```INI
# General energy tag
EnergyTag=cpu-intensive pstate=1

# Energy tag with limited users
EnergyTag=memory-intensive pstate=4 users=user1,user2 groups=group1,group2 accounts=acc1,acc2
```
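The difference between a general energy tag and one with limited users can be sketched in Python as follows. The helper names `parse_energy_tag` and `may_use` are hypothetical, and the rule that matching any one of the user, group or account lists is enough is an assumption of this sketch, not something the configuration file states.

```python
def parse_energy_tag(line):
    """Parse 'EnergyTag=name pstate=N [users=..] [groups=..] [accounts=..]'."""
    fields = dict(field.split("=", 1) for field in line.split())

    def as_set(key):
        # A missing key yields an empty set.
        return set(filter(None, fields.get(key, "").split(",")))

    return {
        "name": fields["EnergyTag"],
        "pstate": int(fields["pstate"]),
        "users": as_set("users"),
        "groups": as_set("groups"),
        "accounts": as_set("accounts"),
    }


def may_use(tag, user, group, account):
    """Assumed rule: unrestricted tags are open; otherwise any match allows."""
    if not (tag["users"] or tag["groups"] or tag["accounts"]):
        return True
    return (
        user in tag["users"]
        or group in tag["groups"]
        or account in tag["accounts"]
    )


general = parse_energy_tag("EnergyTag=cpu-intensive pstate=1")
limited = parse_energy_tag(
    "EnergyTag=memory-intensive pstate=4 users=user1,user2 "
    "groups=group1,group2 accounts=acc1,acc2"
)
print(may_use(general, "user9", "group9", "acc9"))  # True
print(may_use(limited, "user9", "group9", "acc9"))  # False
```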

### Tags

Tags are used for architectural descriptions. Max. AVX frequencies are used in predictor models and are SKU-specific. At least a default tag must be included for a cluster to work properly.

The `min_power`, `max_power` and `error_power` are threshold values that determine if the metrics read might be invalid, and a warning message to syslog will be reported if the values are outside of said thresholds. `error_power` is a more extreme value: if a metric surpasses it, said metric will not be reported to the database.

A special energy plugin or energy model can be specified in a tag. It will override the global values previously defined in all nodes that have this tag associated with them.

```INI
Tag=6148 default=yes max_avx512=2.2 max_avx2=2.6 max_power=500 min_power=50 error_power=600 coeffs=coeffs.default
Tag=6126 max_avx512=2.3 max_avx2=2.9 coeffs=coeffs.6126.default max_power=600 error_power=700
```
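The role of the three power thresholds can be illustrated with a short Python sketch. It only mirrors the description above (outside `min_power`/`max_power` gives a warning, above `error_power` the metric is not reported); `parse_fields` and `classify_power` are hypothetical helpers, the labels `ok`, `warn` and `discard` are invented for this example, and treating a missing threshold as "no bound" is an assumption.

```python
def parse_fields(line):
    """Split a line such as 'Tag=6148 default=yes ...' into a dict of strings."""
    return dict(field.split("=", 1) for field in line.split())


def classify_power(reading, tag):
    """Return 'discard', 'warn' or 'ok' for a reading in the tag's units."""
    low = float(tag.get("min_power", "-inf"))
    high = float(tag.get("max_power", "inf"))
    error = float(tag.get("error_power", "inf"))
    if reading > error:
        return "discard"  # assumed: not reported to the database
    if reading < low or reading > high:
        return "warn"     # assumed: reported, with a syslog warning
    return "ok"


tag = parse_fields(
    "Tag=6148 default=yes max_avx512=2.2 max_avx2=2.6 "
    "max_power=500 min_power=50 error_power=600 coeffs=coeffs.default"
)
for reading in (300, 550, 650):
    print(reading, classify_power(reading, tag))
```

For the default tag in the example, a reading of 300 is inside the thresholds, 550 is above `max_power` but below `error_power`, and 650 exceeds `error_power`.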

### Power policies plugins

```INI
#---------------------------------------------------------------------------------------------------
## Power policies
## ---------------------------------------------------------------------------------------------------
#
## policy names must be exactly file names for policies installed in the system
DefaultPowerPolicy=min_time
Policy=monitoring Settings=0 DefaultFreq=2.4 Privileged=0
Policy=min_time Settings=0.7 DefaultFreq=2.0 Privileged=0
Policy=min_energy Settings=0.1 DefaultFreq=2.4 Privileged=1

# For homogeneous systems, default frequencies can be easily specified using freqs, for heterogeneous systems it is preferred to use pstates

# Example with pstates (lower pstates correspond to higher frequencies). Pstate=1 is nominal and 0 is turbo
#Policy=monitoring Settings=0 DefaultPstate=1 Privileged=0
#Policy=min_time Settings=0.7 DefaultPstate=4 Privileged=0
#Policy=min_energy Settings=0.1 DefaultPstate=1 Privileged=1
```

### Island description

This section is mandatory since it is used for cluster description. Normally nodes are grouped in islands that share the same hardware characteristics as well as their database managers (EARDBDs). Each line describes an island, and every node must be in an island.

Remember that there are two kinds of database daemons: one called 'server' and the other called 'mirror'. Both perform the metrics buffering process, but just one performs the insert. The mirror will do that insert in case the 'server' process crashes or the node fails.

It is recommended for all islands to have symmetry. For example, if the islands I0 and I1 have the server N0 and the mirror N1, the next island would have to point to the same N0 and N1 or to new ones, N2 and N3.

Multiple EARDBDs are supported in the same island, in which case more than one line per island is required, but the condition of symmetry still has to be met.

It is recommended that the server and the mirror of an island run on different nodes. However, the EARDBD program can be both server and mirror at the same time. This means that the islands I0 and I1 could have the N0 server and the N2 mirror, and the islands I2 and I3 the N2 server and N0 mirror, fulfilling the symmetry requirements.

A tag can be specified that will apply to all the nodes in that line. If no tag is defined, the default one will be used as the hardware definition.

```INI
Island=0 Nodes=nodename_list DBIP=EARDB_server_hostname DBSECIP=EARDB_mirror_hostname

# This second island uses a tag that is not the default one
Island=1 Nodes=nodename_list DBIP=EARDB_server_hostname DBSECIP=EARDB_mirror_hostname Tag=6126
```

Detailed island accepted values:

- nodename_list accepts the following formats:
  - Nodes=`node1,node2,node3`
  - Nodes=`node[1-3]`
  - Nodes=`node[1,2,3]`
- Any combination of the two latter options will work, but if nodes have to be specified individually (the first format), as of now they have to be specified in their own line. As an example:
  - Valid formats:
    - Island=1 Nodes=`node1,node2,node3`
    - Island=1 Nodes=`node[1-3],node[4,5]`
  - Invalid formats:
    - Island=1 Nodes=`node[1,2],node3`
    - Island=1 Nodes=`node[1-3],node4`
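The accepted `Nodes=` formats listed above can be made concrete with a small Python sketch that expands a node list into individual names and rejects the invalid combinations. `expand_nodes` is a hypothetical helper written for this page, not EAR's parser; in particular, zero-padded ranges such as `node[01-03]` are not handled here.

```python
import re

# prefix followed by one bracket group, e.g. node[1-3] or node[4,5]
_GROUP = re.compile(r"([^\[\],]+)\[([^\[\]]+)\]")


def expand_nodes(spec):
    """Expand an Island 'Nodes=' value into a list of node names."""
    # Split on commas that are not inside a bracket group.
    parts = re.split(r",(?![^\[]*\])", spec)
    bracketed = [p for p in parts if "[" in p]
    if bracketed and len(bracketed) != len(parts):
        raise ValueError("plain names cannot be mixed with bracket groups")
    nodes = []
    for part in parts:
        match = _GROUP.fullmatch(part)
        if match is None:
            if "[" in part or "]" in part:
                raise ValueError(f"malformed node group: {part!r}")
            nodes.append(part)
            continue
        prefix, body = match.groups()
        for item in body.split(","):
            if "-" in item:
                first, last = item.split("-")
                nodes.extend(f"{prefix}{i}" for i in range(int(first), int(last) + 1))
            else:
                nodes.append(f"{prefix}{item}")
    return nodes


print(expand_nodes("node1,node2,node3"))
print(expand_nodes("node[1-3],node[4,5]"))
```

Both valid examples expand to plain node names, while `node[1,2],node3` and `node[1-3],node4` raise `ValueError` in this sketch, mirroring the invalid formats listed above.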

SLURM spank plugin configuration file
------------------------

SLURM loads the plugin through a file called `plugstack.conf`, which is composed of a list of plugins. The file `etc/slurm/ear.plugstack.conf` contains an example entry with the plugin, temporary and configuration paths already set.

__Example__:

```
required ear_install_path/lib/earplug.so prefix=ear_install_path sysconfdir=etc_ear_path localstatedir=tmp_ear_path earlib_default=off
```

The argument `prefix` points to the EAR installation path and is used to load the library using the `LD_PRELOAD` mechanism. The `localstatedir` argument is used to contact the EARD, and by default points to the path you set during `./configure` using the `--localstatedir` or `EAR_TMP` arguments. Next to these fields there is the field `earlib_default=off`, which means that by default EARL is not loaded. The fields `eargmd_host` and `eargmd_port` can also be set if you plan to connect with the EARGMD component (you can leave them empty).

MySQL/PostgreSQL
-----

**WARNING**: If any EAR component is running on the same machine as the MySQL server, some connection problems might occur. This will not happen with PostgreSQL. To solve those issues, enter into MySQL's CLI client the `CREATE USER` and `GRANT PRIVILEGES` queries from `edb_create -o`, changing the portion `'user_name'@'%'` to `'user_name'@'localhost'` so that EAR's users have access to the server from the local machine.

There are two ways to configure a database server for EAR's usage.

- Run `edb_create -r`, located in `$EAR_INSTALLATION_PATH/sbin`, from a node with root access to the MySQL server. This requires the MySQL/PostgreSQL section of `ear.conf` to be correctly written. For more info run `edb_create -h`.

- Manually create the database and users specified in `ear.conf`, as well as the required tables. If `ear.conf` has been configured, running `edb_create -o` will output the queries that the program would run, which contain everything needed for EAR to function properly.

For more information about how each `ear.conf` flag changes the database creation, see our [Database section](EAR-Database).

Next step
---------

Visit the [execution page](Execution) to run EAR's different components.