... | ... | @@ -19,32 +19,30 @@ This section provides a, summed up, step by step installation and execution guid |
|
|
|
|
|
## Requirements
|
|
|
|
|
|
- To install EAR from sources, the following libraries and environments are needed: C compiler, papi, gsl, MPI, mysqlclient for MariaDB.
|
|
|
- To install EAR from rpm (only binaries) all these dependencies have been removed except mysqlclient. However, they are needed when running EAR.
|
|
|
- SLURM must also be present if the SLURM plugin wants to be used. Since current EAR version only supports automatic execution of applications with EAR library using the SLURM plugin, it must be running when EAR library wants to be used (not needed for node monitoring)
|
|
|
- The drivers for CPUFreq management (acpi-cpufreq) and Open IPMI must be present and loaded.
|
|
|
- msr kernel driver must be loaded
|
|
|
- MySQL server must be up and running.
|
|
|
- Hardware counters must be accessible for normal users. Set /proc/sys/kernel/perf_event_paranoid to 2 (or less). (sudo sh -c "echo 2 > /proc/sys/kernel/perf_event_paranoid")
|
|
|
- To install EAR from sources, the following libraries and environments are needed: a C compiler, an MPI compiler and library if the MPI version is generated, and the *mysqlclient* library for MariaDB or the *postgresql* library. *libGSL* is needed for coefficient computations.
|
|
|
- To install EAR from **rpm** (binaries only), all these dependencies have been removed except *mysqlclient*. However, they are needed when running EAR.
|
|
|
- SLURM must also be present if the SLURM plugin is to be used. Since the current EAR version only supports automatic execution of applications with the EAR library through the SLURM plugin, SLURM must be running whenever the EAR library is used (it is not needed for node monitoring).
|
|
|
- The drivers for CPUFreq management (*acpi-cpufreq*) and Open IPMI must be present and loaded in compute nodes.
|
|
|
- The *msr* kernel module must be loaded on compute nodes.
|
|
|
- A MariaDB or PostgreSQL server must be up and running.
|
|
|
- Hardware counters must be accessible for normal users. Set */proc/sys/kernel/perf\_event\_paranoid* to 2 (or less), for example by running `sudo sh -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"` on compute nodes.
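
The kernel-side requirements above can be checked on a compute node before installing anything. The following is a minimal read-only sketch, not part of EAR: the function names `paranoid_ok`, `module_loaded` and `preflight` are hypothetical, and it assumes the drivers are built as loadable modules that show up in `/proc/modules` as `msr` and `acpi_cpufreq` (a driver compiled into the kernel will not be listed there). Open IPMI is not checked because the list above does not name its modules.

```shell
#!/bin/sh
# Preflight sketch for a compute node; read-only, no root required.
# All function names here are illustrative, not EAR commands.

# Succeeds when the given perf_event_paranoid value is an integer <= 2.
paranoid_ok() {
    [ "$1" -le 2 ] 2>/dev/null
}

# Succeeds when module $1 is listed in file $2 (default: /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# Runs every check, prints one line per check, returns non-zero on failure.
preflight() {
    status=0
    paranoid_file=/proc/sys/kernel/perf_event_paranoid
    if [ -r "$paranoid_file" ] && paranoid_ok "$(cat "$paranoid_file")"; then
        echo "ok: perf_event_paranoid is 2 or less"
    else
        echo "FAIL: perf_event_paranoid is above 2 or unreadable"
        status=1
    fi
    # Assumption: module names as they appear in /proc/modules.
    for mod in msr acpi_cpufreq; do
        if module_loaded "$mod"; then
            echo "ok: $mod is loaded"
        else
            echo "FAIL: $mod not listed in /proc/modules"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the file, call `preflight` on each compute node; a non-zero return value means at least one requirement still needs attention.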
|
|
|
|
|
|
## Installation, configuration and execution
|
|
|
1. Compile and install from source code or install via .rpm. EAR_TMP and EAR_ETC are defined in ear module. Till the module is not loaded, define manually these env vars to execute the next steps.
|
|
|
2. Create the $EAR_TMP folder. This folder must be local to each node, so we recommend to create it in /var/ear.
|
|
|
3. Either installing from sources or rpm, EAR installs a template for ear.conf file in `$EAR_ETC/ear/ear.conf.template`. Copy it to `$EAR_ETC/ear/ear.conf` and update with the desired configuration. Go to our [ear.conf](Configuration#ear-configuration-file) page to see how to do it. The ear.conf is used by all the services.
|
|
|
|
|
|
5. Create EAR database with `edb_create`. The `edb_create -p` command will ask you for the DB root password. If you get any problem here, check first the node where you are running the command can connect to the DB server. In case problems persist, execute `edb_create -o` to report the specific SQL queries generated. In case of trouble, contact [ear-support@bsc.es](mailto:ear-support@bsc.es).
|
|
|
6. EAR uses a power and performance model based on system signatures. These system signatures are stored in coefficient files. Before starting EARD, and just for testing, it is needed to create a dummy coefficient file and copy it in the coefficients path (by default placed at $EAR_ETC/coeffs). Visit the [tools section](Tools), coeffs_null application.
|
|
|
1. Compile and install from source code or install via rpm. `$EAR_TMP` and `$EAR_ETC` are defined in the ear module. Until the module is loaded, define these environment variables manually to execute the next steps.
|
|
|
2. Create the `$EAR_TMP` folder. This folder must be local to each node, so we recommend creating it in /var/ear.
|
|
|
3. Whether installed from sources or rpm, EAR installs a template for the **ear.conf** file in `$EAR_ETC/ear/ear.conf.template`. Copy it to `$EAR_ETC/ear/ear.conf` and update it with the desired configuration. See the [ear.conf](Configuration#ear-configuration-file) page for details. The ear.conf file is used by all the services.
|
|
|
4. Load the EAR module to enable commands. It can be found in `$EAR_ETC/module`. If the ear module is not in the standard paths, add it with `module use $EAR_ETC/module` and then run `module load ear`.
|
|
|
5. Create the EAR database with `edb_create`. The `edb_create -p` command will ask you for the DB root password. If you run into any problem here, first check whether the node where you are running the command can connect to the DB server. If problems persist, execute `edb_create -o` to report the specific SQL queries generated. If the issue remains, contact [ear-support@bsc.es](mailto:ear-support@bsc.es).
|
|
|
6. EAR uses a power and performance model based on system signatures. These system signatures are stored in coefficient files. Before starting EARD, and just for testing, you need to create a dummy coefficient file and copy it to the coefficients path (by default `$EAR_ETC/coeffs`). See the *coeffs\_null* application in the [tools section](Tools).
|
|
|
7. Copy EAR service files to start/stop services using system commands such as systemctl. EAR service files are generated at `$EAR_ETC/systemd` and they can usually be placed in `$(ETC)/systemd`.
|
|
|
8. Start EARDs and EARDBDs via services (see our [Launching the components with unit services](Execution#launching-the-components-through-unit-services)). EARDBD and EARD outputs can be found at `$EAR_TMP/eardbd.log` and `$EAR_TMP/eard.log` respectively when DBDaemonUseLog and NodeUseLog options are set to 1 in ear.conf file. Otherwise, their outputs are generated in stderr and can be seen using the journalctl command. For instance, use `journalctl -u eard` to look at eard output.
|
|
|
9. Check that the EARDs are up and running correctly with `econtrol --status` (note that the daemons will take around a minute to correctly report energy and not show up as an error in `econtrol`). EARDs create a per-node text file with values reported to the EARDBD. In case there are problems when running econtrol, you can also find this file at `$EAR_TMP/nodename.pm_periodic_data.txt`.
|
|
|
10. Check that the EARDs are reporting metrics to database with `ereport` (`ereport -n all` should report the total energy sent by each daemon since the setup).
|
|
|
8. Start EARDs and EARDBDs via services (see [Launching the components with unit services](Execution#launching-the-components-through-unit-services)). EARDBD and EARD outputs can be found at `$EAR_TMP/eardbd.log` and `$EAR_TMP/eard.log` respectively when the *DBDaemonUseLog* and *NodeUseLog* options are set to 1 in the ear.conf file. Otherwise, their output goes to *stderr* and can be seen using the *journalctl* command. For instance, use `journalctl -u eard` to look at the eard output.
|
|
|
9. Check that EARDs are up and running correctly with `econtrol --status` (note that daemons take around a minute to report energy correctly; until then they may show up as an error in `econtrol`). EARDs create a per-node text file with the values reported to the EARDBD. If there are problems when running *econtrol*, you can also find this file at `$EAR_TMP/nodename.pm_periodic_data.txt`.
|
|
|
10. Check that EARDs are reporting metrics to the database with *ereport*. `ereport -n all` should report the total energy sent by each daemon since the setup.
|
|
|
11. Start EARGM via services.
|
|
|
12. Check if EARGM is reporting to database with `ereport -g`. (Note that EARGM will take a period of time set by the admin in `ear.conf`, option GlobalManagerPeriodT1, to report for the first time.)
|
|
|
|
|
|
14. Run an application via SLURM and check that it is correctly reported to database with `eacct`. (Note that only privileged users can check other users' applications).
|
|
|
15. Run an MPI application with `--ear=on` and check that the report by `eacct` now includes the library metrics. EAR library depends on the MPI version: Intel, OpenMPI, etc. By default libear.so is used. Different names for different versions can be specified automatically by adding the EAR version name in the corresponding MPI module. For instance, for libear.openmpi.4.0.0.so library, define **SLURM_EAR_MPI_VERSION** environment variable as openmpi.4.0.0. When EAR has been installed from sources, this name is the same it is specified in MPI_VERSION during the configure. When installed from rpm, look at `$EAR_INSTALL_PATH/lib` to see the available versions.
|
|
|
16. Set `default=on` to specify the EAR library will be loaded with all the applications by default in `plugstack.conf`. If default is set to off, EAR library can be explicitly loaded by doing --ear=on when submitting a job.
|
|
|
12. Check whether EARGM is reporting to the database with `ereport -g`. Note that EARGM takes a period of time set by the admin in *ear.conf* (*GlobalManagerPeriodT1* option) to report for the first time.
|
|
|
13. Set up EAR's SLURM plugin (see our [Configuration](Configuration) page for more information).
|
|
|
14. Run an application via SLURM and check that it is correctly reported to the database with `eacct`. Note that only privileged users can check other users' applications.
|
|
|
15. Run an MPI application with `--ear=on` and check that the report by `eacct` now includes the library metrics. The EAR library depends on the MPI version: Intel, OpenMPI, etc. By default *libear.so* is used. Different names for different versions can be specified automatically by adding the EAR version name in the corresponding MPI module. For instance, for the *libear.openmpi.4.0.0.so* library, define the `SLURM_EAR_MPI_VERSION` environment variable as *openmpi.4.0.0*. When EAR has been installed from sources, this name is the one specified in MPI_VERSION during `configure`. When installed from rpm, look at `$EAR_INSTALL_PATH/lib` to see the available versions.
|
|
|
16. Set `default=on` in `plugstack.conf` so that the EAR library is loaded with all applications by default. If default is set to off, the EAR library can be loaded explicitly by passing `--ear=on` when submitting a job.
|
|
|
17. At this point you can use EAR for monitoring and accounting purposes, but the EARL power policies cannot be used yet. To enable them, first run a [learning phase](Learning-phase) and compute the coefficients.
|
|
|
|
|
|
|
|
|
|
|
|
18. For the coefficients to be active, restart the daemons. __IMPORTANT__: reloading the daemons will NOT make them load the coefficients; restarting is the only way.
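
As a rough illustration, steps 2 to 5 and the checks in steps 9, 10 and 12 could be strung together as in the sketch below. It only uses commands quoted in the steps above. The wrapper `run` and the functions `ear_setup` and `ear_verify` are hypothetical helpers, not EAR tools, and `DRY_RUN` is an assumed variable that makes the script print each command instead of executing it. Editing `ear.conf`, creating the dummy coefficient file and starting the services (steps 3, 6, 7, 8 and 11) are left out because they depend on the site configuration.

```shell
#!/bin/sh
# Sketch of the command order only. With DRY_RUN=1 (the default here)
# every command is printed with a leading "+" instead of being executed.

: "${EAR_TMP:=/var/ear}"   # step 2 recommends a node-local folder in /var/ear
: "${DRY_RUN:=1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 2 to 5: temporary folder, ear.conf from template, module, database.
ear_setup() {
    : "${EAR_ETC:?define EAR_ETC manually until the ear module is loaded}"
    run mkdir -p "$EAR_TMP"
    # Never overwrite an ear.conf that has already been customised.
    [ -e "$EAR_ETC/ear/ear.conf" ] ||
        run cp "$EAR_ETC/ear/ear.conf.template" "$EAR_ETC/ear/ear.conf"
    run module use "$EAR_ETC/module"
    run module load ear
    run edb_create -p          # asks for the DB root password
}

# Steps 9, 10 and 12: only meaningful once EARD, EARDBD and EARGM are running.
ear_verify() {
    run econtrol --status
    run ereport -n all
    run ereport -g
}
```

Running `ear_setup` with `DRY_RUN=1` prints the planned commands so they can be reviewed first; setting `DRY_RUN=0` executes them, which requires the `module` command and the EAR binaries to be available in the current shell.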
|
|