[[_TOC_]]

# EAR Components

The following image shows the main interactions between components:

<img src="./images/EAR_arch.png" align="center" width=500>

EAR is composed of five main components:

- **Node Manager (EARD):** A Linux service that provides basic node power monitoring and job accounting. It also offers an API that third parties (e.g., other EAR components) can use to perform privileged operations. It must have root access to the node where it runs (usually all compute nodes).
- **Database Manager (EARDBD):** A Linux service (normally running on a service node) that caches the data to be stored in the database, reducing the number of queries. We currently support [MariaDB](https://mariadb.org/) and [PostgreSQL](https://www.postgresql.org/). This component does not need to be enabled if you do not use these database services to report EAR data.
- **Global Manager (EARGM):** A Linux service (normally running on a service node) that provides cluster-level support (e.g., powercap). It needs access to all nodes in the cluster where a Node Manager is running.
- **EAR Library (EARL):** A job manager (distributed as a shared object) that provides job/application-level monitoring and optimization.
- **SLURM plug-in:** A SLURM [SPANK](https://slurm.schedmd.com/spank.html) plug-in that provides support for EAR job accounting and loads EARL transparently for users on systems using SLURM.

For more detailed information about EAR components, visit the [Architecture](Architecture) page.
# Quick Installation Guide

This section provides a summarized, step-by-step installation and execution guide for EAR. For a more in-depth explanation of the necessary steps, see the [Installation from source](Installation from source) page or the [Installing from RPM](#installing-from-rpm) section, then follow the [Configuration](Configuration) guide. You can also contact us at ear-support@bsc.es.
## EAR Requirements

You need at least a modern **C compiler**.

Other requirements to compile EAR with all of its features are:

- MPI compiler and headers for supporting MPI applications. Intel MPI and OpenMPI are the implementations most used and tested by the EAR development team.
- A [CUDA](https://developer.nvidia.com/cuda-toolkit) installation if you want support for NVIDIA GPUs.
- A [Likwid](https://hpc.fau.de/research/tools/likwid/) installation for retrieving performance metrics through this interface.
- A [FreeIPMI](https://www.gnu.org/software/freeipmi/) installation if you are going to read power and energy through the IPMI specification.
- [GSL](https://www.gnu.org/software/gsl/), which some tools need for coefficient computation.

SLURM must also be present if you want to use the SLURM plug-in. Since the current EAR version only supports automatic execution of applications with the EAR Library through the SLURM plug-in, SLURM must be running whenever you want to use EARL (it is not needed for the most basic node monitoring service).

When installing EAR from an **rpm** (binaries only), all these dependencies have been removed except *mysqlclient*. However, they are still needed when running EAR components.

Last but not least:

- The drivers for CPU frequency management (*acpi-cpufreq*) and Open IPMI must be present and loaded on compute nodes.
- The *msr* kernel module must be loaded on compute nodes.
- A MariaDB or PostgreSQL server must be up and running if you use these services to store EAR data.
- Hardware counters must be accessible to normal users. Set */proc/sys/kernel/perf\_event\_paranoid* to 2 (or less) by typing `sudo sh -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"` on compute nodes.
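Before starting the services, it can help to confirm these node-level prerequisites with a quick script. The sketch below is not part of EAR and rests on a few assumptions: it looks for loaded drivers in `/proc/modules` (drivers compiled into the kernel do not appear there, so a `MISSING` line is a hint rather than proof), and checking `ipmi_msghandler` and `ipmi_devintf` for Open IPMI reflects a typical setup, not an EAR requirement list. The `MODULES_FILE` and `PARANOID_FILE` variables are hypothetical overrides that only exist to make the script easy to test. Source it on a compute node and run `ear_node_check`; it prints one line per check and returns a non-zero status if any check fails.

```shell
# Pre-flight sketch for an EAR compute node (hypothetical helper, not an EAR tool).
# File locations can be overridden through these (hypothetical) variables.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"
PARANOID_FILE="${PARANOID_FILE:-/proc/sys/kernel/perf_event_paranoid}"

# Succeeds if the kernel module named $1 is listed as loaded.
# /proc/modules uses underscores: acpi-cpufreq.ko shows up as acpi_cpufreq.
module_loaded() {
    grep -q "^$1 " "$MODULES_FILE"
}

# Succeeds if perf_event_paranoid is 2 or lower.
paranoid_ok() {
    local level
    level=$(cat "$PARANOID_FILE") || return 1
    [ "$level" -le 2 ]
}

# Runs every check, prints one line per check, returns 1 if any failed.
ear_node_check() {
    local rc=0 mod
    for mod in acpi_cpufreq msr ipmi_msghandler ipmi_devintf; do
        if module_loaded "$mod"; then
            echo "OK      module $mod"
        else
            echo "MISSING module $mod"
            rc=1
        fi
    done
    if paranoid_ok; then
        echo "OK      perf_event_paranoid <= 2"
    else
        echo "FAIL    perf_event_paranoid > 2"
        rc=1
    fi
    return $rc
}
```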
Run `./configure --help` to see all the flags and options.
## Compiling and installing EAR

Once you have downloaded the code from the repository, execute:

- `autoreconf -i`.
- The option `--disable-mpi` must be set to generate a configuration for the non-MPI version of the library.
- Use `MPI_VERSION=ompi` for the OpenMPI-compatible version.

Before running `make`, review the Makefile and the configuration log to validate that all the requirements of your installation have been automatically detected.

In particular, check this if you need to use a specific library such as Likwid, FreeIPMI or CUDA. If the CUDA path is specified, EAR will be compiled with GPU support. Also check that the MySQL or PostgreSQL paths have been detected.

You can use the `USER` and `GROUP` options if you want to install EAR with a specific user/group.

The following shows how to configure EAR to be compiled with Intel MPI:
```
...
make -f Makefile.impi etc.install
```
At this point, the EAR binaries are installed, including one version of the EAR Library for MPI (the default), the EAR documentation, the service files for the EAR daemons, and templates for the `ear.conf` file and the SLURM plug-in. The configure tool tries to automatically detect the paths for MySQL and/or PostgreSQL, the scheduler sources, etc. Detecting the scheduler path is mandatory; SLURM is assumed by default. After running configure, check in the Makefile that all the options have been detected. After `make install`, you should have the following folders in the `prefix` path:

- `bin`: Commands and tools.
- `sbin`: EAR service binaries.
- `etc`: Templates and examples for the EAR service files, the `ear.conf` file, the EAR module, etc.
- `lib`: All libraries and plugins.
- `include`
- `man`: Man pages.
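A quick way to confirm that `make install` produced the layout above is to check for these folders under the installation prefix. The following is a small sketch; `check_ear_prefix` is a hypothetical helper, not an EAR command, and it only tests that the directories exist, not what they contain.

```shell
# Sketch: report which of the expected EAR install folders exist under a prefix.
# Returns a non-zero status if at least one folder is missing.
check_ear_prefix() {
    local prefix="$1" dir missing=0
    for dir in bin sbin etc lib include man; do
        if [ -d "$prefix/$dir" ]; then
            echo "found   $prefix/$dir"
        else
            echo "missing $prefix/$dir"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `check_ear_prefix /path/to/ear-install` (a placeholder for the path you passed to `--prefix`) prints one line per folder.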
## Deployment and validation

### Monitoring: Compute node and DB

**Prepare the configuration**

Check that the EARDs are reporting metrics to the database with `ereport`: `ereport -n all` should report the total energy sent by each daemon since the setup.
### Monitoring: EAR plugin

- Set up EAR's SLURM plugin (see the [configuration](Configuration) section for more information).

For the coefficients to be active, restart the daemons.

> **Important** Reloading the daemons will NOT make them load the coefficients; restarting the service is the only way.
## EAR Library versions: MPI vs. Non-MPI

As commented in the overview, the EAR Library is loaded next to the user's MPI application by the EAR Loader.

If your MPI version is not fully compatible, please contact ear-support@bsc.es.

See the [User guide](User guide) to check the supported use cases and how to submit jobs with EAR.
# Installing from RPM

EAR includes the specification files to create an rpm from an already existing installation. The spec file is placed at `etc/rpms`.

Once you have the rpm file, execute the following steps:

- During the installation, the `*.in` configuration files are compiled into their ready-to-use versions, replacing tags with the correct paths. You will find more information about these files in the following pages; check the [next section](#installation-content) for details.
- Type `rpm -e ear.version` to uninstall.
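To illustrate what "replacing tags with the correct paths" means, the sketch below renders a template by substituting a placeholder tag with an installation path using `sed`. Both the tag name `@EAR_PREFIX@` and the function `render_template` are hypothetical: the real tags are defined by EAR's own `*.in` files, and the RPM installation performs this step for you.

```shell
# Illustration only: render a *.in template by replacing a placeholder tag.
# @EAR_PREFIX@ is a made-up tag, not necessarily one used by EAR's templates.
render_template() {
    local template="$1" output="$2" prefix="$3" escaped
    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    escaped=$(printf '%s' "$prefix" | sed 's/[\&|]/\\&/g')
    # Replace every occurrence of the tag and write the ready-to-use file.
    sed "s|@EAR_PREFIX@|$escaped|g" "$template" > "$output"
}
```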
## Installation content

The `*.in` configuration files are compiled into `etc/ear/ear.conf.template` and `etc/ear/ear.full.conf.template`, `etc/module/ear`, `etc/slurm/ear.plugstack.conf`

The table below describes the complete hierarchy of the EAR installation:

| `/etc/slurm` | EAR SLURM plugin configuration file. |
| `/etc/systemd` | EAR service files. |
## RPM requirements

EAR uses some third-party libraries. The EAR RPM will not ask for them when installing, but they must be available in `LD_LIBRARY_PATH` when you want to run an application with EAR.

Depending on the RPM, different versions of these libraries are required:

Also, some **drivers** have to be present and loaded in the system when starting

| CPUFreq | kernel/drivers/cpufreq/acpi-cpufreq.ko | 3.10 | [Information](https://wiki.archlinux.org/index.php/CPU_frequency_scaling) |
| Open IPMI | kernel/drivers/char/ipmi/\*.ko | 3.10 | [Information](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwlin/configuring-the-open-ipmi-driver.html) |
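Because the RPM does not pull these libraries in, a job script or environment setup usually has to extend `LD_LIBRARY_PATH` itself. A minimal sketch, assuming you know the directory where your site installed a given library: the hypothetical helper below prepends that directory only if it is not already present, so calling it several times does not keep growing the variable.

```shell
# Sketch: prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
    local dir="$1"
    # Surround with ':' so the match only succeeds on whole path entries.
    case ":${LD_LIBRARY_PATH}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}
```

For example, `prepend_ld_path /path/to/gsl/lib` (a placeholder path) in a job script makes that directory visible before the application is launched.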
# Starting Services

The best way to execute all EAR daemon components (EARD, EARDBD, EARGM) is through their unit services.

Additionally, services can be started, stopped or reloaded in parallel using parallel shell tools such as `pdsh`. For example:

`sudo pdsh -w nodelist systemctl start eard`
# Updating EAR with a new installation

In some cases, it might be a good idea to create a new installation instead of updating your current one, for example when trying new configurations or when a major update is released.

Once all that is done, you should have two complete EAR installations that can be switched by changing the binaries executed by the services and the path in `plugstack.conf`.
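Switching between the two installations mostly means rewriting one installation prefix into the other in the relevant files. As a minimal sketch, assuming the two installations differ only in their prefix, the hypothetical helper `switch_ear_prefix` below rewrites every occurrence of the old prefix in a file such as `plugstack.conf` (or a service file) and keeps a `.bak` copy of the original. Review the result before restarting anything; if you edit systemd unit files, run `systemctl daemon-reload` before restarting the services.

```shell
# Sketch: replace every occurrence of OLD_PREFIX with NEW_PREFIX in FILE,
# keeping the original as FILE.bak.
switch_ear_prefix() {
    local file="$1" old_prefix="$2" new_prefix="$3" old_re new_rep
    if [ ! -f "$file" ]; then
        echo "switch_ear_prefix: no such file: $file" >&2
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1
    # Escape characters that are special in a sed regex (plus the | delimiter).
    old_re=$(printf '%s' "$old_prefix" | sed 's/[][\.*^$|]/\\&/g')
    # Escape characters that are special in a sed replacement (\, & and |).
    new_rep=$(printf '%s' "$new_prefix" | sed 's/[\&|]/\\&/g')
    sed "s|$old_re|$new_rep|g" "$file.bak" > "$file"
}
```

For example, `switch_ear_prefix /etc/slurm/plugstack.conf /opt/ear/old /opt/ear/new` (hypothetical paths) points the SLURM plug-in configuration at the new installation.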
# Next steps

For a better overview of the installation process, return to the [installation guide](#quick-installation-guide).

To continue the installation, visit the [configuration page](Configuration) to properly set up the EAR configuration file and the EAR SLURM plugin stack file.