Admin guide · Changes

ear-5.0 authored Sep 02, 2024 by Oriol Vidal Teruel
Admin-guide.md
View page @ 33467cc1
[[_TOC_]]
# EAR Components
<img src="./images/EAR_arch.png" align="right" width=500>

EAR is composed of five main components:
- **Node Manager (EARD):** A Linux service that provides basic node power monitoring and job accounting. It also offers an API that third parties (e.g., other EAR components) can use to perform privileged operations. It must have root access to the node where it runs (usually every compute node).
- **Database Manager (EARDBD):** A Linux service (normally run on a service node) that caches data to be stored in a database, reducing the number of queries. We currently support [MariaDB](https://mariadb.org/) and [PostgreSQL](https://www.postgresql.org/). This component does not need to be enabled if you don't use these database services to report EAR data.
- **Global Manager (EARGM):** A Linux service (normally run on a service node) that provides cluster-level support (e.g., powercap). It needs access to all nodes in the cluster where a Node Manager is running.
- **EAR Library (EARL):** A Job Manager (distributed as a shared object) that provides job/application-level monitoring and optimization.
- **SLURM plug-in:** A SLURM [SPANK](https://slurm.schedmd.com/spank.html) plug-in that supports EAR job accounting and loads EARL transparently for users on systems using SLURM.
For more detailed information about EAR components, visit the [Architecture](Architecture) page.
# Quick Installation Guide
This section provides a condensed, step-by-step installation and execution guide for EAR. For a more in-depth explanation of the necessary steps, see the [Installation from source](Installation from source) page or the [Installing from RPM](#installing-from-rpm) section, then follow the [Configuration](Configuration) guide, or contact us at ear-support@bsc.es.
## EAR Requirements
You need at least a modern **C compiler**.
Other requirements to compile EAR with all of its features are:
- An MPI compiler and headers for supporting MPI applications. Intel MPI and OpenMPI are the implementations most used and tested by the EAR development team.
- A [CUDA](https://developer.nvidia.com/cuda-toolkit) installation if you want support for NVIDIA GPUs.
- A [Likwid](https://hpc.fau.de/research/tools/likwid/) installation for retrieving performance metrics through this interface.
- A [FreeIPMI](https://www.gnu.org/software/freeipmi/) installation if you are going to read power and energy through the IPMI specification.
- [GSL](https://www.gnu.org/software/gsl/), which is needed by some tools for coefficient computation.
SLURM must also be present if you want to use the SLURM plug-in. Because the current EAR version only supports automatic execution of applications with the EAR Library through the SLURM plug-in, SLURM must be running whenever EARL is used (it is not needed for the most basic node monitoring service).
When installing EAR from an **rpm** (binaries only), none of these dependencies is required at install time except *mysqlclient*. However, they are needed when running EAR components.
Last but not least:
- The drivers for CPU frequency management (*acpi-cpufreq*) and Open IPMI must be present and loaded in compute nodes.
- The *msr* kernel module must be loaded in compute nodes.
- A MariaDB or PostgreSQL server must be up and running if you use these services for storing EAR data.
- Hardware counters must be accessible for normal users. Set */proc/sys/kernel/perf\_event\_paranoid* to 2 (or less). Type `sudo sh -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"` in compute nodes.
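The checks in the list above can be scripted before starting any EAR service. The following is a minimal sketch, not part of EAR: the function names `paranoid_ok` and `module_loaded` are hypothetical, and both take the file to inspect as a parameter so they can be tried against a copy instead of the live `/proc` entries.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (assumption: these are not shipped with EAR).

# Succeeds when the value stored in the given file is 2 or less.
# Defaults to the live kernel setting.
paranoid_ok() {
    file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    [ -r "$file" ] || return 1
    value=$(cat "$file")
    # A non-numeric value makes the test fail, which also counts as "not ok".
    [ "$value" -le 2 ] 2>/dev/null
}

# Succeeds when a module name appears in a /proc/modules-style listing.
# Note: a driver built into the kernel is not listed there, so a failure
# here is a hint to investigate, not proof that the driver is missing.
module_loaded() {
    name="$1"
    listing="${2:-/proc/modules}"
    [ -r "$listing" ] || return 1
    grep -q "^${name} " "$listing"
}
```

On a compute node, `paranoid_ok || echo "perf_event_paranoid is above 2"` and `module_loaded msr || echo "msr not listed"` would report the two most common omissions. Loaded modules are normally listed with underscores, so the frequency driver would be queried as `acpi_cpufreq`.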
Run `./configure --help` to see all the flags and options.
## Compiling and installing EAR
After downloading the code from the repository, execute:
- `autoreconf -i`.
...@@ -65,7 +61,8 @@ It simplifies the fact of having multiple configurations (1 for each library ver
- The option `--disable-mpi` must be set to generate a configuration for the non-MPI version of the library.
- Use `MPI_VERSION=ompi` for an OpenMPI-compatible version.
Before running `make`, review the Makefile and the configuration log to validate that all the requirements of your installation have been automatically detected.
In particular, check any specific library you need, such as Likwid, FreeIPMI or CUDA. If a CUDA path is specified, EAR will be compiled with GPU support. Also check that the MySQL or PostgreSQL paths have been detected.
You can use the options `USER` and `GROUP` if you want to install EAR with a specific user/group.
The following shows how to configure EAR to be compiled with Intel MPI:
...@@ -80,20 +77,23 @@ make -f Makefile.impi etc.install
```
At this point the EAR binaries are installed, including one version of the
EAR Library for MPI (the default), the EAR documentation, the service files for the EAR
daemons, and templates for the `ear.conf` files and the SLURM plug-in. The configure
tool tries to automatically detect the paths for MySQL and/or PostgreSQL, the scheduler
sources, etc. Detecting the scheduler path is mandatory; SLURM is assumed by default.
After running configure, check in the Makefile that all the options have been
detected. After `make install`, you should have the following folders in the
`prefix` path:
- `bin`: Includes commands and tools.
- `sbin`: Includes the EAR service binaries.
- `etc`: Includes templates and examples for the EAR service files, the `ear.conf` file, the EAR module and so on.
- `lib`: Includes all libraries and plug-ins.
- `include`
- `man`: Man pages.
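The folder list above can be verified with a few lines of shell. This is a sketch under assumptions: `check_ear_layout` is a hypothetical helper, not an EAR tool, and the installation prefix is whatever path you passed to configure.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with EAR): prints each expected
# subdirectory that is missing under the given prefix and returns
# non-zero if at least one is missing.
check_ear_layout() {
    prefix="$1"
    missing=0
    for dir in bin sbin etc lib include man; do
        if [ ! -d "$prefix/$dir" ]; then
            echo "missing: $prefix/$dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_ear_layout /path/to/prefix` (the path here is a placeholder) prints nothing and succeeds when the six folders exist.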
## Deployment and validation
### Monitoring: Compute node and DB
**Prepare the configuration**
...@@ -152,7 +152,7 @@ In case there are problems when running econtrol, you can also find this file at
Check that EARDs are reporting metrics to the database with ereport. `ereport -n all`
should report the total energy sent by each daemon since the setup.
### Monitoring: EAR plugin
- Set up EAR's SLURM plugin (see the [configuration](Configuration) section for
more information).
...@@ -227,7 +227,7 @@ For the coefficients to be active, restart daemons.
> **Important** Reloading daemons will NOT make them load coefficients; restarting the service is the only way.
## EAR Library versions: MPI vs. Non-MPI
As mentioned in the overview, the EAR Library is loaded next to the user MPI
application by the EAR Loader.
...@@ -276,7 +276,7 @@ If your MPI version is not fully compatible, please contact ear-support@bsc.es.
See the [User guide](User guide) to check the supported use cases and how to submit jobs with EAR.
# Installing from RPM
EAR includes the specification files to create an rpm from an existing installation.
The spec file is placed at `etc/rpms`.
...@@ -297,7 +297,7 @@ Once you have the rpm file, execute the following steps:
- During the installation, the `*.in` configuration files are compiled into their ready-to-use versions, replacing tags with the correct paths. You will find more information about those files in the following pages. Check the [next section](#installation-content) for more information.
- Type `rpm -e ear.version` to uninstall.
## Installation content
The `*.in` configuration files are compiled into `etc/ear/ear.conf.template`
and `etc/ear/ear.full.conf.template`, `etc/module/ear`, `etc/slurm/ear.plugstack.conf`
...@@ -319,7 +319,7 @@ Below table describes the complet heriarchy of the EAR installation:
| `/etc/slurm` | EAR SLURM plugin configuration file. |
| `/etc/systemd` | EAR service files. |
## RPM requirements
EAR uses some third-party libraries. The EAR RPM will not ask for them when installing, but they must be available in `LD_LIBRARY_PATH` when you run an application with EAR.
Depending on the RPM, different versions of these libraries are required:
...@@ -356,7 +356,7 @@ Also, some **drivers** has to be present and loaded in the system when starting
| CPUFreq | kernel/drivers/cpufreq/acpi-cpufreq.ko | 3.10 | [Information](https://wiki.archlinux.org/index.php/CPU_frequency_scaling) |
| Open IPMI | kernel/drivers/char/ipmi/\*.ko | 3.10 | [Information](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwlin/configuring-the-open-ipmi-driver.html) |
# Starting Services
The best way to run all EAR daemon components (EARD, EARDBD, EARGM) is through unit services.
...@@ -371,7 +371,7 @@ When using `systemctl` commands, you can check messages reported to `stderr` usi
Additionally, services can be started, stopped or reloaded in parallel using parallel commands such as `pdsh`. As an example:
`sudo pdsh -w nodelist systemctl start eard`.
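When the same action has to be repeated for several services or node lists, it may help to build the command line first and review it before running it. The sketch below is illustrative only: `ear_pdsh_cmd` is a hypothetical helper, and it only prints the command in the form shown above, it does not execute anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of EAR): prints the pdsh command
# that would apply a systemctl action to one service on a node list.
ear_pdsh_cmd() {
    action="$1"
    service="$2"
    nodes="$3"
    case "$action" in
        start|stop|reload|restart|status) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    echo "sudo pdsh -w $nodes systemctl $action $service"
}
```

For example, `ear_pdsh_cmd restart eard 'node[1-4]'` prints a command that can be pasted into a terminal once it looks right. The node range is a placeholder; quote it so the shell does not expand the brackets.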
# Updating EAR with a new installation
In some cases, it might be a good idea to create a new install instead of updating your current one, for example when trying new configurations or when a big update is released.
...@@ -384,6 +384,6 @@ The steps to do so are:
Once all that is done, you should have two complete EAR installs that can be switched by changing the binaries executed by the services and changing the path in `plugstack.conf`.
# Next steps
For a better overview of the installation process, return to the [installation guide](#quick-installation-guide).
To continue the installation, visit the [configuration page](Configuration) to properly set up the EAR configuration file and the EAR SLURM plugin stack file.