Powercap · Changes

Wiki EAR4.3 authored Jul 04, 2023 by Oriol Vidal
Powercap.md
View page @ 7ff3e622
......@@ -5,7 +5,7 @@ EAR provides powercap at different levels:
- Cluster powercap, where the target power is for the entire cluster. It uses the node powercap to achieve its target.
# Node powercap
## Node powercap
Node powercap is enforced by the EARD. The initial values for each node's powercap are set in the tags section of the ear.conf (see [Tags](Configuration#tags) for more information), which include the power limit, the CPU/PKG powercap plugin and the GPU powercap plugin (if needed). The power limit can be changed at runtime via `econtrol` or by an active `EARGM` that has the node under its control.
......@@ -15,22 +15,22 @@ The main goals of the node powercap is, first and foremost, to enforce the power
Node powercap can be applied without cluster powercap by defining only the node powercap in the EAR configuration file.
# Cluster powercap
## Cluster powercap
Cluster powercap is managed by one or more EARGMs and enforced at a node level by the EARD. Each EARGM has an individual power limit and a monitoring frequency set in its definition (see [EARGM](Configuration#eargm-configuration) for more details). Each EARGM will then ask the nodes under its control (as indicated in the [nodes' definition](Configuration#island-description)) for their power consumption and distribute the budget accordingly. There are two main ways in which the cluster powercap might be enforced: soft and hard cluster powercap.
## Soft cluster powercap
### Soft cluster powercap
This type of powercap is targeted at systems where exceeding the power limit is not a hardware constraint but a rule that needs enforcement for a different reason. In this scenario, the compute nodes will run as if no limit was applied until the total power consumption of the cluster reaches a percentage threshold (defined as the suspend threshold in ear.conf), at which point the EARGM will send a power limit to all the nodes to prevent the global power from going above the actual limit. Additionally, a script can be attached to the activation of the powercap, in which the admin can set whichever actions they deem appropriate.
Once the cluster power goes below another percentage threshold (defined as the resume threshold in ear.conf) the EARGM will send a message to all the nodes to go back to unlimited power usage, as well as call the deactivation script set by the admin (if any is specified).
In terms of configuration, with the current implementation, the nodes need to have all a powercap set of 1 (the "unlimited" value) while the EARGM requires a set value.
In terms of configuration, `EARGMPowerCapMode` must be set to 2 (soft powercap) and all nodes need to have a `max_powercap` set in their tag. The value of `max_powercap` will be the power allocation of the nodes that have that tag. If a node has a `max_powercap` value of 1, 0 or -1, it will ignore powercap messages from an EARGM in soft cluster powercap mode.
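The suspend/resume behaviour described above is a form of hysteresis. The following Python sketch only illustrates that idea and is not EAR's implementation: the class name, the field names and the returned action strings are hypothetical, and the thresholds are assumed to be percentages of the EARGM power limit.

```python
from dataclasses import dataclass


@dataclass
class SoftPowercap:
    """Illustrative model of the soft cluster powercap state (hypothetical names)."""
    limit_w: float       # power limit of the EARGM, in watts
    suspend_pct: float   # suspend threshold, as a percentage of the limit
    resume_pct: float    # resume threshold, as a percentage of the limit
    active: bool = False

    def update(self, cluster_power_w: float) -> str:
        """Return the action to take for the latest cluster power reading."""
        usage_pct = 100.0 * cluster_power_w / self.limit_w
        if not self.active and usage_pct >= self.suspend_pct:
            # Power reached the suspend threshold: limit all nodes
            # (and run the activation script, if any).
            self.active = True
            return "activate"
        if self.active and usage_pct < self.resume_pct:
            # Power dropped below the resume threshold: lift the limit
            # (and run the deactivation script, if any).
            self.active = False
            return "deactivate"
        return "none"


cap = SoftPowercap(limit_w=1000, suspend_pct=90, resume_pct=70)
for reading in (500, 950, 800, 650):
    print(reading, cap.update(reading))
```

Keeping the resume threshold below the suspend threshold avoids toggling the powercap on and off when the cluster power hovers around a single value.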
## Hard cluster powercap
### Hard cluster powercap
Hard powercap is used when the system must not, under any circumstance, go above the power limit. This starts by always having a set powercap in the compute nodes. The job of the EARGM is to periodically monitor the state of the nodes, which will request more or less power depending on their current workload, and redistribute the power according to the needs of all nodes.
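This page does not describe the policy the EARGM uses to redistribute power. As a rough illustration of the constraint involved, the sketch below uses a simple proportional scaling, which is an assumption chosen for clarity and not EAR's algorithm; the function and parameter names are hypothetical.

```python
def redistribute(budget_w: float, requests_w: dict[str, float]) -> dict[str, float]:
    """Scale per-node power requests so that their sum never exceeds the budget.

    Proportional scaling is an illustrative policy only, not the EARGM's.
    """
    total = sum(requests_w.values())
    if total <= budget_w:
        # Everything fits: grant each node what it asked for.
        return dict(requests_w)
    scale = budget_w / total
    return {node: request * scale for node, request in requests_w.items()}


requests = {"node1": 300.0, "node2": 300.0, "node3": 300.0, "node4": 300.0}
allocation = redistribute(1000.0, requests)
print(allocation)
```

Whatever policy is used, the invariant for hard powercap is the same: the sum of the node allocations stays at or below the cluster budget.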
# Possible powercap values
## Possible powercap values
There are two ways to set the powercap for an entire cluster: with a specific value or with a calculated one. With a specific value, the `powercap` value in the EARGM definition must be a number > 0, and that will be the power budget for the EARGM to distribute among the nodes it controls. On the other hand, if `powercap=-1`, the total power budget will be calculated automatically as the sum of the powercap values set in the tags for the nodes it controls.
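The two cases can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function name and its arguments are hypothetical and do not correspond to any EAR API.

```python
def cluster_budget(eargm_powercap: int, node_powercaps: list[int]) -> int:
    """Return the power budget (in watts) an EARGM would distribute.

    eargm_powercap: the `powercap` value from the EARGM definition.
    node_powercaps: the powercap values from the tags of the controlled nodes.
    """
    if eargm_powercap > 0:
        # Specific value: the budget is exactly what was configured.
        return eargm_powercap
    if eargm_powercap == -1:
        # Calculated: the budget is the sum of the nodes' powercaps.
        return sum(node_powercaps)
    # Other values are outside the two cases described in this paragraph.
    raise ValueError(f"no budget defined for powercap={eargm_powercap}")


print(cluster_budget(1000, [225, 225, 225, 225]))
print(cluster_budget(-1, [225, 225, 225, 225]))
```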
......@@ -42,7 +42,7 @@ For an EARD, the valid values of `powercap` in its tag are 1 and N > 1. When set
**If the initial powercap value for a node is set to 0 the powercap will be disabled for that node and it will ignore any attempts to set it to a certain value. Set it to 1 if you ever want to set the powercap.**
# Example configurations
## Example configurations
The following is an example for hard powercap on 4 nodes, with a starting powercap of 225W each and a total power budget of 1000W. For clarity a few fields in the tags section have been skipped.
```
......@@ -149,3 +149,25 @@ Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=1 power
Island=1 nodes=node[1-4] EARGMId=1
```
## Valid configurations
There are three special values for powercap configuration: 1 (unlimited, only for Tags/Node), 0 (disabled) and -1 (auto-configure).
Furthermore, there are three cluster powercap modes for EARGM: 0 (monitoring-only), 1 (hard cluster powercap) and 2 (soft cluster powercap).
| EARGM powercap mode | EARGM powercap value | Tag powercap value | Result |
|---------------------|----------------------|--------------------|--------|
| ANY | 0 | 1 | Cluster powercap disabled, node powercap unlimited (but can be set with `econtrol`) |
| ANY | 0 | 0 | All powercap types disabled, and cannot be modified without restarting |
| ANY | 0 | N | Cluster powercap disabled, node powercap set to N |
| HARD | -1 | N | Cluster powercap set to the sum of the nodes' powercap. Node powercap set to N |
| HARD | N | -1 | Cluster powercap set to N. Node powercap set to N/number of nodes controlled by EARGM |
| HARD | N | M | Cluster powercap set to N. Node powercap set to M |
| SOFT | N | 1 | Cluster powercap set to N, node powercap unlimited. If triggered, node powercap will be set to their max_powercap value |
| SOFT | N | M | *ERROR* |
| HARD/SOFT | N | 0 | *ERROR* |
| HARD/SOFT | -1 | -1 | *ERROR* |
| HARD/SOFT | 0 | -1 | *ERROR* |
| HARD/SOFT | 1 | -1 | *ERROR* |
| HARD/SOFT | -1 | 1 | *ERROR* |
``` NOTE: When using soft cluster powercap, max_powercap value must be properly set for the powercap to work. ```
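The table above can be turned into a small lookup for sanity-checking a configuration before deploying it. The following Python sketch is not part of EAR: the function names and return strings are hypothetical, and it assumes that N and M in the table stand for values greater than 1. Combinations that the table does not list are reported as `unlisted` rather than guessed.

```python
MONITOR, HARD, SOFT = 0, 1, 2  # EARGM cluster powercap modes


def classify(value: int) -> str:
    """Map a powercap value to one of the table's symbols: -1, 0, 1 or N."""
    if value in (-1, 0, 1):
        return str(value)
    if value > 1:
        return "N"
    raise ValueError(f"unsupported powercap value: {value}")


def check(mode: int, eargm_powercap: int, tag_powercap: int) -> str:
    """Return 'valid', 'error' or 'unlisted' for a combination in the table."""
    if mode not in (MONITOR, HARD, SOFT):
        raise ValueError(f"unknown EARGM powercap mode: {mode}")
    e, t = classify(eargm_powercap), classify(tag_powercap)
    # Rows for ANY mode: cluster powercap disabled.
    if e == "0" and t in ("0", "1", "N"):
        return "valid"
    if mode == MONITOR:
        return "unlisted"
    # Rows marked ERROR for both HARD and SOFT.
    if (e, t) in {("N", "0"), ("-1", "-1"), ("0", "-1"), ("1", "-1"), ("-1", "1")}:
        return "error"
    if mode == HARD and (e, t) in {("-1", "N"), ("N", "-1"), ("N", "N")}:
        return "valid"
    if mode == SOFT and (e, t) == ("N", "1"):
        return "valid"
    if mode == SOFT and (e, t) == ("N", "N"):
        return "error"
    return "unlisted"


print(check(HARD, 1000, 225))
print(check(SOFT, 1000, 225))
print(check(SOFT, 1000, 1))
```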