EAR provides powercap at different levels:

- Node powercap, where a node cannot exceed its assigned power limit.
- Cluster powercap, where the target power applies to the entire cluster. It relies on node powercap to achieve its target.

# Node powercap

Node powercap is enforced by the EARD. The initial powercap values for each node are set in the tags section of ear.conf (see [Tags](Configuration#tags) for more information). These include the power limit, the CPU/PKG powercap plugin and, if needed, the GPU powercap plugin. The power limit can be changed at runtime via `econtrol` or by an active `EARGM` that has the node under its control.

The EARD enforces the powercap through its plugins, each of which ensures that the domain it controls (CPU or GPU) does not exceed its power allocation.

The main goal of node powercap is, first and foremost, to enforce the power limit. Its secondary goal is to maximize performance while staying under that limit. The EARD uses its current power limit as a budget, which it distributes among the domains (controlled by the plugins) according to the node's current needs.
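The budget split described above can be pictured with a minimal sketch. It assumes a simple hypothetical policy: reclaim watts that a domain is not using, hand them to domains running at their allocation, and scale everything down if the budget shrinks. The function name, the `margin_w` parameter and the policy itself are illustrative assumptions, not the EARD's actual plugin logic.

```python
"""Hypothetical sketch: rebalancing a node's power budget between domains.

NOT the EARD's actual algorithm: the policy, the names and the margin_w
parameter are assumptions made for illustration only.
"""


def rebalance_domains(budget_w: float, alloc_w: dict, usage_w: dict,
                      margin_w: float = 5.0) -> dict:
    """Return new per-domain allocations whose sum never exceeds budget_w.

    alloc_w: current allocation per domain, e.g. {"cpu": 150, "gpu": 75}
    usage_w: measured power per domain, same keys as alloc_w
    margin_w: a domain using less than (allocation - margin_w) has slack;
              one within margin_w of its allocation is considered "hungry"
    """
    new_alloc = dict(alloc_w)
    pool = budget_w - sum(alloc_w.values())  # unassigned watts (< 0 if budget shrank)

    # 1. Reclaim slack from domains that are not using their full share.
    for dom, used in usage_w.items():
        slack = alloc_w[dom] - used - margin_w
        if slack > 0:
            new_alloc[dom] -= slack
            pool += slack

    # 2. The budget is below what is still allocated: scale everything down.
    if pool < 0:
        scale = budget_w / sum(new_alloc.values())
        return {dom: watts * scale for dom, watts in new_alloc.items()}

    # 3. Share the free watts equally among domains running at their limit.
    hungry = [dom for dom, used in usage_w.items()
              if alloc_w[dom] - used <= margin_w]
    for dom in hungry:
        new_alloc[dom] += pool / len(hungry)
    return new_alloc


if __name__ == "__main__":
    # 225 W node budget: the CPU uses 100 W of its 150 W, the GPU is capped.
    print(rebalance_domains(225, {"cpu": 150, "gpu": 75}, {"cpu": 100, "gpu": 75}))
    # {'cpu': 105.0, 'gpu': 120.0}
```

Whatever policy the real plugins follow, the property to preserve is the one this sketch keeps: the domain allocations never add up to more than the node's current limit.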

Node powercap can be applied without cluster powercap by defining only the node powercap (the `powercap` value in the tags) in the EAR configuration file.

# Cluster powercap

Cluster powercap is managed by one or more EARGMs and enforced at the node level by the EARD. Each EARGM has its own power limit, set in its definition (see [EARGM](Configuration#eargm-configuration) for more details), and a monitoring period (`EARGMPowerPeriod` in the examples below). Each EARGM asks the nodes under its control (as indicated in the [nodes' definition](Configuration#island-description)) for their power consumption and distributes the budget accordingly. There are two main ways in which cluster powercap can be enforced: soft and hard cluster powercap.

## Soft cluster powercap

This type of powercap is aimed at systems where the power limit is not a hardware constraint but a rule that must be enforced for other reasons. In this scenario, the compute nodes run as if no limit were applied until the total power consumption of the cluster reaches a percentage threshold (the suspend threshold in ear.conf). At that point, the EARGM sends a power limit to all the nodes to prevent the global power from going above the actual limit. Additionally, a script can be attached to the activation of the powercap, in which the admin can take whatever actions they consider appropriate.

Once the cluster power drops below another percentage threshold (the resume threshold in ear.conf), the EARGM sends a message to all the nodes to return to unlimited power usage and calls the deactivation script set by the admin, if one is specified.

In terms of configuration, with the current implementation all nodes must have their powercap set to 1 (the "unlimited" value), while the EARGM requires a specific power value greater than 0.
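The suspend/resume behaviour forms a hysteresis loop: between the two thresholds, the EARGM keeps whatever state it is already in. The sketch below models one monitoring period. `soft_powercap_step`, `suspend_pct` and `resume_pct` are hypothetical names standing in for the EARGM logic and the ear.conf thresholds, and the exact comparisons (`>=` to suspend, `<` to resume) are assumptions.

```python
"""Hypothetical sketch of the soft-powercap hysteresis.

suspend_pct and resume_pct stand in for the suspend and resume thresholds
of ear.conf; their names and the >= / < comparisons are assumptions.
"""
from enum import Enum


class SoftCap(Enum):
    RELAXED = "nodes unlimited (powercap=1)"
    ACTIVE = "EARGM limit pushed to every node"


def soft_powercap_step(state: SoftCap, cluster_w: float, limit_w: float,
                       suspend_pct: float, resume_pct: float):
    """One EARGM monitoring period: return (new_state, action or None)."""
    usage_pct = 100.0 * cluster_w / limit_w
    if state is SoftCap.RELAXED and usage_pct >= suspend_pct:
        return SoftCap.ACTIVE, "send power limits to nodes + run activation script"
    if state is SoftCap.ACTIVE and usage_pct < resume_pct:
        return SoftCap.RELAXED, "send unlimited to nodes + run deactivation script"
    return state, None  # between the thresholds: keep the current state


if __name__ == "__main__":
    state = SoftCap.RELAXED
    # 1000 W budget, suspend at 90 %, resume below 80 %
    for reading in (850, 920, 850, 780):
        state, action = soft_powercap_step(state, reading, 1000, 90, 80)
        print(reading, state.name, action)
```

With a 1000W budget, a 90% suspend threshold and an 80% resume threshold, readings of 850, 920, 850 and 780W leave the cluster relaxed, activate the cap, keep it active, and finally release it.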
## Hard cluster powercap

Hard powercap is used when the system must not, under any circumstances, exceed the power limit. It requires the compute nodes to always have a powercap set. The EARGM periodically monitors the state of the nodes, which request more or less power depending on their current workload, and redistributes the power according to the needs of all nodes.
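A reallocation round under hard powercap can be sketched as follows. The proportional policy and the per-node `floor_w` are assumptions chosen for illustration, since EAR's actual reallocation policy is not described here. The point is the invariant: the limits handed to the nodes never add up to more than the EARGM budget.

```python
"""Hypothetical sketch of one hard-powercap reallocation round in an EARGM.

The proportional policy and floor_w are assumptions; only the invariant
(sum of node limits <= budget) reflects the behaviour described above.
"""


def redistribute(budget_w: float, requests_w: dict, floor_w: float) -> dict:
    """Map each node name to its new power limit (sum <= budget_w)."""
    n = len(requests_w)
    if floor_w * n > budget_w:
        raise ValueError("budget cannot cover the per-node floor")
    wants = {node: max(req, floor_w) for node, req in requests_w.items()}
    total = sum(wants.values())
    if total <= budget_w:
        # Every request fits: grant it and spread the leftover evenly.
        spare = (budget_w - total) / n
        return {node: w + spare for node, w in wants.items()}
    # Oversubscribed: guarantee the floor, then share the remaining budget
    # in proportion to what each node asked for above the floor.
    extra = budget_w - floor_w * n
    above = {node: w - floor_w for node, w in wants.items()}
    above_total = sum(above.values())  # > 0 here, since total > budget_w
    return {node: floor_w + extra * a / above_total for node, a in above.items()}


if __name__ == "__main__":
    # 1000 W budget, nodes asking for 1200 W in total, 100 W floor per node.
    print(redistribute(1000, {"node1": 450, "node2": 350,
                              "node3": 250, "node4": 150}, 100))
    # {'node1': 362.5, 'node2': 287.5, 'node3': 212.5, 'node4': 137.5}
```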
# Possible powercap values

There are two ways to set the powercap for an entire cluster: a specific value or a calculated one. With a specific value, the `power` value in the EARGM definition must be a number > 0, and it becomes the power budget that the EARGM distributes among the nodes it controls. If instead `power=-1`, the total power budget is calculated automatically as the sum of the powercap values set in the tags of the nodes it controls.
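The two options amount to a simple rule, transcribed here into a hypothetical helper (`eargm_budget` is not part of EAR):

```python
def eargm_budget(eargm_power: int, node_powercaps: list) -> int:
    """Total budget (W) an EARGM distributes among the nodes it controls.

    Hypothetical helper. eargm_power is the `power` value of the EARGM
    definition; node_powercaps holds the tag `powercap` of each of its nodes.
    """
    if eargm_power > 0:
        # Specific value: used as-is.
        return eargm_power
    if eargm_power == -1:
        # Calculated: sum of the node powercaps. Only meaningful when the
        # nodes have real limits (N > 1), not the "unlimited" value 1.
        return sum(node_powercaps)
    raise ValueError(f"power={eargm_power} gives no budget to distribute")


if __name__ == "__main__":
    print(eargm_budget(1000, [225] * 4))  # 1000 (900 W assigned at start)
    print(eargm_budget(-1, [250] * 4))    # 1000
```

With the values used in the examples below, both the explicit `power=1000` and the calculated `power=-1` with four 250W nodes yield a 1000W budget.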

Besides 0 (see below), the valid values of `powercap` in an EARD's tag are 1 and any N > 1. When set to 1, the daemon runs with no power limit until it receives one. If the powercap is a higher number, that value is used as the power limit until a different one is set via `econtrol` or by EARGM reallocations.

**If either the EARGM `power` value or `EARGMPowercapMode` is set to 0 in the configuration file, the thread that controls the power limits will not be started and cluster powercap will be disabled.**

**If the initial powercap value for a node is set to 0, powercap will be disabled for that node and it will ignore any attempt to set a limit. Set it to 1 if you may ever want to set a powercap on that node.**
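These value rules can be summarised in two small helpers. Both names are hypothetical and illustrative; they do not correspond to EAR functions.

```python
def node_powercap_behaviour(tag_powercap: int) -> str:
    """Describe how an EARD starts for a given tag `powercap` value (hypothetical)."""
    if tag_powercap == 0:
        return "disabled: later limits from econtrol/EARGM are ignored"
    if tag_powercap == 1:
        return "unlimited until a limit is received"
    if tag_powercap > 1:
        return f"limited to {tag_powercap} W until econtrol/EARGM changes it"
    raise ValueError(f"unsupported tag powercap {tag_powercap}")


def eargm_powercap_thread_started(powercap_mode: int, eargm_power: int) -> bool:
    """The EARGM power-control thread runs only if neither value is 0."""
    return powercap_mode != 0 and eargm_power != 0


if __name__ == "__main__":
    for value in (0, 1, 250):
        print(value, "->", node_powercap_behaviour(value))
    print(eargm_powercap_thread_started(1, 0))  # False: node powercap only
```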
# Example configurations

The following is an example of hard powercap on 4 nodes, with a starting powercap of 225W each and a total power budget of 1000W, leaving 100W of the budget initially unassigned. For clarity, a few fields in the tags section have been omitted.

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap
EARGMPowercapMode=1

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=225 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

This example is similar to the previous one, but the global powercap is calculated by the EARGM as the sum of the nodes' powercaps (`power=-1`). Here the nodes start with a default powercap of 250W, so the total budget for the cluster remains 1000W (4 × 250W).

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap
EARGMPowercapMode=1

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=-1 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

The following is a soft powercap example with a power budget of 1000W. The nodes start without a powercap (`powercap=1`) but are ready to activate it.

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap as soft powercap
EARGMPowercapMode=2

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=1 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

This example uses ONLY node powercap, with each node limited to 250W. Since the EARGM `power` is 0, there is no reallocation:

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap
EARGMPowercapMode=1

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=0 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

This is equivalent, but disables the EARGM powercap by setting `EARGMPowercapMode` to 0 instead:

```
# Wait period between power checks
EARGMPowerPeriod=120

# Deactivate powercap (mode 0)
EARGMPowercapMode=0

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

The following setup is erroneous: because the nodes' powercap is set to 0, their powercap capabilities are not active and they will ignore the limits sent by the EARGM:

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap
EARGMPowercapMode=1

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=0 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```

Similarly, the following example does not work because the EARGM cannot calculate a valid budget (`power=-1`) when the nodes are set to unlimited (`powercap=1`):

```
# Wait period between power checks
EARGMPowerPeriod=120

# Activate powercap
EARGMPowercapMode=1

# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=-1 node=node1

# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=1 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so

Island=1 nodes=node[1-4] EARGMId=1
```
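The pitfalls illustrated by these examples can be checked mechanically. The sketch below parses a minimal ear.conf fragment and flags them. It assumes that `EARGMPowercapMode=1` means hard and `2` soft powercap (as in the examples), treats a missing `EARGMPowercapMode` as 0, and applies every `Tag` to every node, ignoring the island mapping. It is a hypothetical illustration, not a tool shipped with EAR.

```python
"""Hypothetical checker for the powercap pitfalls shown in the examples.

Assumptions: mode 1 = hard, mode 2 = soft (as in the examples); a missing
EARGMPowercapMode counts as 0; every Tag applies to every controlled node.
"""


def parse(conf_text: str) -> dict:
    """Collect the powercap-related values from a minimal ear.conf fragment."""
    cfg = {"mode": 0, "eargm_power": [], "tag_powercap": []}  # mode 0 if absent
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "EARGMPowercapMode" in fields:
            cfg["mode"] = int(fields["EARGMPowercapMode"])
        elif "EARGMId" in fields and "power" in fields:  # EARGM definition
            cfg["eargm_power"].append(int(fields["power"]))
        elif "Tag" in fields and "powercap" in fields:    # node tag
            cfg["tag_powercap"].append(int(fields["powercap"]))
    return cfg


def check(cfg: dict) -> list:
    """Return human-readable problems; an empty list means no known pitfall."""
    problems = []
    caps = cfg["tag_powercap"]
    if cfg["mode"] == 0:
        return problems  # EARGM powercap disabled, nothing to check
    for power in cfg["eargm_power"]:
        if power == 0:
            continue  # this EARGM does not manage power
        if 0 in caps:
            problems.append("a node has powercap=0, so it ignores EARGM limits")
        elif power == -1 and 1 in caps:
            problems.append("power=-1 cannot be computed from unlimited nodes (powercap=1)")
        elif cfg["mode"] == 1 and 1 in caps:
            problems.append("hard powercap needs a set powercap on every node")
        elif cfg["mode"] == 2 and any(c != 1 for c in caps):
            problems.append("soft powercap expects every node at powercap=1")
    return problems


if __name__ == "__main__":
    example = """
    EARGMPowerPeriod=120
    EARGMPowercapMode=1
    EARGMId=1 energy=XXX power=-1 node=node1
    Tag=tag1 default=yes powercap=1 powercap_plugin=dvfs.so
    Island=1 nodes=node[1-4] EARGMId=1
    """
    print(check(parse(example)))
    # ['power=-1 cannot be computed from unlimited nodes (powercap=1)']
```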