EAR offers a set of commands which help both users and administrators to interact with different components:

- Commands to analyze data stored in the DB: [eacct](#ear-job-accounting-eacct) and [ereport](#energy-report-ereport).
- Commands to control and temporarily modify cluster settings: [econtrol](#energy-control-econtrol).
- Commands to create/update/clean the DB: [edb_create](#edb_create), [edb\_clean\_pm](#edb_clean_pm) and [edb\_clean\_apps](#edb_clean_apps).
- A command to transparently load the EAR Library on systems where the batch scheduler has no plug-in or where a use case is not supported (e.g., running an OpenMPI application on SLURM systems through the `mpirun` command): [erun](#erun).
- A command to show current EAR installation information: [ear-info](#ear-info).

Commands belonging to the first three categories read the EAR configuration file

@@ -18,23 +18,21 @@ Some options are disabled when the user is not authorized.

# EAR job Accounting (eacct)

The `eacct` command shows the energy accounting information stored in the EAR DB for job (and step) IDs.
It can also retrieve EARL events that occurred during a job execution.
The command uses the EAR configuration file to determine whether the user running it is
privileged, as **non-privileged users can only access their own information**.
It provides the following options.

```
Usage: eacct [Optional parameters]
    Optional parameters:
        -h  displays this message
        -v  displays current EAR version
        -b  verbose mode for debugging purposes
        -u  specifies the user whose applications will be retrieved. Only available to privileged users. [default: all users]
        -j  specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
                A user can only retrieve its own jobs unless said user is privileged. [default: all jobs]
        -a  specifies the application names that will be retrieved. [default: all app_ids]
        -c  specifies the file where the output will be stored in CSV format. If the argument is "no_file" the output will be printed to STDOUT [default: off]
        -t  specifies the energy_tag of the jobs that will be retrieved. [default: all tags].
        -s  specifies the minimum start time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
        -e  specifies the maximum end time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
@@ -45,6 +43,7 @@ Usage: eacct [Optional parameters]
        -o  modifies the -r option to also show the corresponding jobs. Should be used with -j.
        -n  specifies the number of jobs to be shown, starting from the most recent one. [default: 20][to get all jobs use -n all]
        -f  specifies the file where the user-database can be found. If this option is used, the information will be read from the file and not the database.
```

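When `eacct` is driven from a script, the option list above can be turned into a small argument builder. The following is a minimal Python sketch, not part of EAR: the function name `build_eacct_args` and its parameters are hypothetical, it only uses options listed in the usage text (`-j`, `-u`, `-s`, `-e`, `-c`), and it assumes that a step ID is only meaningful together with a single job ID, which is one reading of the two `-j` formats.

```python
from datetime import datetime


def build_eacct_args(jobs=None, step=None, user=None, start=None, end=None, csv_file=None):
    """Return an argv list for eacct (hypothetical helper, not shipped with EAR)."""
    args = ["eacct"]
    if jobs:
        if step is not None:
            # Assumption: the jobid.stepid form applies to exactly one job.
            if len(jobs) != 1:
                raise ValueError("a step id can only be combined with a single job id")
            args += ["-j", f"{jobs[0]}.{step}"]
        else:
            # jobid1,jobid2,...,jobid_n form
            args += ["-j", ",".join(str(job) for job in jobs)]
    if user:
        args += ["-u", user]
    for flag, day in (("-s", start), ("-e", end)):
        if day is not None:
            # Raises ValueError when the date is not in year-month-day form.
            datetime.strptime(day, "%Y-%m-%d")
            args += [flag, day]
    if csv_file:
        args += ["-c", csv_file]
    return args


if __name__ == "__main__":
    print(" ".join(build_eacct_args(jobs=[6878], step=0, csv_file="job_6878.csv")))
```

The resulting list can be passed to `subprocess.run` on a system where `eacct` is installed. Remember that `-u` is only available to privileged users.
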
The basic usage of `eacct` retrieves the last 20 applications (by default) of the

@@ -259,6 +258,9 @@ Usage: econtrol [options]
    --restore-conf              ->restores the configuration for all nodes
    --active-only               ->suppresses inactive nodes from the output in hardware status.
    --health-check              ->checks all EARDs and EARDBDs for errors and prints all that are unresponsive.
    --domain [domain:target]    ->sends the requested command to the requested targets, effectively filtering
                                    which nodes receive the message. Available domains are: tag, node, subcluster/eargmid, island.
    --mail [address]            ->sends the output of the program to address.
    --ping                      ->pings all nodes to check whether the nodes are up or not. Additionally,
                                    --ping=node_name pings that node individually.
@@ -291,6 +293,65 @@ Node id Job-Step M-Rank DC power CPI GBS Gflops Time Avg
node2 6878-0 0 280.13 0.37 24.39 137.57 56.00 2.59
node3 6878-0 1 245.44 0.37 24.29 136.40 56.00 2.59
```

A list of nodes can be specified so that a command targets only those nodes:

```
[user@login]$ econtrol --status --hosts node2,node3
hostname      power  temp  freq  job_id  stepid
node2         278    66C   2.59  6878    0
node3         274    57C   2.59  6878    0

[user@login]$ econtrol --status --hosts island[0,1]node[2-3]
hostname      power  temp  freq  job_id  stepid
island0node2  278    66C   2.59  0       0
island0node3  274    57C   2.59  0       0
island1node2  273    56C   2.59  0       0
island1node3  272    57C   2.59  0       0
```

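The bracketed form used with `--hosts` expands to one host name per combination of the listed values. The Python sketch below illustrates that expansion for the two forms shown in the examples (comma-separated names, and `[a,b]` or `[a-b]` groups). It is only an approximation under those assumptions: `expand_hostlist` is a hypothetical name, the parser inside `econtrol` may accept more or fewer forms, and zero-padded ranges are not preserved here.

```python
import itertools
import re


def _expand_one(expr):
    """Expand a single expression such as island[0,1]node[2-3]."""
    # With one capture group, re.split alternates literal text and bracket contents.
    parts = re.split(r"\[([^\]]*)\]", expr)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])  # literal text between bracket groups
            continue
        values = []
        for item in part.split(","):
            if "-" in item:
                low, high = item.split("-", 1)
                values += [str(n) for n in range(int(low), int(high) + 1)]
            else:
                values.append(item)
        choices.append(values)
    return ["".join(combo) for combo in itertools.product(*choices)]


def expand_hostlist(expr):
    """Expand a comma-separated list of host expressions into host names."""
    hosts = []
    # Split on commas that are not inside a bracket group.
    for chunk in re.split(r",(?![^\[]*\])", expr):
        hosts.extend(_expand_one(chunk))
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("island[0,1]node[2-3]"))
```

For `island[0,1]node[2-3]` the sketch yields the same four names, in the same order, as the `hostname` column of the status output above.
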
If a status request targets a single node, the node name can be passed directly to `--status`:

```
[user@login]$ econtrol --status=island0node2
hostname      power  temp  freq  job_id  stepid
island0node2  278    66C   2.59  0       0
```

For any other command type (including status), a single node is targeted with `--hosts` or with the `node` domain:

```
[user@login]$ econtrol --status --hosts island0node2
hostname      power  temp  freq  job_id  stepid
island0node2  278    66C   2.59  0       0

[user@login]$ econtrol --restore-conf --hosts island0node2

[user@login]$ econtrol --status --domain node:island0node2
hostname      power  temp  freq  job_id  stepid
island0node2  278    66C   2.59  0       0
```

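The status output shown above is a whitespace-separated table, so it can be post-processed with a few lines of code. The following Python sketch assumes exactly the column layout of these examples (`hostname power temp freq job_id stepid`, with a trailing `C` on the temperature); `parse_status` is a hypothetical helper and the real output may contain additional lines that this sketch does not handle.

```python
SAMPLE = """\
hostname      power  temp  freq  job_id  stepid
node2         278    66C   2.59  6878    0
node3         274    57C   2.59  6878    0
"""


def parse_status(output):
    """Turn status text (header line plus one line per node) into dictionaries."""
    lines = [line.split() for line in output.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    nodes = []
    for fields in rows:
        record = dict(zip(header, fields))
        record["power"] = int(record["power"])
        record["temp"] = int(record["temp"].rstrip("C"))  # drop the unit suffix
        record["freq"] = float(record["freq"])
        nodes.append(record)  # job_id and stepid are kept as strings
    return nodes


if __name__ == "__main__":
    for node in parse_status(SAMPLE):
        print(node["hostname"], node["power"], node["temp"])
```
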
Furthermore, the domain option may be used to filter out any nodes not belonging to a specified domain.

To send a command only to a single island (as defined in ear.conf):
```
[user@login]$ econtrol --restore-conf --domain island:2
```

To send a command to all the nodes that belong to a tag (as defined in ear.conf), regardless of their island or any other configuration:
```
[user@login]$ econtrol --set-powercap 400 --domain tag:cpu_only
```

One can also use a double filter:
```
[user@login]$ econtrol --set-freq 3100000 --domain tag:cpu_only island:3
```

**NOTE**: Combining domain filtering with a hostlist specification (`--hosts`) is not supported and may cause errors.

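Since a hostlist and a domain filter should not be combined, a wrapper script can refuse that combination before `econtrol` is ever called. The Python sketch below is one way to do it. `build_econtrol_args` and its parameters are hypothetical names, it only checks that each filter has the `domain:target` shape, and it does not validate domain names against ear.conf.

```python
def build_econtrol_args(command, value=None, hosts=None, domains=None):
    """Return an argv list for econtrol, rejecting --hosts together with --domain."""
    if hosts and domains:
        raise ValueError("--hosts and --domain should not be combined")
    args = ["econtrol", command]
    if value is not None:
        args.append(str(value))
    if hosts:
        args += ["--hosts", hosts]
    if domains:
        for item in domains:
            name, separator, target = item.partition(":")
            if not (name and separator and target):
                raise ValueError(f"expected domain:target, got {item!r}")
        # Several filters follow a single --domain, as in the double-filter example.
        args += ["--domain", *domains]
    return args


if __name__ == "__main__":
    argv = build_econtrol_args("--set-freq", 3100000, domains=["tag:cpu_only", "island:3"])
    print(" ".join(argv))
```

Called as in the `__main__` block, the sketch reproduces the double-filter command line shown above.
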
# Database commands