|
|
EAR offers the following commands:
|
|
|
|
|
|
- Commands to analyze data stored in the DB: [eacct](#ear-job-accounting-eacct) and [ereport](#ear-system-energy-report-ereport).
|
|
|
- Commands to control and temporarily modify cluster settings: [econtrol](#ear-control-econtrol).
|
|
|
- Commands to create/update/clean the DB: [edb_create](#edb_create), [edb_clean_pm](#edb_clean_pm) and [edb_clean_apps](#edb_clean_apps).
|
|
|
- A command to run OpenMPI applications with EAR on SLURM systems through the `mpirun` command: [erun](#erun).
|
|
|
|
|
|
Commands belonging to the first three categories read the EAR configuration file
(`ear.conf`) to determine whether the user is authorized, as some of them have
features (or the whole command) available only to that set of users.
Root is a special case: it does not need to be included in the list of authorized users.
Some options are disabled when the user is not authorized.
|
|
|
|
|
|
> **NOTE** The EAR module must be loaded in your environment in order to use EAR commands.
|
|
|
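As a quick check, the sketch below loads the module and asks one of the commands for its version (the module name is an assumption; check your site's module list for the exact one):

```
# Hypothetical module name; run `module avail` to find the exact one on your site.
module load ear

# If the module is loaded correctly, EAR commands become available, e.g.:
eacct -v    # prints the current EAR version
```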
|
|
|
[[_TOC_]]
|
|
|
|
|
|
# EAR job Accounting (eacct)
|
|
|
|
|
|
The `eacct` command shows accounting information stored in the EAR DB for job
(and step) IDs.
|
|
|
The command uses EAR's configuration file to determine if the user running it is
|
|
|
privileged or not, as **non-privileged users can only access their information**.
|
|
|
It provides the following options.
|
|
|
|
|
|
```
|
|
|
Usage: eacct [Optional parameters]
|
|
|
Optional parameters:
|
|
|
-h displays this message
|
|
|
-v displays current EAR version
|
|
|
-b verbose mode for debugging purposes
|
|
|
-u specifies the user whose applications will be retrieved. Only available to privileged users. [default: all users]
|
|
|
-j specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
|
|
|
A user can only retrieve its own jobs unless said user is privileged. [default: all jobs]
|
|
|
-a specifies the application names that will be retrieved. [default: all app_ids]
|
|
|
-c specifies the file where the output will be stored in CSV format. If the argument is "no_file" the output will be printed to STDOUT [default: off]
|
|
|
-t specifies the energy_tag of the jobs that will be retrieved. [default: all tags].
|
|
|
-s specifies the minimum start time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
|
|
|
-e specifies the maximum end time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
|
|
|
-l shows the information for each node for each job instead of the global statistics for said job.
|
|
|
-x      shows the last EAR events. Nodes, job ids, and step ids can be specified in the same way as when showing job information.
|
|
|
-m prints power signatures regardless of whether mpi signatures are available or not.
|
|
|
-r      shows the EAR loop signatures. Nodes, job ids, and step ids can be specified in the same way as when showing job information.
|
|
|
-o modifies the -r option to also show the corresponding jobs. Should be used with -j.
|
|
|
-n specifies the number of jobs to be shown, starting from the most recent one. [default: 20][to get all jobs use -n all]
|
|
|
-f specifies the file where the user-database can be found. If this option is used, the information will be read from the file and not the database.
|
|
|
```
|
|
|
|
|
|
The basic usage of `eacct` retrieves the last 20 applications (by default) of the
|
|
|
user executing it.
|
|
|
If a user is **privileged**, they may see all users' applications.
|
|
|
The default behaviour shows data from each job-step, aggregating the values from
|
|
|
each node in said job-step. If using SLURM as a job manager, a *sb* (sbatch) job-step
|
|
|
is created with the data from the entire execution.
|
|
|
A specific job may be specified with the `-j` option.
|
|
|
|
|
|
The table below shows some examples of `eacct` usage.
|
|
|
|
|
|
| Command line | Description |
|
|
|
| ----------------------------------------- | --------------------------------------------------------------------------- |
|
|
|
| eacct | Shows last 20 jobs executed by the user. |
|
|
|
| eacct -j \<JobID\> | Shows data of the job \<JobID\>, one row for each step of the job. |
|
|
|
| eacct -j \<JobID\>.\<StepID\> | Shows data of the step \<StepID\> of job \<JobID\>. |
|
|
|
| eacct -j \<JobIDx\>,\<JobIDy\>,\<JobIDz\> | Shows data of jobs (one row per step) \<JobIDx\>,\<JobIDy\> and \<JobIDz\>. |
|
|
|
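For instance, filters can be combined on the command line (the job and user IDs below are hypothetical; `-u` is only available to privileged users):

```
eacct -u user1 -n 5    # last 5 jobs executed by user1 (privileged users only)
eacct -j 123456.0      # data of step 0 of job 123456
```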
|
|
|
The command shows a pre-selected set of columns:
|
|
|
|
|
|
| Column field | Description |
|
|
|
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
| JOB-STEP | JobID and StepID reported. JobID-*sb* is shown for the sbatch step in SLURM systems. |
|
|
|
| USER | The username of the user who executed the job. |
|
|
|
| APPLICATION | Job’s name or executable name if job name is not provided. |
|
|
|
| POLICY | Energy optimization policy name. *MO* stands for monitoring, *ME* for min\_energy, *MT* for min\_time and *NP* means the job ran without EARL. |
|
|
|
| NODES | Number of nodes involved in the job run. |
|
|
|
| AVG/DEF/IMC(GHz) | Average CPU frequency, default frequency and average uncore frequency. Includes all the nodes for the step. In GHz. |
|
|
|
| TIME(s) | Average step execution time across all nodes, in seconds. |
|
|
|
| POWER(W) | Average node power across all the nodes, in Watts. |
|
|
|
| GBS | CPU main memory bandwidth (GB/second). Hint for CPU/Memory bound classification. |
|
|
|
| CPI | CPU Cycles per Instruction. Hint for CPU/Memory bound classification. |
|
|
|
| ENERGY(J) | Accumulated node energy. Includes all the nodes. In Joules. |
|
|
|
| GFLOPS/W | CPU GFlops per Watt. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
|
|
|
| IO(MBS) | I/O (read and write) Mega Bytes per second. |
|
|
|
| MPI% | Percentage of MPI time over the total execution time. It’s the average including all the processes and nodes. |
|
|
|
|
|
|
If EAR supports GPU monitoring/optimisation, the following columns are added:
|
|
|
|
|
|
| Column field | Description |
|
|
|
| ---------------- | ----------- |
|
|
|
| G-POW (T/U) | Average GPU power. Accumulated per node and averaged across the involved nodes. *T* means total GPU power consumed (even if the job is not using some or all of the GPUs in a node). *U* means only the GPUs used on each node. |
|
|
|
| G-FREQ | Average GPU frequency. Per node and average of all the nodes. |
|
|
|
| G-UTIL(G/MEM) | GPU utilization and GPU memory utilization. |
|
|
|
|
|
|
For node-specific information, the `-l` (i.e., long) option provides detailed accounting for each individual node.
With this option, `eacct` shows an additional column, `VPI(%)`: the percentage of AVX512 instructions over the total number of instructions.
|
|
|
|
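A minimal sketch of requesting per-node data (the job and step IDs are hypothetical):

```
# One row per node involved in step 1 of job 123456, including the VPI(%) column
eacct -j 123456.1 -l
```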
|
|
Runtime data (EAR loops) can be retrieved with the `-r` option.
Both Job and Step ID filtering work.
To easily transfer the command's output, the `-c` option saves it in CSV format.
Both aggregated and detailed accounting are available, as well as filtering:
|
|
|
|
|
|
| Command line | Description |
|
|
|
| ------------ | ----------- |
|
|
|
| eacct -j \<JobID\> -c test.csv | Appends to the file `test.csv` all metrics shown above for each step of the job \<JobID\>. |
|
|
|
| eacct -j \<JobID\>.\<StepID\> -l -c test.csv | Appends to the file `test.csv` all metrics in the EAR DB for each node involved in step \<StepID\> of job \<JobID\>. |
|
|
|
| eacct -j \<JobID\>.\<StepID\> -r -c test.csv | Appends to the file `test.csv` all metrics in EAR DB for each loop of each node involved in step \<StepID\> of job \<JobID\>. |
|
|
|
|
|
|
When requesting the long format (i.e., the `-l` option) or runtime metrics (i.e., the `-r` option)
to be stored in a CSV file (i.e., the `-c` option), header names differ from the output
shown when CSV format is not requested.
The table below shows the header names of a CSV file storing long (per-node) information about jobs:
|
|
|
|
|
|
| Field name | Description |
|
|
|
| ---------- | ----------- |
|
|
|
| NODENAME | The node name the row information belongs to. |
|
|
|
| JOBID | The JobID. |
|
|
|
| STEPID | The StepID. For the sbatch step, the `SLURM_BATCH_SCRIPT` value is printed. |
|
|
|
| USERID | The username of the user who executed the job. |
|
|
|
| GROUPID | The group name of the user who executed the job. |
|
|
|
| JOBNAME | Job’s name or executable name if job name is not provided. |
|
|
|
| USER_ACC | The account name of the user who executed the job. |
|
|
|
| ENERGY_TAG | The energy tag used if the user set one for their job step. |
|
|
|
| POLICY | Energy optimization policy name. *MO* stands for monitoring, *ME* for min\_energy, *MT* for min\_time and *NP* means the job ran without EARL. |
|
|
|
| POLICY_TH | The policy threshold used by the optimization policy set with the job. |
|
|
|
| AVG\_CPUFREQ\_KHZ | Average CPU frequency of the job step executed in the node, expressed in kHz. |
|
|
|
| AVG\_IMCFREQ\_KHZ | Average uncore frequency of the job step executed in the node, expressed in kHz. **Default data fabric frequency on AMD sockets**. |
|
|
|
| DEF\_FREQ\_KHZ | Default frequency of the job step executed in the node, expressed in kHz. |
|
|
|
| TIME_SEC | Execution time (in seconds) of the application in the node. As this is computed by EARL, the *sbatch* step does not contain this information. |
|
|
|
| CPI | CPU Cycles per Instruction. Hint for CPU/Memory bound classification. |
|
|
|
| TPI | Memory transactions per Instruction. Hint for CPU/Memory bound classification. |
|
|
|
| MEM_GBS | CPU main memory bandwidth (GB/second). Hint for CPU/Memory bound classification. |
|
|
|
| IO_MBS | I/O (read and write) Mega Bytes per second. |
|
|
|
| PERC_MPI | Percentage of MPI time over the total execution time. |
|
|
|
| DC\_NODE\_POWER_W | Average node power, in Watts. |
|
|
|
| DRAM\_POWER\_W | Average DRAM power, in Watts. **Not available on AMD sockets**. |
|
|
|
| PCK\_POWER\_W | Average RAPL package power, in Watts. |
|
|
|
| CYCLES | Total number of cycles. |
|
|
|
| INSTRUCTIONS | Total number of instructions. |
|
|
|
| CPU-GFLOPS | CPU GFlops per Watt. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
|
|
|
| L1_MISSES | Total number of L1 cache misses. |
|
|
|
| L2_MISSES | Total number of L2 cache misses. |
|
|
|
| L3_MISSES | Total number of L3/LLC cache misses. |
|
|
|
| SPOPS_SINGLE | Total number of single precision 64 bit floating point operations. |
|
|
|
| SPOPS_128 | Total number of single precision 128 bit floating point operations. |
|
|
|
| SPOPS_256 | Total number of single precision 256 bit floating point operations. |
|
|
|
| SPOPS_512 | Total number of single precision 512 bit floating point operations. |
|
|
|
| DPOPS_SINGLE | Total number of double precision 64 bit floating point operations. |
|
|
|
| DPOPS_128 | Total number of double precision 128 bit floating point operations. |
|
|
|
| DPOPS_256 | Total number of double precision 256 bit floating point operations. |
|
|
|
| DPOPS_512 | Total number of double precision 512 bit floating point operations. |
|
|
|
|
|
|
If EAR supports GPU monitoring/optimisation, the following columns are added:
|
|
|
|
|
|
| Field name | Description |
|
|
|
| ---------- | ----------- |
|
|
|
| GPU*x*\_POWER\_W | Average GPU*x* power, in Watts. |
|
|
|
| GPU*x*\_FREQ\_KHZ | Average GPU*x* frequency, in kHz. |
|
|
|
| GPU*x*\_MEM\_FREQ\_KHZ | Average GPU*x* memory frequency, in kHz. |
|
|
|
| GPU*x*\_UTIL\_PERC | Average percentage of GPU*x* utilization. |
|
|
|
| GPU*x*\_MEM\_UTIL_PERC | Average percentage of GPU*x* memory utilization. |
|
|
|
|
|
|
For runtime metrics (i.e., the `-r` option), *USERID*, *GROUPID*, *JOBNAME*, *USER_ACC*,
*ENERGY_TAG* (as energy tags disable EARL), *POLICY* and *POLICY_TH* are not stored
in the CSV file.
However, the iteration time (in seconds) is present for each loop as *ITER_TIME_SEC*,
as well as a timestamp (i.e., *TIMESTAMP*) with the elapsed time in seconds since the Epoch.
|
|
|
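A quick way to check which headers ended up in a generated file (the filename comes from the examples above):

```
# Print the header row of the CSV produced with the -c option
head -n 1 test.csv
```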
|
|
|
# EAR system energy Report (ereport)
|
|
|
|
|
|
The `ereport` command creates reports from the energy accounting data of the nodes stored in the EAR DB. It is intended to be used for energy consumption analysis over a set period of time, with some additional (optional) criteria such as node name or username.
|
|
|
|
|
|
```
|
|
|
Usage: ereport [options]
|
|
|
Options are as follows:
|
|
|
-s start_time indicates the start of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default: end_time minus insertion time*2.
|
|
|
-e end_time indicates the end of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default: current time.
|
|
|
-n node_name |all indicates from which node the energy will be computed. Default: none (all nodes computed)
|
|
|
'all' option shows all users individually, not aggregated.
|
|
|
-u user_name |all requests the energy consumed by a user in the selected period of time. Default: none (all users computed).
|
|
|
'all' option shows all users individually, not aggregated.
|
|
|
-t energy_tag|all requests the energy consumed by energy tag in the selected period of time. Default: none (all tags computed).
|
|
|
'all' option shows all tags individually, not aggregated.
|
|
|
-i eardbd_name|all indicates from which eardbd (island) the energy will be computed. Default: none (all islands computed)
|
|
|
'all' option shows all eardbds individually, not aggregated.
|
|
|
-g shows the contents of EAR's database Global_energy table. The default option will show the records for the two previous T2 periods of EARGM.
|
|
|
This option can only be modified with -s, not -e
|
|
|
-x shows the daemon events from -s to -e. If no time frame is specified, it shows the last 20 events.
|
|
|
-v shows current EAR version.
|
|
|
-h shows this message.
|
|
|
```
|
|
|
|
|
|
## Examples
|
|
|
|
|
|
The following example uses the 'all' nodes option to display information for each node, as well as a start_time, so it reports the accumulated energy from that moment until the current time.
|
|
|
|
|
|
```
|
|
|
[user@host EAR]$ ereport -n all -s 2018-09-18
|
|
|
Energy (J) Node Avg. Power (W)
|
|
|
20668697 node1 146
|
|
|
20305667 node2 144
|
|
|
20435720 node3 145
|
|
|
20050422 node4 142
|
|
|
20384664 node5 144
|
|
|
20432626 node6 145
|
|
|
18029624 node7 128
|
|
|
```
|
|
|
|
|
|
This example filters by EARDBD host (typically one per island) instead:
|
|
|
|
|
|
```
|
|
|
[user@host EAR]$ ereport -s 2019-05-19 -i all
|
|
|
Energy (J) Node
|
|
|
9356791387 island1
|
|
|
30475201705 island2
|
|
|
37814151095 island3
|
|
|
28573716711 island4
|
|
|
29700149501 island5
|
|
|
26342209716 island6
|
|
|
```
|
|
|
|
|
|
And to see the state of the cluster's energy budget (set by the sysadmin) you can use the following:
|
|
|
|
|
|
```
|
|
|
[user@host EAR]$ ereport -g
|
|
|
Energy% Warning lvl Timestamp INC th p_state ENERGY T1 ENERGY T2 TIME T1 TIME T2 LIMIT POLICY
|
|
|
111.486 100 2019-05-22 10:31:34 0 100 893 1011400 907200 600 604800 EnergyBudget
|
|
|
111.492 100 2019-05-22 10:21:34 0 100 859 1011456 907200 600 604800 EnergyBudget
|
|
|
111.501 100 2019-05-22 10:11:34 0 100 862 1011533 907200 600 604800 EnergyBudget
|
|
|
111.514 100 2019-05-22 10:01:34 0 100 842 1011658 907200 600 604800 EnergyBudget
|
|
|
111.532 100 2019-05-22 09:51:34 0 100 828 1011817 907200 600 604800 EnergyBudget
|
|
|
111.554 0 2019-05-22 09:41:34 0 0 837 1012019 907200 600 604800 EnergyBudget
|
|
|
```
|
|
|
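Similarly, the energy accumulated by a single user over a period can be requested (the username and date below are hypothetical):

```
ereport -u user1 -s 2023-01-01
```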
|
|
|
# EAR Control (econtrol)
|
|
|
|
|
|
The `econtrol` command temporarily modifies cluster settings related to power policies.
These options are sent to all the nodes in the cluster.
|
|
|
|
|
|
> **NOTE** Any changes done with `econtrol` will not be reflected in `ear.conf` and thus will be lost when reloading the system.
|
|
|
|
|
|
```
|
|
|
Usage: econtrol [options]
|
|
|
--status ->requests the current status for all nodes. The ones responding show the current
|
|
|
power, IP address and policy configuration. A list with the ones not
|
|
|
responding is provided with their hostnames and IP address.
|
|
|
--status=node_name retrieves the status of that node individually.
|
|
|
--type [status_type] ->specifies what type of status will be requested: hardware,
|
|
|
policy, full (hardware+policy), app_node, app_master, eardbd, eargm or power. [default:hardware]
|
|
|
--power ->requests the current power for the cluster.
|
|
|
--power=node_name retrieves the current power of that node individually.
|
|
|
--set-freq [newfreq] ->sets the frequency of all nodes to the requested one
|
|
|
--set-def-freq [newfreq] [pol_name] ->sets the default frequency for the selected policy
|
|
|
--set-max-freq [newfreq] ->sets the maximum frequency
|
|
|
--set-powercap [new_cap] ->sets the powercap of all nodes to the given value. A node can be specified
|
|
|
after the value to only target said node.
|
|
|
--hosts [hostlist] ->sends the command only to the specified hosts. Only works with status, power_status,
|
|
|
--power and --set-powercap
|
|
|
--restore-conf ->restores the configuration for all nodes
|
|
|
--active-only ->suppresses inactive nodes from the output in hardware status.
|
|
|
--health-check ->checks all EARDs and EARDBDs for errors and prints all that are unresponsive.
|
|
|
--mail [address] ->sends the output of the program to address.
|
|
|
--ping ->pings all nodes to check whether the nodes are up or not. Additionally,
|
|
|
--ping=node_name pings that node individually.
|
|
|
--version ->displays current EAR version.
|
|
|
--help ->displays this message.
|
|
|
```
|
|
|
|
|
|
`econtrol`'s status is a useful tool to monitor the nodes in a cluster. The most basic usage is the hardware status (the default type), which shows basic information about all the nodes.
|
|
|
|
|
|
```
|
|
|
[user@login]$ econtrol --status
|
|
|
hostname power temp freq job_id stepid
|
|
|
node2 278 66C 2.59 6878 0
|
|
|
node3 274 57C 2.59 6878 0
|
|
|
node4 52 31C 1.69 0 0
|
|
|
|
|
|
INACTIVE NODES
|
|
|
node1 192.0.0.1
|
|
|
```
|
|
|
|
|
|
The application status type can be used to retrieve all currently running jobs in the cluster. `app_master` gives a summary of all the running applications, while `app_node` gives detailed information for each node currently running a job.
|
|
|
|
|
|
```
|
|
|
[user@login]$ econtrol --status --type=app_master
|
|
|
Job-Step Nodes DC power CPI GBS Gflops Time Avg Freq
|
|
|
6878-0 2 280.13 0.37 24.39 137.57 54.00 2.59
|
|
|
|
|
|
[user@login]$ econtrol --status --type=app_node
|
|
|
Node id Job-Step M-Rank DC power CPI GBS Gflops Time Avg Freq
|
|
|
node2 6878-0 0 280.13 0.37 24.39 137.57 56.00 2.59
|
|
|
node3 6878-0 1 245.44 0.37 24.29 136.40 56.00 2.59
|
|
|
```
|
|
|
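When only a few nodes are of interest, the request can be restricted with `--hosts` (the node names below are hypothetical, and the exact host-list format may differ on your system):

```
# Request the hardware status of two specific nodes only
econtrol --status --hosts node2,node3
```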
|
|
|
# Database commands
|
|
|
|
|
|
## edb_create
|
|
|
|
|
|
Creates the EAR DB used for accounting and for global energy control. Requires root access to the MySQL server. It reads `ear.conf` to get connection details (server IP and port), the DB name (which may or may not have been previously created) and EAR's default users (which will be created or altered to have the necessary privileges on EAR's database).
|
|
|
|
|
|
```
|
|
|
Usage:edb_create [options]
|
|
|
-p Specify the password for MySQL's root user.
|
|
|
-o Outputs the commands that would run.
|
|
|
-r Runs the program. If '-o' is given, this option will be overridden.
|
|
|
-h Shows this message.
|
|
|
```
|
|
|
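A typical first step is to preview the statements before executing them (a minimal sketch; depending on your MySQL setup you may also need `-p` to supply the root password):

```
edb_create -o    # print the statements that would be executed
edb_create -r    # actually create the database and users
```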
|
|
|
## edb_clean_pm
|
|
|
|
|
|
Cleans periodic metrics from the database. Used to reduce the size of EAR's database, it will remove every Periodic_metrics entry older than `num_days`:
|
|
|
|
|
|
```
|
|
|
Usage:./src/commands/edb_clean_pm [options]
|
|
|
-d num_days REQUIRED: Specify how many days will be kept in database. (default: 0 days).
|
|
|
-p Specify the password for MySQL's root user.
|
|
|
-o Print the query instead of running it (default: off).
|
|
|
-r Execute the query (default: on).
|
|
|
-h Display this message.
|
|
|
-v Show current EAR version.
|
|
|
```
|
|
|
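For example, to keep only the most recent metrics (the retention period below is an arbitrary example):

```
edb_clean_pm -d 90 -o    # print the query that would remove entries older than 90 days
edb_clean_pm -d 90 -r    # execute it
```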
|
|
|
## edb_clean_apps
|
|
|
|
|
|
Removes applications from the database. It is intended to remove old applications to speed up queries and free up space. It can also be used to remove specific applications from the database. It removes ALL the information related to those jobs (the following tables will be modified for each job: Loops, if they exist; GPU_signatures, if they exist; Signatures, if they exist; Power signatures, Applications, and Jobs).
|
|
|
|
|
|
It is recommended to run the command with the `-o` option first to ensure that the queries that will be executed are correct (see the example after the usage listing below).
|
|
|
|
|
|
```
|
|
|
Usage:edb_clean_apps [-j/-d] [options]
|
|
|
-p The program will request the database user's password.
|
|
|
-u user Database user to execute the operation (it needs DELETE privileges). [default: root]
|
|
|
-j jobid.stepid Job id and step id to delete. If no step_id is introduced, every step within the job will be deleted
|
|
|
-d ndays Days to preserve. It will delete any jobs older than ndays.
|
|
|
-o Prints out the queries that would be executed. Exclusive with -r. [default:on]
|
|
|
-r Runs the queries that would be executed. Exclusive with -o. [default:off]
|
|
|
-l Deletes Loops and its Signatures. [default:off]
|
|
|
-a Deletes Applications and related tables. [default:off]
|
|
|
-h Displays this message
|
|
|
```
|
|
|
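A minimal sketch following that recommendation (the retention period is an arbitrary example; combine with `-l`/`-a` depending on which tables you want to clean):

```
edb_clean_apps -d 365 -o       # preview the queries that would delete jobs older than 365 days
edb_clean_apps -d 365 -a -r    # execute them, removing Applications and related tables
```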
|
|
|
# erun
|
|
|
|
|
|
`erun` is a program that simulates the whole SLURM and EAR SLURM plug-in pipeline.
It was designed to provide compatibility with MPI implementations that are not fully compatible with
the SLURM SPANK plug-in mechanism (e.g., OpenMPI), which is used to set up EAR at job submission.
You can launch `erun` with the `--program` option to specify the application name
and arguments. See the usage below:
|
|
|
|
|
|
```
|
|
|
> erun --help
|
|
|
|
|
|
This is the list of ERUN parameters:
|
|
|
Usage: ./erun [OPTIONS]
|
|
|
|
|
|
Options:
|
|
|
--job-id=<arg> Set the JOB_ID.
|
|
|
--nodes=<arg> Sets the number of nodes.
|
|
|
--program=<arg> Sets the program to run.
|
|
|
--clean Removes the internal files.
|
|
|
|
|
|
SLURM options:
|
|
|
...
|
|
|
```
|
|
|
|
|
|
The syntax to run an MPI application with `erun` has the form `mpirun -n <X> erun --program='my_app arg1 arg2 .. argN'`.
Therefore, `mpirun` will run *X* `erun` processes.
Then, `erun` will launch the application `my_app` with the arguments passed, if any.
You can pass as many arguments as you want, but the quotes must enclose all
of them whenever there is more than just the program name.
|
|
|
|
|
|
`erun` will simulate on the remote node both the local and remote pipelines for
all created processes.
It has an internal mechanism to avoid repeating functions that are executed just once
per job or node, as SLURM does with its plug-ins.
|
|
|
|
|
|
**IMPORTANT NOTE** If you are going to launch `n` applications with the `erun` command through an sbatch job, you must set the environment variable `SLURM_STEP_ID` to values from `0` to `n-1` before each `mpirun` call, as shown in the sketch below.
This way, `erun` will report the correct step ID to the EARD so it is stored in the database.
|
|
|
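A minimal sketch of such an sbatch script (application names, inputs and process counts are hypothetical):

```
#!/bin/bash
#SBATCH --nodes=2

# First application: stored as step 0
export SLURM_STEP_ID=0
mpirun -n 4 erun --program='my_app input1'

# Second application: stored as step 1
export SLURM_STEP_ID=1
mpirun -n 4 erun --program='my_app input2'
```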
|
|
|
The `--job-id` and `--nodes` parameters create the environment variables that SLURM would have created automatically, because your application might make use of them.
The `--clean` option removes the temporary files created to synchronize all the `erun` processes.
|
|
|
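A sketch of passing an explicit job ID and node count and cleaning up afterwards (all values are hypothetical):

```
mpirun -n 4 erun --job-id=123456 --nodes=2 --program='my_app arg1'
erun --clean    # remove the temporary synchronization files
```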
|
|
|
You also have to load the EAR environment module or define its environment variables in your environment or script:
|
|
|
|
|
|
| Variable | Parameter |
|
|
|
| ------------------------- | ---------------------- |
|
|
|
| EAR_INSTALL_PATH=\<path\> | prefix=\<path\> |
|
|
|
| EAR_TMP=\<path\> | localstatedir=\<path\> |
|
|
|
| EAR_ETC=\<path\> | sysconfdir=\<path\> |
|
|
|
| EAR_DEFAULT=\<on/off\> | default=\<on/off\> |
|
|