## Tables

### Application information

The following tables contain information directly related to applications executed on the system while EAR was monitoring. The main key is the JOBID.STEPID combination generated by the scheduler.

- **Jobs**: job information (app_id, user_id, job_id, step_id, etc.). One record per JOBID.STEPID is created in the DB.
- **Applications**: this table's records serve as a link between Jobs and Signatures, providing an application signature (from EARL) for each node of a job. One record per JOBID.STEPID.NODENAME is created in the DB.
- **Loops**: similar to _Applications_, but stores a Signature for each application loop detected by EARL instead of one per application. This table provides internal details of running applications and could significantly increase the DB size.
- **Signatures**: EARL-computed signature and metrics. One record per JOBID.STEPID.NODENAME is created in the DB when the application is executed with EARL.
- **GPU_signatures**: EARL-computed GPU signatures. Each record belongs to a loop or application signature, with one record per GPU: a signature from a node with 4 GPUs produces 4 records.
- **Power_signatures**: basic time and power metrics that can be obtained without EARL. Reported for all applications. One record per JOBID.STEPID.NODENAME is created in the DB.
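
As an illustration of how these tables link together, the query below is a minimal sketch that lists the per-node signatures of one job step together with their GPU records. The columns `signature_id`, `id`, `min_gpu_sig_id`, `max_gpu_sig_id`, `avg_f`, `GBS`, `GPU_power` and `GPU_util` appear elsewhere on this page. The columns `job_id`, `step_id` and `node_id` in _Applications_ are assumptions. So are the idea that a signature's GPU records occupy the id range between `min_gpu_sig_id` and `max_gpu_sig_id`, and the job id `12345`. Check the real layout with `edb_create -o` before relying on it.

```sql
-- Sketch only: job_id, step_id and node_id are assumed column names.
SELECT a.node_id, s.avg_f, s.GBS, g.id AS gpu_sig_id, g.GPU_power, g.GPU_util
FROM Applications AS a
JOIN Signatures AS s ON s.id = a.signature_id
-- Assumption: GPU records of a signature form a contiguous id range.
LEFT JOIN GPU_signatures AS g
       ON g.id BETWEEN s.min_gpu_sig_id AND s.max_gpu_sig_id
WHERE a.job_id = 12345   -- hypothetical job id
  AND a.step_id = 0      -- hypothetical step id
ORDER BY a.node_id, g.id;
```

Under these assumptions, a node with 4 GPUs would show up as 4 rows sharing the same `node_id`, one per GPU record.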

### System monitoring

These tables contain periodic information gathered from the nodes. There is a per-node table and an aggregated one that speeds up queries for cluster-wide information.

- **Periodic_metrics**: node metrics reported every N seconds (N is defined in `ear.conf`).
- **Periodic_aggregations**: sum of all _Periodic_metrics_ in a time period, to ease accounting in the `ereport` command and EARGM and to reduce database size (_Periodic_metrics_ of older periods, where node-level precision is no longer needed, can be deleted and the aggregations used instead).

### Events

- **Events**: EAR events report. There are several types of events, depending on their source: EARL, EARD-powercap, EARD-runtime and EARGM. For more information, see the [table's fields](EAR-database-table-descriptions#events) and its header file (`src/common/types/event_type.h`). For EARL-specific events, also see [this section](EAR-environment-variables#report_earl_events).

### EARGM reports

- **Global_energy**: reports of cluster-wide energy accounting set by EARGM using the parameters in `ear.conf`. One record is reported every T1 period (defined in `ear.conf`).

### Learning phase

These tables are the same as their non-learning counterparts, but are specifically used to store the applications executed during a learning phase.

- **Learning_applications**: same as _Applications_, restricted to learning phase applications.
- **Learning_jobs**: same as _Jobs_, restricted to learning phase jobs.
- **Learning_signatures**: same as _Signatures_, restricted to learning phase job metrics.

> **NOTE** For the _GPU_signatures_ table to be created and for _Periodic_metrics_ to contain GPU data, the database must be created with GPUs enabled at compilation time (if you follow the `edb_create` approach, see the section below). See [how to update from previous versions](#updating-from-previous-versions) if you are updating EAR from a release without GPU metrics.

## Creation and maintenance

EAR provides a command, `edb_create`, to create the database. It can either create the database directly or output the creation queries so the administrator can run or modify them at their discretion (note that any change may break EAR's accounting).

Since EAR reports a lot of data to the database, it provides two commands to remove old data and free up space: `edb_clean_pm` removes periodic accounting data from nodes, and `edb_clean_apps` removes all the data related to old jobs. They are intended to be used with a `cron` job or a similar tool, but they can also be run manually without any issues.

For more information on these commands, check the [commands' page on the wiki](EAR-commands#database-commands).

## Database creation and `ear.conf`

When running `edb_create`, some tables might not be created, or might be created with a different layout, depending on some `ear.conf` settings. The settings and their effects are as follows:

- `DBReportNodeDetail`: if set to 1, `edb_create` will create two additional columns in the _Periodic_metrics_ table for Temperature (in Celsius) and Frequency (in Hz) accounting.
- `DBReportSigDetail`: if set to 1, _Signatures_ will have additional fields for cycles, instructions, and FLOPS1-8 counters (number of instructions by type).
- `DBMaxConnections`: restricts the maximum number of simultaneous connections from commands.

If either detail setting (`DBReportNodeDetail` or `DBReportSigDetail`) is set to 0, the corresponding table will have fewer columns, but its records will take less storage.

Any table with missing columns can later be altered by the admin to include them. For the full list of each table's columns, run `edb_create -o` with the desired `ear.conf` settings.
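
To see which optional columns an existing database already has, the standard `SHOW COLUMNS` statement can be compared against the output of `edb_create -o`. This is a minimal sketch, assuming a MySQL or MariaDB server and a CLI session already connected to the EAR database:

```sql
-- List the current columns of the two tables affected by the detail settings.
SHOW COLUMNS FROM Periodic_metrics;
SHOW COLUMNS FROM Signatures;
```

Any column that `edb_create -o` lists but `SHOW COLUMNS` does not is one the admin would need to add.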

## Information reported and `ear.conf`

Several settings in `ear.conf` restrict the data reported to the database. Some errors might occur if the database was created with a configuration different from EARDB's.

- `DBReportNodeDetail`: if set to 1, node managers will report temperature, average frequency, DRAM and PCK energy to the database manager, which will try to insert them into _Periodic_metrics_. If _Periodic_metrics_ does not have the columns for these metrics, an error will occur and nothing will be inserted. To solve the error, set `DBReportNodeDetail` to 0 or manually update _Periodic_metrics_ so that it has the necessary columns.
- `DBReportSigDetail`: similarly to `DBReportNodeDetail`, an error will occur if the configuration differs from the one used when creating the database.
- `DBReportLoops`: if set to 1, application loops detected by EARL will be reported to the database, each with its corresponding Signature. Set to 0 to disable this feature. Regardless of the setting, no error should occur.

If _Signatures_ and/or _Periodic_metrics_ have additional columns but their respective settings are set to 0, a NULL will be stored in those additional columns, which makes those rows smaller (but still bigger than if the columns did not exist).

Additionally, if EAR was compiled on a system with GPUs (or with the GPU flag manually enabled), another table to store GPU data will be created.

![](./images/GPU_DB_diagram.png)

> **NOTE** The nomenclature is modified from MySQL's types. Any type starting with `u` is unsigned. `bigint` corresponds to a 64-bit integer, `int` to 32 bits and `smallint` to 16 bits.
>
> For a detailed description of each field in any of the database's tables, see [here](EAR-database-table-descriptions).

## Updating from previous versions

### From EAR 4.1 to 4.2

A field in the Events table was renamed to be more generic. To apply the rename, run EITHER of the following commands. The first form requires a server that supports `RENAME COLUMN` (MySQL 8.0 or later, MariaDB 10.5.2 or later); the second also works on older servers.

```
ALTER TABLE Events RENAME COLUMN freq TO value;
```

```
ALTER TABLE Events CHANGE freq value INT unsigned;
```

Furthermore, errors have been found on big servers because the id columns of a few tables were too small. To correct this, run the following commands:

```
ALTER TABLE Learning_signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Applications MODIFY COLUMN signature_id BIGINT unsigned;
ALTER TABLE Loops MODIFY COLUMN signature_id BIGINT unsigned;
```

If GPUs are being used, also run:

```
ALTER TABLE GPU_signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Learning_signatures MODIFY COLUMN min_gpu_sig_id BIGINT unsigned;
ALTER TABLE Learning_signatures MODIFY COLUMN max_gpu_sig_id BIGINT unsigned;
ALTER TABLE Signatures MODIFY COLUMN min_gpu_sig_id BIGINT unsigned;
ALTER TABLE Signatures MODIFY COLUMN max_gpu_sig_id BIGINT unsigned;
```
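
To confirm that the id columns were widened, the standard `information_schema.COLUMNS` view can be queried. This is a minimal sketch, assuming a MySQL or MariaDB server and a session whose default database is the EAR database; whether name matching is case-sensitive depends on the server's settings.

```sql
-- Show the current type of the id columns touched by this update.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('Signatures', 'Learning_signatures', 'Applications',
                     'Loops', 'GPU_signatures')
  AND COLUMN_NAME IN ('id', 'signature_id', 'min_gpu_sig_id', 'max_gpu_sig_id')
ORDER BY TABLE_NAME, COLUMN_NAME;
```

After the update, the columns altered above should report an unsigned `bigint` type. The exact text, such as `bigint unsigned` or `bigint(20) unsigned`, depends on the server version.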

### From EAR 3.4 to 4.0

Several fields have to be added in this update. To do so, run the following commands in the database's CLI client:

```
ALTER TABLE Signatures ADD COLUMN avg_imc_f INT unsigned AFTER avg_f;
ALTER TABLE Signatures ADD COLUMN perc_MPI FLOAT AFTER time;
ALTER TABLE Signatures ADD COLUMN IO_MBS FLOAT AFTER GBS;

ALTER TABLE Learning_signatures ADD COLUMN avg_imc_f INT unsigned AFTER avg_f;
ALTER TABLE Learning_signatures ADD COLUMN perc_MPI FLOAT AFTER time;
ALTER TABLE Learning_signatures ADD COLUMN IO_MBS FLOAT AFTER GBS;
```

### From EAR 3.3 to 3.4

If no GPUs were used and none will be used, no changes are necessary.

If GPUs were being used, run the following commands in the database's CLI client:

```
ALTER TABLE Signatures ADD COLUMN min_GPU_sig_id BIGINT unsigned, ADD COLUMN max_GPU_sig_id BIGINT unsigned;
ALTER TABLE Learning_signatures ADD COLUMN min_GPU_sig_id BIGINT unsigned, ADD COLUMN max_GPU_sig_id BIGINT unsigned;
CREATE TABLE IF NOT EXISTS GPU_signatures ( id BIGINT unsigned NOT NULL AUTO_INCREMENT, GPU_power FLOAT NOT NULL, GPU_freq INT unsigned NOT NULL, GPU_mem_freq INT unsigned NOT NULL, GPU_util INT unsigned NOT NULL, GPU_mem_util INT unsigned NOT NULL, PRIMARY KEY (id));
```

If no GPUs were being used but they are now present, run the previous commands plus the following one:

```
ALTER TABLE Periodic_metrics ADD COLUMN GPU_energy INT;
```
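
To decide which of the cases above applies, it can help to check what the database already contains before running anything. This is a minimal sketch using standard MySQL/MariaDB statements, assuming a CLI session connected to the EAR database; an empty result means the table or column is missing.

```sql
-- Is the GPU table already present?
SHOW TABLES LIKE 'GPU_signatures';
-- Do the signature tables already reference GPU signatures?
SHOW COLUMNS FROM Signatures LIKE 'min_GPU_sig_id';
SHOW COLUMNS FROM Learning_signatures LIKE 'min_GPU_sig_id';
-- Does Periodic_metrics already account GPU energy?
SHOW COLUMNS FROM Periodic_metrics LIKE 'GPU_energy';
```

Checking first is useful because `ALTER TABLE ... ADD COLUMN` fails if the column already exists, whereas the `CREATE TABLE IF NOT EXISTS` statement is safe to repeat.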