Commit 021f904d authored by Lluis Alonso

Initial push for EAR version 3.3

parent 021d6f4f
## Unreleased
- Removed documentation files
- Fixed an error when nvidia-smi returns malformed strings.
- Added ERUN, a component to simulate the SLURM Plugin process when SLURM is not present.
- Frequency control included in eard
- eard control for nodes that do not exist in the configuration
- GPU support for power monitoring. Not all cases are supported yet
- Extensions to report GPU metrics in the DB
### EAR3.3 vs EAR3.2
- eacct loop basic support
- EAR loader included
- GPU support migrated to the NVML API
- TAGS supported in ear.conf
- Heterogeneous clusters specification supported
- EARGM energy capping management improved
- Internal messaging protocol improved
- Average CPU frequency and Average IMC frequency computation improved
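The entry above only says that the averages are now computed better. As an assumption for illustration, and not necessarily what EAR does, a common way to obtain the average core frequency on x86 is to scale the nominal frequency by the ratio of the APERF and MPERF counter deltas. The function name and the kHz unit below are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: average effective core frequency over an interval,
 * from two samples of the x86 IA32_APERF (0xE8) and IA32_MPERF (0xE7) MSRs.
 * While the core is active, MPERF ticks at the nominal frequency and APERF at
 * the actual one, so their delta ratio scales the nominal value.
 * Returns 0 when MPERF did not advance (e.g. the core stayed idle). */
unsigned long avg_cpu_freq_khz(uint64_t aperf_start, uint64_t mperf_start,
                               uint64_t aperf_end, uint64_t mperf_end,
                               unsigned long nominal_khz)
{
    /* Unsigned subtraction stays correct across a single 64-bit wrap. */
    uint64_t d_aperf = aperf_end - aperf_start;
    uint64_t d_mperf = mperf_end - mperf_start;

    if (d_mperf == 0)
        return 0;
    return (unsigned long)((double)nominal_khz * (double)d_aperf / (double)d_mperf);
}
```

For example, with a 2.4 GHz nominal frequency and APERF advancing 1.5 times faster than MPERF, the sketch reports 3.6 GHz (turbo).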
### EAR3.2
- GPU monitoring based on nvidia-smi command
- GPU power reported to the DB
- PostgreSQL support
- Automatic GBs check in EARL
## Changed
- cluster_conf_read error fixed when reading "privileged" specification for policies
- ear.conf and ear.conf.full replaced by ear.conf.template and ear.conf.full.template
- More info with SLURM_COMP_VERBOSE env var
- ecct modified to remove MAX_SIG_POWER and MIN_SIG_POWER
- error fixed when using short ear.conf files
- rpms folder modified to support relocatable RPMs (/usr and /etc paths)
- energy_nm had an error when energy_init = energy_end
- bandwith.c modified to include NULL pointers when the architecture is not detected. Migrating it to a plugin is still pending
- energy_nm updated to match the version installed in lennox
- metrics folder reorganized by topic; the MSR code common to RAPL and temperature is now included in msr
- Added '--disable-avx512' flag to configure to use AVX2 symbols instead of AVX512. It is required when working with Haswell/Broadwell systems or older. Also added '--with-fortran' flag to configure to add Fortran symbols to the EAR library. It is required when working with some MPI distributions such as OpenMPI. Finally, configure accepts PostgreSQL flags.
- Three new functions in ear_api for manual utilization of EAR (requires application modification)
- Option in EARPlug to specify trace pathname changed to SLURM_EAR_TRACE_PATH.
- new trace plugin mechanism. EAR_GUI is set to on by default. SLURM_EAR_TRACE_PLUGIN env var defines trace path (no default location). Paraver plugin uses SLURM_EAR_TRACE_PATH env var. The plugin is still pending verification.
- EARD INIT and RT errors reported as events to the DB: Not fully tested
- rpms3 folder included for rpm testing
- EARplug including SLURM_HACK_LIBRARY for local EARL utilization
- Support for multiple LD_PRELOAD libraries in EARplug
- Default plugins included in EAR
- EAR cpupower included
- Merge with new_policies branch: rapl plugin for energy supported
- Added SLURM_EAR_MPI_VERSION for automatic selection of the EAR library in EARplug
- IPMI finder removed from energy loading. Default is not supported
- power_cap and power_cap_type included in island for power_capping policies
- New policies and power models included for testing
- Merged with new policies in ear.conf
- Power policies loaded with plugins
- Power models loaded with plugins (first prototype). dir_plug included to be used with plugins
- The energy reading system now works through plugins. (5bf9dddcbe0d815cc9bd9a39ccc296cf1c29bfb9)
- eargm, earlib and daemon warnings fixed with -Wall and gcc 8
- verbose messages converted to debug or error
- error.h and debug.h included in verbose.h
- minor error fixed in wait_for_cient function: the argument passed to accept was incorrect.
- eard_api non-blocking calls
- increased MAX_TRIES for non-blocking calls
- NO_COMMAND set when a message fails and non-blocking calls are used
- eargmd: warning was not initialized for the NO_PROBLEM case
- Work in progress on dynamic loading of power/energy policies
- Cleaned COORDINATE_FREQUENCIES, MEASURE_DYNAIS_OV, EAR_PERFORMANCE_TESTS and IN_MPI_TIME.
- Cleaned dynamic policies.
- Deleted unused files.
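Several entries above describe non-blocking eard_api calls that retry up to MAX_TRIES and fall back to NO_COMMAND when a message fails. The following C sketch shows that pattern on a plain socket; the constant values, CMD_SENT and send_request_nb are hypothetical and not EAR's actual code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical values: EAR defines its own MAX_TRIES and NO_COMMAND. */
#define MAX_TRIES  1000
#define NO_COMMAND 0
#define CMD_SENT   1

/* MSG_NOSIGNAL (Linux) reports EPIPE instead of raising SIGPIPE;
 * fall back to 0 on platforms that lack it. */
#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0
#endif

/* Send a whole request without blocking. While the socket buffer is full
 * (EAGAIN/EWOULDBLOCK) or the call is interrupted (EINTR), retry up to
 * MAX_TRIES times; any other error, or running out of tries, yields
 * NO_COMMAND so the caller can carry on instead of hanging. */
int send_request_nb(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t sent = 0;
    int tries = 0;

    while (sent < len) {
        ssize_t ret = send(fd, p + sent, len - sent, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (ret > 0) {
            sent += (size_t)ret;
            continue;
        }
        if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            && ++tries < MAX_TRIES)
            continue;
        return NO_COMMAND;
    }
    return CMD_SENT;
}
```

The sketch spins between attempts to stay short; a real implementation would more likely wait with poll() between tries.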
### Changed
- new IPMI interface is thread-safe: each EARD thread creates its own energy_handler_t
- lock to avoid simultaneous ipmi access
- assert removed from ipmi functions and replaced by a condition check plus an error message
- eard_rapi new_job and end_job calls are now non-blocking
- support for dynamic management of multiple contexts
- earl: error.h included in ear_api.c. It was generating a crash when invalid step_id is used
- earl: master lock files removed when invalid step id is used
- eardbd_api error when using global variables: eardbd_api is not thread-safe. Two variables moved from global to local, and a lock included in sockets_send for atomic sends
- eardbd host name included in aggregated metrics
- eargm reports global status every T1 period
- new ereport with options -i (filter by island) and -g (show global manager records)
- ereport now has a filter for eardbds/islands
- ereport now has an option to report Global_energy records
- ereport -i "all" option now also reports avg power
- ereport -g now can be used in conjunction with -s
- new eacct -x option to see EAR events
- eacct does not filter applications with high and low power values anymore
- fixed an error where edb_create would not output a correct user creation query
- freeipmi dependency removed
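The IPMI entries above combine a per-thread handler, one lock around the IPMI device, and error returns instead of assert. A minimal C sketch of that arrangement follows; energy_handler_sketch_t, energy_read_sketch and the placeholder reading are hypothetical stand-ins, since the real energy_handler_t layout is not shown here:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for energy_handler_t: each EARD thread owns one,
 * so per-thread state is never shared between threads. */
typedef struct {
    int initialized;
    unsigned long long last_mj;   /* last energy reading (millijoules, assumed unit) */
} energy_handler_sketch_t;

/* A single process-wide lock serializes access to the IPMI device itself. */
static pthread_mutex_t ipmi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the real IPMI transaction; returns 0 on success. */
static int ipmi_read_energy_raw(unsigned long long *mj)
{
    static unsigned long long counter = 0;   /* only touched under ipmi_lock */
    counter += 1000;
    *mj = counter;
    return 0;
}

/* Read energy through a per-thread handler. Invalid arguments are reported
 * with an error message and a return code instead of an assert. */
int energy_read_sketch(energy_handler_sketch_t *h, unsigned long long *mj)
{
    if (h == NULL || mj == NULL || !h->initialized) {
        fprintf(stderr, "energy_read_sketch: invalid handler or output pointer\n");
        return -1;
    }
    pthread_mutex_lock(&ipmi_lock);
    int ret = ipmi_read_energy_raw(mj);
    pthread_mutex_unlock(&ipmi_lock);
    if (ret == 0)
        h->last_mj = *mj;
    return ret;
}
```

Keeping the lock scoped to the device access, and the rest of the state in the handler, means threads only contend while actually talking to IPMI.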
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/
or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Eclipse Public License, Version 1.0 (EPL-1.0)
THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE PUBLIC
LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM
CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.
1. DEFINITIONS
"Contribution" means:
1. a) in the case of the initial Contributor, the initial code and
documentation distributed under this Agreement, and
b) in the case of each subsequent Contributor:
i) changes to the Program, and
ii) additions to the Program;
where such changes and/or additions to the Program originate from and
are distributed by that particular Contributor. A Contribution
'originates' from a Contributor if it was added to the Program by such
Contributor itself or anyone acting on such Contributor's behalf.
Contributions do not include additions to the Program which: (i) are
separate modules of software distributed in conjunction with the
Program under their own license agreement, and (ii) are not derivative
works of the Program.
"Contributor" means any person or entity that distributes the Program.
"Licensed Patents" mean patent claims licensable by a Contributor which are
necessarily infringed by the use or sale of its Contribution alone or when
combined with the Program.
"Program" means the Contributions distributed in accordance with this
Agreement.
"Recipient" means anyone who receives the Program under this Agreement,
including all Contributors.
2. GRANT OF RIGHTS
1. a) Subject to the terms of this Agreement, each Contributor hereby
grants Recipient a non-exclusive, worldwide, royalty-free copyright
license to reproduce, prepare derivative works of, publicly display,
publicly perform, distribute and sublicense the Contribution of such
Contributor, if any, and such derivative works, in source code and
object code form.
b) Subject to the terms of this Agreement, each Contributor hereby
grants Recipient a non-exclusive, worldwide, royalty-free patent
license under Licensed Patents to make, use, sell, offer to sell,
import and otherwise transfer the Contribution of such Contributor, if
any, in source code and object code form. This patent license shall
apply to the combination of the Contribution and the Program if, at
the time the Contribution is added by the Contributor, such addition
of the Contribution causes such combination to be covered by the
Licensed Patents. The patent license shall not apply to any other
combinations which include the Contribution. No hardware per se is
licensed hereunder.
c) Recipient understands that although each Contributor grants the
licenses to its Contributions set forth herein, no assurances are
provided by any Contributor that the Program does not infringe the
patent or other intellectual property rights of any other entity. Each
Contributor disclaims any liability to Recipient for claims brought by
any other entity based on infringement of intellectual property rights
or otherwise. As a condition to exercising the rights and licenses
granted hereunder, each Recipient hereby assumes sole responsibility
to secure any other intellectual property rights needed, if any. For
example, if a third party patent license is required to allow
Recipient to distribute the Program, it is Recipient's responsibility
to acquire that license before distributing the Program.
d) Each Contributor represents that to its knowledge it has
sufficient copyright rights in its Contribution, if any, to grant the
copyright license set forth in this Agreement.
3. REQUIREMENTS
A Contributor may choose to distribute the Program in object code form under
its own license agreement, provided that:
1. a) it complies with the terms and conditions of this Agreement; and
b) its license agreement:
i) effectively disclaims on behalf of all Contributors all warranties
and conditions, express and implied, including warranties or
conditions of title and non-infringement, and implied warranties or
conditions of merchantability and fitness for a particular purpose;
ii) effectively excludes on behalf of all Contributors all liability
for damages, including direct, indirect, special, incidental and
consequential damages, such as lost profits;
iii) states that any provisions which differ from this Agreement are
offered by that Contributor alone and not by any other party; and
iv) states that source code for the Program is available from such
Contributor, and informs licensees how to obtain it in a reasonable
manner on or through a medium customarily used for software exchange.
When the Program is made available in source code form:
1. a) it must be made available under this Agreement; and
b) a copy of this Agreement must be included with each copy of the
Program.
Contributors may not remove or alter any copyright notices contained within
the Program.
Each Contributor must identify itself as the originator of its Contribution,
if any, in a manner that reasonably allows subsequent Recipients to identify
the originator of the Contribution.
4. COMMERCIAL DISTRIBUTION
Commercial distributors of software may accept certain responsibilities with
respect to end users, business partners and the like. While this license is
intended to facilitate the commercial use of the Program, the Contributor who
includes the Program in a commercial product offering should do so in a
manner which does not create potential liability for other Contributors.
Therefore, if a Contributor includes the Program in a commercial product
offering, such Contributor ("Commercial Contributor") hereby agrees to defend
and indemnify every other Contributor ("Indemnified Contributor") against any
losses, damages and costs (collectively "Losses") arising from claims,
lawsuits and other legal actions brought by a third party against the
Indemnified Contributor to the extent caused by the acts or omissions of such
Commercial Contributor in connection with its distribution of the Program in
a commercial product offering. The obligations in this section do not apply
to any claims or Losses relating to any actual or alleged intellectual
property infringement. In order to qualify, an Indemnified Contributor must:
a) promptly notify the Commercial Contributor in writing of such claim, and
b) allow the Commercial Contributor to control, and cooperate with the
Commercial Contributor in, the defense and any related settlement
negotiations. The Indemnified Contributor may participate in any such claim
at its own expense.
For example, a Contributor might include the Program in a commercial product
offering, Product X. That Contributor is then a Commercial Contributor. If
that Commercial Contributor then makes performance claims, or offers
warranties related to Product X, those performance claims and warranties are
such Commercial Contributor's responsibility alone. Under this section, the
Commercial Contributor would have to defend claims against the other
Contributors related to those performance claims and warranties, and if a
court requires any other Contributor to pay any damages as a result, the
Commercial Contributor must pay those damages.
5. NO WARRANTY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON
AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER
EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR
CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the
appropriateness of using and distributing the Program and assumes all risks
associated with its exercise of rights under this Agreement , including but
not limited to the risks and costs of program errors, compliance with
applicable laws, damage to or loss of data, programs or equipment, and
unavailability or interruption of operations.
6. DISCLAIMER OF LIABILITY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY
CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION
LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE
EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.
7. GENERAL
If any provision of this Agreement is invalid or unenforceable under
applicable law, it shall not affect the validity or enforceability of the
remainder of the terms of this Agreement, and without further action by the
parties hereto, such provision shall be reformed to the minimum extent
necessary to make such provision valid and enforceable.
If Recipient institutes patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Program itself (
excluding combinations of the Program with other software or hardware)
infringes such Recipient's patent(s), then such Recipient's rights granted
under Section 2(b) shall terminate as of the date such litigation is filed.
All Recipient's rights under this Agreement shall terminate if it fails to
comply with any of the material terms or conditions of this Agreement and
does not cure such failure in a reasonable period of time after becoming
aware of such noncompliance. If all Recipient's rights under this Agreement
terminate, Recipient agrees to cease use and distribution of the Program as
soon as reasonably practicable. However, Recipient's obligations under this
Agreement and any licenses granted by Recipient relating to the Program shall
continue and survive.
Everyone is permitted to copy and distribute copies of this Agreement, but in
order to avoid inconsistency the Agreement is copyrighted and may only be
modified in the following manner. The Agreement Steward reserves the right to
publish new versions (including revisions) of this Agreement from time to
time. No one other than the Agreement Steward has the right to modify this
Agreement. The Eclipse Foundation is the initial Agreement Steward. The
Eclipse Foundation may assign the responsibility to serve as the Agreement
Steward to a suitable separate entity. Each new version of the Agreement will
be given a distinguishing version number. The Program (including
Contributions) may always be distributed subject to the version of the
Agreement under which it was received. In addition, after a new version of
the Agreement is published, Contributor may elect to distribute the Program (
including its Contributions) under the new version. Except as expressly
stated in Sections 2(a) and 2(b) above, Recipient receives no rights or
licenses to the intellectual property of any Contributor under this
Agreement, whether expressly, by implication, estoppel or otherwise. All
rights in the Program not expressly granted under this Agreement are reserved.
This Agreement is governed by the laws of the State of New York and the
intellectual property laws of the United States of America. No party to this
Agreement will bring a legal action under this Agreement more than one year
after the cause of action arose. Each party waives its rights to a jury trial
in any resulting litigation.
export CC
export CC_FLAGS
export MPICC
export MPICC_CFLAGS
export MPI_BASE
export MPI_CFLAGS
export MPI_VERSION
export PAPI_BASE
export PAPI_CFLAGS
export PAPI_LDFLAGS
export GSL_BASE
export GSL_CFLAGS
export GSL_LDFLAGS
export SLURM_BASE
export SLURM_CFLAGS
export DB_BASE
export DB_CFLAGS
export DB_LDFLAGS
export CUDA_BASE
export ROOTDIR
export SRCDIR
export DESTDIR
export ETCDIR
export TMPDIR
export DOCDIR
export FEAT_AVX512
export VER_MAJOR
export VER_MINOR
export CHOWN_USR
export CHOWN_GRP
export CONSTANTS
export REPLACE
@@ -2,13 +2,9 @@ CC = @CC@
CC_FLAGS = @CC_FLAGS@
MPICC = @MPICC@
MPICC_FLAGS = @MPICC_FLAGS@
MPI_BASE = @MPI_DIR@
MPI_CFLAGS = @MPI_CPPFLAGS@
MPI_VERSION = @MPI_VERSION@
ROOTDIR = $(shell pwd)
SRCDIR = $(ROOTDIR)/src
DESTDIR = @prefix@
ETCDIR = @sysconfdir@
TMPDIR = @localstatedir@
DOCDIR = @docdir@
PAPI_BASE = @PAPI_DIR@
PAPI_CFLAGS = @PAPI_CPPFLAGS@
PAPI_LDFLAGS = @PAPI_LDFLAGS@ @PAPI_LIBS@
@@ -20,46 +16,24 @@ SLURM_CFLAGS = @SLURM_CPPFLAGS@
DB_BASE = @DB_DIR@
DB_CFLAGS = @DB_CPPFLAGS@
DB_LDFLAGS = @DB_LDFLAGS@ @DB_LIBS@
CUDA_BASE = @CUDA_DIR@
ROOTDIR = $(shell pwd)
SRCDIR = $(ROOTDIR)/src
DESTDIR = @prefix@
ETCDIR = @sysconfdir@
TMPDIR = @localstatedir@
DOCDIR = @docdir@
FEAT_AVX512 = @FEAT_AVX512@
CHOWN_USR = @USER@
CHOWN_GRP = @GROUP@
CONSTANTS = -DSEC_KEY=10001
REPLACE =
FEAT_AVX512 = @FEAT_AVX512@
FEAT_FORT = @FEAT_FORT@
######## VARS
export CC
export CC_FLAGS
export MPICC
export MPICC_FLAGS
export MPI_VERSION
export ROOTDIR
export SRCDIR
export DESTDIR
export ETCDIR
export TMPDIR
export DOCDIR
export PAPI_BASE
export PAPI_CFLAGS
export PAPI_LDFLAGS
export GSL_BASE
export GSL_CFLAGS
export GSL_LDFLAGS
export SLURM_BASE
export SLURM_CFLAGS
export DB_BASE
export DB_CFLAGS
export DB_LDFLAGS
export CHOWN_USR
export CHOWN_GRP
export CONSTANTS
export REPLACE
export FEAT_AVX512
export FEAT_FORT
export VER_MAJOR
export VER_MINOR
######## EXPORTS
include ./Makefile.exports
######## RULES
# Energy Aware Runtime version 3.2
# Energy Aware Runtime version 3.3
<img src="etc/images/logo.png" align="right" width="440">
Energy Aware Runtime (EAR) package provides an energy management framework for supercomputers. EAR contains several components that together provide three main services:
1) An **easy-to-use and lightweight optimization service** to automatically select the optimal CPU frequency according to the application and node characteristics. This service is provided by two components: the EAR library (**EARL**) and the EAR daemon (**EARD**). EARL is a smart component that is loaded next to the application, intercepts MPI calls and selects the CPU frequency on the fly based on the application's behaviour. The library is loaded automatically through the EAR SLURM plugin (**EARPLUG, earplug.so**).
2) A complete **energy and performance accounting and monitoring system** based on an SQL database (MariaDB and PostgreSQL are supported). The energy accounting system is configurable in terms of application details and update frequency. The EAR database daemon (**EARDBD**) caches these metrics before inserting them into the DB.
3) A **global energy management** service to monitor and control the energy consumed in the system through the EAR global manager daemon (**EARGMD**). This control is configurable: it can dynamically adapt policy settings based on global energy limits or just offer global cluster monitoring.
<img src="etc/images/logo.png" align="right" width="440"> Energy Aware Runtime (EAR) package provides monitoring and energy saving solutions for supercomputers based on MPI and SLURM. Please visit [the wiki page](https://gitlab.bsc.es/ear_team/ear/-/wikis/home) for detailed installation, configuration and user guides.
License
-------
All the files in the EAR framework are under the LGPLv2.1 license. See the [COPYING](../../COPYING) file in the EAR root directory.
EAR is open-source software licensed under both the BSD-3 license for individual/non-commercial
use and the EPL-1.0 license for commercial use. The full text of both licenses can be
found in the COPYING.BSD and COPYING.EPL files.
Contact: [ear-support@bsc.es](mailto:ear-support@bsc.es)
@@ -7,11 +7,13 @@ m4_include([m4/x_ac_pgsql.m4])
m4_include([m4/x_ac_mysql.m4])
m4_include([m4/x_ac_slurm.m4])
m4_include([m4/x_ac_papi.m4])
m4_include([m4/x_ac_cuda.m4])
m4_include([m4/x_ac_gsl.m4])
# m4_include([m4/x_ac_mpi.m4])
# INIT
AC_PREREQ([2.69])
AC_INIT([EAR], [3.2])
AC_INIT([EAR], [3.3])
AC_LANG(C)
# PROGRAMS TEST
@@ -110,6 +112,21 @@ X_AC_PGSQL
X_AC_MYSQL
#########
# CUDA #
#########
#
X_AC_CUDA
#########
# MPI #
#########
#
# X_AC_MPI
############
# FEATURES #
############
@@ -12900,7 +12900,7 @@ Darkula color scheme from the JetBrains family of IDEs
</ol>
<h2 id="execution-and-checks">Execution and checks</h2>
<ol>
<li>Start EARDs and EARDBDs via services (see our <a href="#13--Execution">Launching the components with unit services</a>). EARDBD and EARD outputs can be found at <code>$EAR_TMP/eardbd.log</code> and
<li>Start EARDs and EARDBDs via services (see our <a href="#3--Executionlaunching-the-components-through-unit-services">Launching the components with unit services</a>). EARDBD and EARD outputs can be found at <code>$EAR_TMP/eardbd.log</code> and
<code>$EAR_TMP/eard.log</code>, respectively, when the DBDaemonUseLog and NodeUseLog options are set to 1 in the ear.conf file. Otherwise, their output goes to stderr and can be viewed with the journalctl command. For instance, use <code>journalctl
-u eard</code> to look at the eard output.</li>
<li>Check that the EARDs are up and running correctly with <code>econtrol --status</code> (note that the daemons take around a minute before they report energy correctly and no longer show up as an error in <code>econtrol</code>). EARDs create
@@ -13614,6 +13614,7 @@ Darkula color scheme from the JetBrains family of IDEs
</ul>
</li>
</ul>
<p>Please visit the <a href="#READMEislands">islands example</a> for more information and examples of a cluster configuration in form of islands.</p>
<h2 id="slurm-spank-plugin-configuration-file">SLURM spank plugin configuration file</h2>
<p>SLURM loads the plugin through a file called <code>plugstack.conf</code>, which is composed of a list of plugins. The file <code>etc/slurm/ear.plugstack.conf</code> contains an example entry with the plugin, temporary and configuration paths
already set.</p>
@@ -13903,7 +13904,7 @@ Energy% Warning lvl Timestamp INC th p_state ENERGY T1
<p class="page" id="17--Learning-phase"></p>
<p>This phase is required before normal EAR use and is a kind of hardware characterization of the nodes. During this phase, a matrix of coefficients is calculated and stored. These coefficients are used to predict the
energy consumption and performance of each application.</p>
<p>Please, visit the learning phase <a href="https://gitlab.bsc.es/ear_team/ear_learning/-/wikis/home">wiki page</a> to read the manual and the <a href="https://gitlab.bsc.es/ear_team/ear_learning">repository</a> to get the scripts and the kernels.</p>
<p>Please, visit the learning phase <a href="https://github.com/BarcelonaSupercomputingCenter/ear_learning/wiki">wiki page</a> to read the manual and the <a href="https://github.com/BarcelonaSupercomputingCenter/ear_learning">repository</a> to get the scripts and the kernels.</p>
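The paragraph above does not give the form of the prediction model. Purely as an assumed illustration, the sketch below applies a per-frequency-pair linear model in which the power at the target frequency depends on the reference power and on TPI (memory transactions per instruction), and the time is scaled by the predicted CPI change and the frequency ratio. The coefficient names A to F, the structs and the functions are hypothetical:

```c
/* Hypothetical coefficients for one (reference, target) frequency pair,
 * as a learning phase might store them. */
typedef struct {
    double A, B, C;   /* power model: P' = A*P + B*TPI + C     */
    double D, E, F;   /* CPI model:   CPI' = D*CPI + E*TPI + F */
} coeffs_sketch_t;

/* Hypothetical application signature measured at the reference frequency. */
typedef struct {
    double power_w;   /* average node power (W)                 */
    double time_s;    /* execution time (s)                     */
    double cpi;       /* cycles per instruction                 */
    double tpi;       /* memory transactions per instruction    */
} signature_sketch_t;

double predict_power(const coeffs_sketch_t *c, const signature_sketch_t *s)
{
    return c->A * s->power_w + c->B * s->tpi + c->C;
}

/* Time scales with the predicted CPI change and the inverse frequency ratio.
 * Returns -1.0 for inputs that would divide by zero. */
double predict_time(const coeffs_sketch_t *c, const signature_sketch_t *s,
                    double f_ref_khz, double f_target_khz)
{
    if (s->cpi <= 0.0 || f_target_khz <= 0.0)
        return -1.0;
    double cpi_target = c->D * s->cpi + c->E * s->tpi + c->F;
    return s->time_s * (cpi_target / s->cpi) * (f_ref_khz / f_target_khz);
}
```

With identity coefficients (A = D = 1, the rest 0), power is unchanged and a 10 s run at 2.4 GHz is predicted to take 12 s at 2.0 GHz; energy can then be estimated as predicted power times predicted time.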
<p class="page" id="18--Plugins"></p>
<h1 id="ear-plugins">EAR plugins</h1>
<p>Some of the core EAR functionality can be dynamically loaded through a plugin mechanism, making EAR more extensible and dynamic than previous versions, since the system does not need to be reinstalled to add, for instance, a new policy or
@@ -14187,6 +14188,21 @@ NodeDaemonPowermonFreq
SupportedPolicies
MinTimePerformanceAccuracy</pre>
<p>To reconfigure other options, such as the EARD connection port or the coefficients, the daemon must be stopped and restarted.</p>
<h2 id="api">API</h2>
<p>The (node) Daemon offers a simple API to request frequency changes, modify the current node settings, and reload the system configuration by reading <code>$(EAR_ETC)/ear/ear.conf</code>.</p>
<p>Three APIs are provided:</p>
<ul>
<li>
<p>Local API, to be used by <a href="#34--EARL">EARL</a> (or any other runtime). It can be found in <a href="#eard_apih">eard_api.h</a>. This API involves complex data types and is not public.</p>
</li>
<li>
<p>Local API, to be used by applications. It is a subset of the EARD API, designed to let any application contact the privileged metric service offered by EARD. This API is public and can be used without restrictions;
therefore, it does not include functions to change the frequency. It can be found at <a href="#">TBD</a>.</p>
</li>
<li>
<p>Remote API, to be used by the <a href="#33--EARGM">EARGMD</a> or by system commands and tools such as <code>econtrol</code>. It can be found in <a href="#eard_rapih">eard_rapi.h</a> and is not public.</p>
</li>
</ul>
<p class="page" id="32--EARDBD"></p>
<h2 id="eardbd-database-manager">EARDBD: Database Manager</h2>
<p>EARDBD caches the records generated by the <a href="#34--EARL">EARL</a> and <a href="#31--EARD">EARD</a> in the system and reports them to the centralized database. It is recommended to run several EARDBDs if the cluster is big enough, to
@@ -14425,4 +14441,4 @@ SLURM_EAR_TRACE_PATH=TRACES_PARAVER/</pre>
})
</script>
 
</html>
</html>
\ No newline at end of file
.\" Manpage for EAR.
.TH man 8 "28 October 2020" "3.3" "EAR man page"
.SH NAME
EAR \- EAR Framework.
.SH DESCRIPTION
Energy Aware Runtime (EAR) package provides an energy management framework for supercomputers. EAR contains several components that together provide three main services:
1- An easy-to-use and lightweight optimization service to automatically select the optimal CPU frequency according to the application and node characteristics. This service is provided by two components: the EAR library (EARL) and the EAR daemon (EARD). EARL is a smart component that is loaded next to the application, intercepts MPI calls and selects the CPU frequency on the fly based on the application's behaviour. The library is loaded automatically through the EAR SLURM plugin (EARPLUG, earplug.so).
2- A complete energy and performance accounting and monitoring system based on an SQL database (MariaDB and PostgreSQL are supported). The energy accounting system is configurable in terms of application details and update frequency. The EAR database daemon (EARDBD) caches these metrics before inserting them into the DB.
3- A global energy management service to monitor and control the energy consumed in the system through the EAR global manager daemon (EARGMD). This control is configurable: it can dynamically adapt policy settings based on global energy limits or just offer global cluster monitoring.
.SH SEE ALSO
eacct(1), econtrol(1), ereport(1)
.SH AUTHOR
EAR support team (ear-support@bsc.es)
.\" Manpage for eacct.
.TH man 1 "26 October 2018" "1.1" "eacct man page"
.TH man 1 "28 October 2020" "3.3" "eacct man page"
.SH NAME
eacct \- See a report of the last job executions reported by EAR's daemons.
.SH SYNOPSIS
@@ -11,19 +11,20 @@ eacct is a simple command to see a jobs' energy accounting information. It can a
.SH OPTIONS
-v verbose mode for debugging purposes
-h displays this message
-v displays current EAR version
-b verbose mode for debugging purposes
-u specifies the user whose applications will be retrieved. Only available to privileged users. [default: all users]
-j specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
A user can only retrieve its own jobs unless said user is privileged. [default: all jobs]
-j specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
A user can only retrieve its own jobs unless said user is privileged. [default: all jobs]
-c specifies the file where the output will be stored in CSV format. [default: no file]
-t specifies the energy_tag of the jobs that will be retrieved. [default: all tags].
-l shows the information for each node for each job instead of the global statistics for said job.
-x shows the last EAR events. Nodes, job ids, and step ids can be specified in the same way as when showing job information.
-r shows the EAR loop signatures. Nodes, job ids, and step ids can be specified in the same way as when showing job information.
-n specifies the number of jobs to be shown, starting from the most recent one. [default: 20][to get all jobs use -n all]
-f specifies the file where the user-database can be found. If this option is used, the information will be read from the file and not the database.
-h displays usage options
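The -j option above accepts either jobid.stepid or a comma-separated list of job ids. A minimal C sketch of a parser for those two formats follows; job_filter_sketch_t, parse_job_arg and the 64-job limit are hypothetical, not taken from eacct's source:

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_JOBS_SKETCH 64   /* hypothetical limit */

/* Hypothetical result type; step_id == -1 means "all steps". */
typedef struct {
    unsigned long job_ids[MAX_JOBS_SKETCH];
    int num_jobs;
    long step_id;
} job_filter_sketch_t;

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Parse "jobid", "jobid.stepid" or "jobid1,jobid2,...,jobid_n".
 * Returns 0 on success, -1 on malformed input. */
int parse_job_arg(const char *arg, job_filter_sketch_t *f)
{
    const char *p = arg;
    char *end;

    f->num_jobs = 0;
    f->step_id = -1;
    for (;;) {
        /* Each field must start with a digit (rejects "", "-1", ",,"). */
        if (!is_digit(*p) || f->num_jobs == MAX_JOBS_SKETCH)
            return -1;
        errno = 0;
        unsigned long id = strtoul(p, &end, 10);
        if (errno == ERANGE)
            return -1;
        f->job_ids[f->num_jobs++] = id;

        if (*end == '\0')
            return 0;
        if (*end == '.') {
            /* A step id is only accepted for a single job. */
            p = end + 1;
            if (f->num_jobs != 1 || !is_digit(*p))
                return -1;
            errno = 0;
            long step = strtol(p, &end, 10);
            if (errno == ERANGE || *end != '\0')
                return -1;
            f->step_id = step;
            return 0;
        }
        if (*end != ',')
            return -1;
        p = end + 1;
    }
}
```

For instance, "31191.2" yields one job with step 2, while "1,2.3" is rejected because the step form is limited to a single job in this sketch.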
.SH Usage examples
Job 31191 corresponds with the execution of the bqcd application with 5 job steps. When executing eacct -j 31191 we will get the following output:
@@ -62,5 +63,7 @@ The following instead retrieves the first EARL events for the previous job:
.SH BUGS
- Saving the output to a file with the -c option and then reading it with -f may cause some issues if there are empty fields.
.SH SEE ALSO
ereport(1), econtrol(1), EAR(8)
.SH AUTHOR
Lluís Alonso (lluis.alonso@bsc.es)
EAR support team (ear-support@bsc.es)