# Energy Data Center Monitor

The Energy Data Center Monitor (EDCMON) is a new EAR service for data center monitoring. It targets elements other than the computational nodes, which are already monitored by the EARD running on each compute node. Whereas the EARDs monitor (among other things) DC node power, the EDCMON service targets (although it is not limited to) AC power. The main goal of EDCMON is therefore to cover all the power-consuming components in a data center (compute nodes, network, storage, management).

EDCMON is fully configurable and extensible because it uses an EAR framework named Plugin Manager, which allows loading as many plugins as needed, each with its own calling frequency, declaring dependencies among them, and sharing data between them. Plugins communicate with each other through a **tag** (naming) system. A tag is free text specified in the plugin code and is used as a reference to specify dependencies, data sharing, etc.

## The EDCMON executable

EDCMON accepts the following parameters:

```
Usage: ./edcmon [OPTIONS]

Options:
    --plugins   List of comma separated plugins to load.
    --paths     List of comma separated priority paths to search plugins.
    --verbose   Show how the things are going internally.
    --silence   Hide messages returned by plugins.
    --monitor   Period at which the plugin wake ups for monitoring. Def=100 ms
    --relax     Period to be used during low monitoring periods. Def=100 ms
    --help      If you see it you already typed --help.
```

This is an example of the executable arguments and their format:

```
edcmon --monitor=1000 --relax=1000 --plugins=nodesensors.so:30000+nodesensors_report.so:30000:nodesensor_log+nodesensors_alerts.so:30000:log --verbose
```

This example shows the default configuration used by EAR when the edcmon service is deployed. It configures a monitoring period of 1 second and loads three plugins (separated by the `+` character):

- nodesensors: monitoring plugin based on Lenovo Confluent software. Reads the specified sensors every 30 seconds. Exposes the "nodesensors" tag.
- nodesensors_report: reporting plugin for the nodesensors plugin. Depends on the "nodesensors" tag and uses its data. It is executed every 30 seconds. The "nodesensor_log" argument indicates the report plugin to use. Exposes the "nodesensors_report" tag.
- nodesensors_alerts: alerting plugin that depends on the "nodesensors" tag and uses the data produced by it. It is executed every 30 seconds. The "log" argument indicates how alerts are reported. Exposes the "nodesensors_alerts" tag.

> Plugins are installed in the `$EAR_INSTALL_PATH/lib/plugins/monitoring` folder.

```
./edcmon --plugins=metrics.so:2000+periodic_metrics.so:4000 --paths=path/to/plugins1:path/to/plugins2
```

**The list of plugins to load** also contains their calling time in milliseconds. The main periodic action (PA) function of a plugin is called each time that period has passed. The time is not mandatory, because some plugins may not define a PA function and only act as receivers of other plugins' data. In that case, their receiving functions are called once the shared data of the other plugin is ready. You can also omit the time when you don't want the Plugin Manager to call your PA function periodically.

Additional colon-separated fields can be provided to pass information to a plugin during its initialization:

```
./edcmon --plugins=metrics.so:2000+periodic_metrics.so:4000:config_message1:config_message2 --paths=path/to/plugins1:path/to/plugins2
```

You can send N configuration messages to your plugin initialization function to alter its behaviour. You can also omit the time value or write 0 instead:

```
./edcmon --plugins=metrics.so:2000+periodic_metrics.so:0000:config_message1:config_message2 --paths=path/to/plugins1:path/to/plugins2

./edcmon --plugins=metrics.so:2000+periodic_metrics.so:config_message1:config_message2 --paths=path/to/plugins1:path/to/plugins2
```

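To illustrate the entry format used in the commands above, here is a minimal sketch of how one `+`-separated entry of `--plugins` could be split into a file name, an optional calling time, and configuration messages. It is not the Plugin Manager's actual parser: `plugin_spec_t`, `plugin_spec_parse()`, `is_number()` and the fixed buffer sizes are hypothetical names and limits chosen for this example, and treating a numeric first field as the calling time is an assumption drawn from the examples.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 8

/* Hypothetical container for one parsed entry of --plugins. */
typedef struct plugin_spec_s {
    char file[64];            /* shared object name, e.g. "metrics.so" */
    long period_ms;           /* calling time; 0 when omitted or written as 0 */
    char args[MAX_ARGS][64];  /* configuration messages */
    int  args_count;
} plugin_spec_t;

/* Returns 1 if the string is non-empty and contains only digits. */
static int is_number(const char *s)
{
    if (*s == '\0') return 0;
    for (; *s != '\0'; ++s) {
        if (!isdigit((unsigned char) *s)) return 0;
    }
    return 1;
}

/* Parses a single entry such as "periodic_metrics.so:4000:msg1:msg2".
 * Returns 0 on success and -1 if the entry is empty or too long. */
static int plugin_spec_parse(const char *entry, plugin_spec_t *spec)
{
    char buffer[256];
    char *field;
    int index = 0;

    if (strlen(entry) >= sizeof(buffer)) return -1;
    strcpy(buffer, entry);
    memset(spec, 0, sizeof(*spec));

    for (field = strtok(buffer, ":"); field != NULL; field = strtok(NULL, ":"), ++index) {
        if (index == 0) {
            /* First field: the plugin file name. */
            strncpy(spec->file, field, sizeof(spec->file) - 1);
        } else if (index == 1 && is_number(field)) {
            /* Assumption: a numeric second field is the calling time. */
            spec->period_ms = strtol(field, NULL, 10);
        } else if (spec->args_count < MAX_ARGS) {
            /* Any other field is a configuration message. */
            strncpy(spec->args[spec->args_count++], field, sizeof(spec->args[0]) - 1);
        }
    }
    return (index > 0) ? 0 : -1;
}
```

Under these assumptions, `periodic_metrics.so:0000:config_message1` and `periodic_metrics.so:config_message1` both produce a period of 0 and a single configuration message.
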
Plugins also have **dependencies**: a plugin may depend on the actions or data shared by other plugins. A dependency is written in a string in the compiled binary itself, so you don't have to load the required plugin manually. It is loaded automatically, and its calling time can be inherited from the dependent plugin (if so specified in the binary). If you want to set a specific calling time, you have to load the plugin manually and set the time you want. If a dependency is hard (which is specified in the string), a failure in the required plugin disables the dependent plugin.

Plugins also have **priorities**. If plugin A is a dependency of plugin B, plugin A is called first. This holds even if plugin B is written before plugin A in the `--plugins` parameter, because the dependency system takes these cases into account.

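One common way to obtain an ordering where dependencies always run first is a depth-first traversal of the dependency graph. The sketch below shows that technique only; the Plugin Manager's real algorithm is not described in this page, and `plugin_node_t`, `visit()` and `plugins_sort()` are hypothetical names. It does not report dependency cycles (the `visited` flag only guarantees that the traversal terminates).

```c
#define MAX_PLUGINS 16

/* Hypothetical node of the dependency graph. */
typedef struct plugin_node_s {
    const char *tag;
    int deps[MAX_PLUGINS]; /* indexes of the plugins this one depends on */
    int deps_count;
    int visited;
} plugin_node_t;

/* Depth-first visit: every dependency is appended before the plugin itself. */
static void visit(plugin_node_t *plugins, int index, int *order, int *order_count)
{
    int i;

    if (plugins[index].visited) return;
    plugins[index].visited = 1;
    for (i = 0; i < plugins[index].deps_count; ++i) {
        visit(plugins, plugins[index].deps[i], order, order_count);
    }
    order[(*order_count)++] = index;
}

/* Fills 'order' with plugin indexes in calling order and returns how many were written. */
static int plugins_sort(plugin_node_t *plugins, int count, int *order)
{
    int order_count = 0;
    int i;

    for (i = 0; i < count; ++i) plugins[i].visited = 0;
    for (i = 0; i < count; ++i) visit(plugins, i, order, &order_count);
    return order_count;
}
```

With B listed first and depending on A, the resulting order places A before B, which matches the behaviour described for `--plugins`.
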
## EDCMON plugins

Although the plugins folder contains other plugins (listed at the end of this page), these are the plugins specific to data center monitoring.

| Plugin             | Information                                              |
|--------------------|----------------------------------------------------------|
| nodesensors        | Reads Confluent power sensors.                           |
| nodesensors_report | Reports power readings exposed by nodesensors.           |
| nodesensors_alerts | Checks limits and executes actions based on nodesensors. |

## Creating new plugins

The plugin functions must have specific names. The function names and arguments are the following:

```
void up_get_tag (cchar **tag, cchar **tags_deps)
char *up_action_init (cchar *tag, void **data_alloc, void *data)
char *up_action_periodic (cchar *tag, void *data)
char *up_post_data (cchar *msg, void *data)
```

The function `up_get_tag` is in charge of returning the plugin's own tag and its dependency tags. A tag matches the name of the shared object file (without the extension). As mentioned, the dependency tags allow the Plugin Manager to search for and open the tagged plugins automatically. The format is a list of tags separated by plus signs. Example:

```
void up_get_tag(cchar **tag, cchar **tags_deps)
{
    *tag = "some_test";
    *tags_deps = "dependency1+!dependency2";
}
```

**A dependency tag can start with special symbols.** An exclamation mark '!' means that the dependency is mandatory for the loading plugin; if it is not resolved, the loading plugin is disabled. The symbol '<' tells the Plugin Manager that the dependency inherits the timing of the dependent plugin.

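As an illustration of the dependency string format, the following sketch splits a `tags_deps` string into tags and records the '!' and '<' symbols as flags. It is a hypothetical helper, not the Plugin Manager's code: `dep_t`, `deps_parse()` and the buffer sizes are invented for this example, and accepting both symbols on the same tag is an assumption.

```c
#include <string.h>

/* Hypothetical description of one parsed dependency. */
typedef struct dep_s {
    char tag[64];
    int  mandatory;    /* set by a leading '!' */
    int  inherit_time; /* set by a leading '<' */
} dep_t;

/* Parses a string such as "dependency1+!dependency2".
 * Returns the number of dependencies stored, or -1 if the string is too long. */
static int deps_parse(const char *tags_deps, dep_t *deps, int max)
{
    char buffer[256];
    char *item;
    int count = 0;

    if (tags_deps == NULL) return 0; /* no dependencies at all */
    if (strlen(tags_deps) >= sizeof(buffer)) return -1;
    strcpy(buffer, tags_deps);

    for (item = strtok(buffer, "+"); item != NULL && count < max; item = strtok(NULL, "+")) {
        dep_t *d = &deps[count];
        memset(d, 0, sizeof(*d));
        /* Consume the leading symbols and remember what they mean. */
        while (*item == '!' || *item == '<') {
            if (*item == '!') d->mandatory = 1;
            else d->inherit_time = 1;
            ++item;
        }
        if (*item == '\0') continue; /* symbols without a tag are ignored */
        strncpy(d->tag, item, sizeof(d->tag) - 1);
        ++count;
    }
    return count;
}
```

For the string used in the `up_get_tag` example, this yields two entries: `dependency1` as an optional dependency and `dependency2` as a mandatory one.
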
The function `up_action_periodic()`, or PA, is the core function to perform actions and share data. It receives a tag and a pointer to the data associated with that tag. The received tag can be the plugin's own tag or the tag of another plugin. The PA function is called with the plugin's own tag and data when the time specified in the `--plugins` argument has passed, or with another plugin's tag and data after that plugin has run its own PA function with its own tag.

Examples of PA function types:

```
char *up_action_periodic(cchar *tag, void *data)
{
    if (is_tag("tag2")) {
        type2_t *d = (type2_t *) data;
        // work
    }
    return NULL;
}

char *up_action_periodic_tag1(cchar *tag, void *data)
{
    type1_t *d = (type1_t *) data;
    // work
    return NULL;
}
```

As you can see, you can define a generic `up_action_periodic()` function or one suffixed with a tag. A suffixed function is called only for the plugin whose tag matches the function suffix. If you define just the generic version of the function, take into account that you have to distinguish between tags. The `is_tag` macro helps you do this and keep your code clean and readable.

The returned string is a message that the Plugin Manager prints if it is not NULL. You can add a modifier at the beginning of the message:

* `[D]` disables the plugin. It also re-activates the dependency system and could disable dependent plugins.
* `[=]` pauses the periodic call.
* `[X]` closes the Plugin Manager main thread.

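The modifiers can be used as in the following sketch of a PA function. The shared data type `sensor_data_t`, its fields, and the conditions that trigger each modifier are hypothetical; the `cchar` typedef is declared here only as a stand-in so the example compiles on its own, and is an assumption about the type provided by the EAR headers.

```c
#include <stddef.h>

typedef const char cchar; /* assumption: stand-in for the EAR typedef */

/* Hypothetical data shared by a sensor plugin. */
typedef struct sensor_data_s {
    int    available; /* 0 when the sensor cannot be read right now */
    double power_w;   /* last power reading in watts */
} sensor_data_t;

char *up_action_periodic(cchar *tag, void *data)
{
    sensor_data_t *d = (sensor_data_t *) data;

    (void) tag; /* single-tag example, the tag is not inspected */

    if (d == NULL) {
        /* Nothing to work with: disable this plugin. */
        return "[D] no shared data received, disabling plugin";
    }
    if (!d->available) {
        /* Stop being called periodically. */
        return "[=] sensor not available, pausing periodic calls";
    }
    if (d->power_w < 0.0) {
        /* Treated as fatal in this sketch: close the Plugin Manager main thread. */
        return "[X] invalid power reading, closing Plugin Manager";
    }
    /* NULL means there is no message to print. */
    return NULL;
}
```
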
The `up_action_init()` function works the same way: it can receive the plugin's own tag or another plugin's tag. It is called once before any PA function is called and can be used to allocate and initialize data.

```
static mydata_t mydata;

// Called with the plugin's own tag: publish the data to share.
char *up_action_init_mytag (cchar *tag, void **data_alloc, void *data)
{
    *data_alloc = &mydata;
    return "I have been initialized and mydata will be shared among the loaded plugins";
}

// Called with the tag of another plugin: its shared data arrives in 'data'.
char *up_action_init_tag2 (cchar *tag, void **data_alloc, void *data)
{
    tag2_type_t *var = (tag2_type_t *) data;
    return "Now I know that tag2 plugin has been initialized";
}
```

When an initialization function is called with its own plugin tag, the `data_alloc` double pointer is used to publish the data that the plugin wants to share with other plugins, so the plugin is responsible for allocating the data and setting the pointer to its address. When an initialization function is called with another plugin's tag, `data_alloc` is NULL and the `data` parameter points to the shared data just initialized by that plugin, which is the same data referenced in the PA function.

The **configuration** string mentioned in the EDCMON executable section is also received through the `data` parameter when the initialization function is called with the plugin's own tag, and can be retrieved as a list of arguments:

```
#include <stdio.h>
#include <string.h>

static mydata_t mydata;

char *up_action_init_mytag (cchar *tag, void **data_alloc, void *data)
{
    // The configuration messages arrive as a list of strings.
    char **args = (char **) data;
    *data_alloc = &mydata;
    if (args != NULL) {
        if (strcmp(args[0], "i_want_to_say_hello") == 0) {
            printf("Hello!\n");
        }
    }
    return "I have been initialized and mydata will be shared among the loaded plugins";
}
```

The final pipeline is:

```
1) up_get_tag (all plugins)
2) up_action_init (all plugins)
3) up_action_periodic (all plugins)
4) up_action_periodic (the plugins whose time has passed, and then the plugins which depend on their data)

PA example, if B and C depend on A in --plugins=A.so:4000+B.so:3000+C.so:4000
1) A up_action_periodic will be called with 'A' tag.
2) B up_action_periodic will be called with 'A' tag.
3) C up_action_periodic will be called with 'A' tag.
4) B up_action_periodic will be called with 'B' tag.
5) C up_action_periodic will be called with 'C' tag.
6) After 3 seconds, B up_action_periodic will be called with 'B' tag.
7) After 4 seconds, A up_action_periodic will be called with 'A' tag.
8) After 4 seconds, B up_action_periodic will be called with 'A' tag.
9) After 4 seconds, C up_action_periodic will be called with 'A' tag.
10) After 6 seconds, B up_action_periodic will be called with 'B' tag.
11) After 8 seconds, A up_action_periodic will be called with 'A' tag.
12) And so on...
```

Finally, `up_post_data()` allows plugins to receive external data. For example, if the Plugin Manager is being used by the EARD, when a job starts you could send a message to the framework containing the job and step IDs. For now, you can distinguish the messages with the `is_msg` macro. The suffix system may be implemented for this function in the future.

### Helper macros

The following helper macros let you declare the functions and keep your plugins up to date if a function signature changes. They can be found in `plugin_manager.h`:

```
#define declr_up_get_tag() void up_get_tag (cchar **tag, cchar **tags_deps)
#define declr_up_action_init(suffix) char * up_action_init##suffix (cchar *tag, void **data_alloc, void *data)
#define declr_up_action_periodic(suffix) char * up_action_periodic##suffix (cchar *tag, void *data)
#define declr_up_post_data() char * up_post_data (cchar *msg, void *data)
```

Examples of `up_action_periodic()` functions declared with the macros:

```
declr_up_action_periodic(_tag1)
{
    type1_t *d = (type1_t *) data;
    // work
    return NULL;
}

declr_up_action_periodic()
{
    if (is_tag("tag2")) {
        type2_t *d = (type2_t *) data;
        // work
    }
    return NULL;
}
```

## Plugin Manager functions

For now, these are the functions of the framework:

```
// Init as main binary function
int plugin_manager_main(int argc, char *argv[]);

// Init as a component of a binary
int plugin_manager_init(char *files, char *paths);

// Closes Plugin Manager main thread.
void plugin_manager_close();

// Wait until Plugin Manager exits.
void plugin_manager_wait();

// Asking for an action. Intended to be called from plugins.
void *plugin_manager_action(cchar *tag);

// Passing data to plugins. Intended to be called outside PM.
void plugin_mananger_post(cchar *msg, void *data);
```

`plugin_manager_main()` receives the program arguments (`argc` and `argv`), which include `--plugins`. You can also call `plugin_manager_init()` if you prefer to pass the list of plugins and the search paths separately (in the same format). `plugin_manager_action()` calls the PA function of the plugin whose tag is referenced; it can be useful when a plugin prefers to call a required plugin manually. Finally, `plugin_manager_wait()` waits until the Plugin Manager main thread is closed.

### Other plugins already available

| Plugin            | Information                                            |
|-------------------|--------------------------------------------------------|
| conf              | Reads ear.conf and shares its data with other plugins. |
| dummy             | Just an example.                                       |
| eardcon           | Connects with EARD. Saves other plugins to do that.    |
| kernel_cl         | An OpenCL kernel test.                                 |
| kernel_cuda       | A CUDA kernel test.                                    |
| keyboard          | A keyboard input.                                      |
| management        | Initializes all management APIs.                       |
| management_viewer | Views all management information.                      |
| metrics           | Initializes and reads all metrics APIs.                |
| metrics_viewer    | Views all metrics readings.                            |
| periodic_metrics  | Receives metrics and computes a periodic_metric.       |
| test_cpufreq      | A CPUFreq test.                                        |
| test_gpu          | Initializes and reads the GPU API.                     |

## FAQ

- **Can I load the same plugin twice?** No.
- **Is the tag a mandatory value?** Yes, all plugins require a tag.
- **And the dependency tags?** They can be set to NULL if the plugin does not have any dependency.
- **Do I have to specify the time of a plugin in the dependency list?** No, it is not recommended. A plugin that is loaded through the dependency list instead of the `--plugins` parameter inherits the dependent plugin's time if the special character '<' is used at the beginning of its tag.
- **If none of the dependencies are resolved, will the plugin periodic function be called anyway?** It depends on whether any of the dependencies are mandatory, as specified by the exclamation mark (!).
- **What happens if a plugin has a periodic time specified but has not defined a periodic function?** If there is no periodic function, nothing is called.
- **Do I have to define all the API functions in the plugin?** No, only those necessary for the correct plugin functionality. The `get_tag` function is the exception because the tag is a mandatory value.
- **Can the `action_init` function be defined but `action_periodic` not?** Yes. Sometimes you want to perform an action just once, and you can do it in the init function. For example, the job of the plugin `conf.so` is to read `ear.conf` and pass the configuration structure to the rest of the loaded plugins.
- **For a plugin which does not allocate data, is its periodic function called?** Yes, if it is defined. But the NULL value in the allocated data pointer prevents any information exchange, so the periodic functions of other plugins won't be called.
- **If a plugin has defined a function `action_periodic_tagX` for the tag `tagX`, but also the general `action_periodic`, which of the two is called?** If a suffixed function is defined, that tagged version is called for `tagX`. For the rest of the tags, the general `action_periodic` is called.