# Classification strategies

v5.2 release, authored Oct 23, 2025 by Oriol
As laid out in the [Classification](Architecture#classification) section of the Library, EAR incorporates three different strategies. This page is dedicated to explaining how they work internally, and how and when they become available to the end user.
## Default model
EAR's default classification model is based on the application of a fixed set of thresholds (in absolute units) to the `CPI` and `MEM_GBS` metrics. These thresholds, available from EAR's installation onward, allow identifying the different execution phases at a fundamental level.
### Input
This strategy takes 4 thresholds: 2 for characterizing CPU-bound applications and 2 for MEMORY-bound ones. These are defined according to the architecture's specifications, via the domain knowledge of administrators. For instance, the values proposed for [*Sapphire Rapids*](https://www.intel.com/content/www/us/en/products/sku/231746/intel-xeon-platinum-8480-processor-105m-cache-2-00-ghz/specifications.html) nodes are
- CPU-bound `CPI`: 0.4
- CPU-bound `MEM_GBS`: 180
- MEMORY-bound `CPI`: 0.4
- MEMORY-bound `MEM_GBS`: 250
### Classification philosophy
With these thresholds defined, the classification proceeds as follows:
```
Let S be the last registered signature
Let <CPU_CPI, CPU_BWD> be the tuple of CPU-bound thresholds
Let <MEM_CPI, MEM_BWD> be the tuple of MEMORY-bound thresholds

IF (S->CPI <= CPU_CPI && S->MEM_GBS <= CPU_BWD)
    Mark app as CPU-bound
ELSE IF (S->CPI >= MEM_CPI && S->MEM_GBS >= MEM_BWD)
    Mark app as MEMORY-bound
ELSE
    Mark app as MIX
```
Let us go over the idiosyncrasies of this strategy:
1. To begin with, the model checks whether the `CPI` and `MEM_GBS` of the application are _below_ the CPU-bound thresholds. The direction of this comparison comes from the fact that we expect a CPU-bound application to be executing lots of instructions (thus having a small `CPI`) and not to be bound by memory (thus registering a low memory bandwidth).
2. If the app is not CPU-bound, the strategy checks whether the considered metrics are _above_ the MEMORY-bound thresholds. Here we expect a MEMORY-bound app to be using a considerable amount of memory bandwidth (thus having a large `MEM_GBS`) and not executing too many instructions (thus having a large `CPI` as well).
3. If none of these conditions are met, the strategy labels the app as MIX.
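The decision flow above can be sketched in a few lines of Python. This is a minimal illustration, not EAR's actual implementation; the threshold values are the Sapphire Rapids examples proposed above, and all names are hypothetical:

```python
# Minimal sketch of the default threshold-based classification.
# Threshold values are the Sapphire Rapids examples from this page;
# function and variable names are illustrative, not EAR's API.
CPU_CPI, CPU_BWD = 0.4, 180.0   # CPU-bound thresholds (CPI, MEM_GBS)
MEM_CPI, MEM_BWD = 0.4, 250.0   # MEMORY-bound thresholds (CPI, MEM_GBS)

def classify_default(cpi: float, mem_gbs: float) -> str:
    """Classify a signature's phase from its CPI and MEM_GBS."""
    if cpi <= CPU_CPI and mem_gbs <= CPU_BWD:
        return "CPU-bound"       # many instructions, low memory bandwidth
    if cpi >= MEM_CPI and mem_gbs >= MEM_BWD:
        return "MEMORY-bound"    # high memory bandwidth, high CPI
    return "MIX"
```

For example, a signature with `CPI = 0.3` and `MEM_GBS = 100` would be classified as CPU-bound, while one with `CPI = 0.8` and `MEM_GBS = 300` would be MEMORY-bound.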
The strength of this approach is its speed and effectiveness, given that the classification is based upon the performance typically expected from the execution phases considered. However, this strategy is limited by its rigidity: the fact that it works with (1) a small number of thresholds and (2) absolute values leads to a mostly rigid (although not incorrect) classification of the applications' activity.
## Roofline model
The roofline model combines floating point performance and memory traffic to characterize the activity of an application based on the performance limitations of the hardware <a href="#roofline">[2]</a>. For EAR, it allows identifying execution phase types simply and quickly at runtime via bottleneck analysis.
### Input
To use this strategy in EAR, we need the floating point performance and memory traffic peaks, which can be computed either theoretically or empirically. The EAR team offers its own tool, [ear-roofline](https://gitlab.bsc.es/ear_team/ear-roofline), which computes the theoretical peaks automatically according to the architecture's specifications by means of the following equations:
![equations.png](/images/equations.png)
To give an idea of how to apply these equations, let us give an example over [EPYC 9654](https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9654.html) nodes:
![examples.png](/images/examples.png)
Once the peaks have been computed, they are stored in a file of the form `roofline.{tag}.data`, where `tag` corresponds to the tag of the targeted architecture as defined in the `ear.conf` file, which depends on the environment the user is working on (check [Tags](Configuration#tags) for more information). The format of the stored peaks is
```shell
{peak memory bandwidth} {peak floating performance}
```
Following the EPYC 9654 example, we would create a file called `roofline.epyc9654.data` containing
```shell
921.6 22732.8
```
Moreover, this file is expected to be stored in the `$EAR_ETC/ear/coeffs/island{X}` directory; check the [EARL configuration](Configuration#earl-configuration) and the [Island description](Configuration#island-description) sections for more details. There is a way to circumvent this requirement, however: as indicated in the [Paths modification](EAR-research-envvars#paths-modification) section, the environment variable `HACK_ROOFLINE_COEFF_PATH` allows hacking the path to the coefficients file. Note that **this variable only hacks the path, not the expected filename**, whose format still has to be complied with.
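As an illustration, the single-line peaks file could be parsed as follows. This is a hedged sketch: the function name is hypothetical, and no such helper is claimed to exist in EAR:

```python
# Hypothetical helper for reading a roofline.{tag}.data file, whose single
# line holds "{peak memory bandwidth} {peak floating performance}".
def load_roofline_peaks(path: str) -> tuple:
    """Return (peak_gbs, peak_flops) as stored in the file."""
    with open(path) as f:
        peak_gbs, peak_flops = map(float, f.read().split())
    return peak_gbs, peak_flops
```

For the EPYC 9654 example, parsing `roofline.epyc9654.data` would yield `(921.6, 22732.8)`.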
### Classification strategy
Once EAR has access to the roofline peaks, the classification proceeds as follows
```
Let S be the last registered signature
Let <PEAK_FLOPS, PEAK_GBS> be the tuple of peaks of the architecture

IF (S->GFLOPS / S->MEM_GBS >= PEAK_FLOPS / PEAK_GBS)
    Mark app as CPU-bound
ELSE IF (S->MEM_GBS >= PEAK_GBS * 0.75)
    Mark app as MEMORY-bound
ELSE
    Mark app as MIX
```
Let us go over it step by step:
1. The strategy begins by checking whether the operational intensity (i.e., the ratio between flops and bytes) is greater than the threshold defined by the peaks of the architecture. If so, the app is marked as _CPU-bound_.
2. If the app is not _CPU-bound_, the strategy checks whether the app's memory bandwidth is close enough (in absolute units) to the peak memory bandwidth of the architecture via a threshold (which, by default, is experimentally set to 0.75). If the application meets this condition, it is labelled as _MEMORY-bound_.
3. If neither condition is met, the app is labelled as _MIX_.
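These steps can be sketched in Python as follows. This is illustrative only, not EAR's code; `mem_frac` is the experimentally chosen 0.75 bandwidth factor described above:

```python
def classify_roofline(gflops: float, mem_gbs: float,
                      peak_flops: float, peak_gbs: float,
                      mem_frac: float = 0.75) -> str:
    """Sketch of roofline-based phase classification (names are illustrative)."""
    # Operational intensity at or above the machine balance point:
    # the app is limited by compute, not by memory traffic.
    if gflops / mem_gbs >= peak_flops / peak_gbs:
        return "CPU-bound"
    # Bandwidth close enough to the architecture's peak: memory-limited.
    if mem_gbs >= peak_gbs * mem_frac:
        return "MEMORY-bound"
    return "MIX"
```

With the EPYC 9654 peaks above (921.6 GB/s, 22732.8 GFLOPS), a signature sustaining 800 GB/s at low GFLOPS would be labelled MEMORY-bound.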
This way, the original roofline model is adapted to include the _MIX_ execution phase while maintaining its classification philosophy, thus allowing EAR to identify execution phases with a speed and effectiveness similar to the initial classification strategy. Although this strategy is more robust (given that it is based on bottleneck analysis), it is virtually just as rigid as the previous one.
## K-medoids
K-medoids is a clustering method based on the k-means <a href="#k-means">[6]</a> algorithm, but instead of using _centroids_ as class representatives, it uses _medoids_ (i.e., elements of the dataset) <a href="#k-medoids">[7]</a>. This strategy, which becomes available once EAR has enough user data in its database to train it, is flexible enough to be regenerated over time, thus accounting for changes in the type of applications executed by users as well as ensuring its robustness.
### Input
To achieve a balance between classification quality and runtime performance, this strategy works with a subset of signature metrics, comprising `CPI`, `TPI`, `GFLOPS` and `MEM_GBS`. This way, it combines metrics from both the architecture and the application's computational activity.
To use this strategy, two files are needed: one for the standardization process and another for the medoids. The first one stores the means and standard deviations in the following format:
```shell
{CPI std} {CPI mean} {TPI std} {TPI mean} {GFLOPS std} {GFLOPS mean} {MEM_GBS std} {MEM_GBS mean}
```
This first file must follow the naming format `extremes.{tag}.data`, where `tag` corresponds to the tag of the node partition.
Now, regarding the second file, it stores the medoids in the following format:
```shell
{CPU-bound CPI} {CPU-bound TPI} {CPU-bound GFLOPS} {CPU-bound MEM_GBS} {MEMORY-bound CPI} {MEMORY-bound TPI} {MEMORY-bound GFLOPS} {MEMORY-bound MEM_GBS} {MIX CPI} {MIX TPI} {MIX GFLOPS} {MIX MEM_GBS}
```
This file must also follow the naming format `medoids.{tag}.data`, where `tag` corresponds to the tag of the node partition.
**NOTE**: the `tag` will depend on the environment the user is working on. Check [Tags](Configuration#tags) for more information on this.
**NOTE 2**: as in the _roofline_ case, both the `medoids` and `extremes` files are expected to be stored in the `${EAR_ETC}/ear/coeffs/island{X}` directory. Check the [EARL configuration](Configuration#earl-configuration) and the [Island description](Configuration#island-description) sections for more details. Additionally, as in the previous case, there exists an environment variable called `HACK_MEDOIDS_COEFF_PATH` which, as defined in the [Paths modification](EAR-research-envvars#paths-modification) section, allows hacking the path to the coefficients.
Finally, just as with the roofline strategy, the EAR team offers its own tool, called ear-kmedoids (<ins>still in development</ins>), which computes and generates both files automatically in the format required by EAR, in a user-friendly way.
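To make the two whitespace-separated layouts concrete, here is a hypothetical sketch of parsing both files. The function names are illustrative; EAR's internal loaders are not claimed to look like this:

```python
# Assumed metric order throughout: CPI, TPI, GFLOPS, MEM_GBS.
def load_extremes(path: str):
    """Parse extremes.{tag}.data: interleaved '{std} {mean}' pairs, one per metric."""
    vals = [float(x) for x in open(path).read().split()]
    stds, means = vals[0::2], vals[1::2]   # de-interleave std/mean pairs
    return stds, means

def load_medoids(path: str):
    """Parse medoids.{tag}.data: three 4-metric medoids in a fixed label order."""
    vals = [float(x) for x in open(path).read().split()]
    return {"CPU-bound": vals[0:4],
            "MEMORY-bound": vals[4:8],
            "MIX": vals[8:12]}
```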
### Classification strategy
Once EAR has access to the `extremes` and `medoids` files, the classification proceeds as follows:
```
Let S be the last registered signature
Let <CPU_MEDOID, MEM_MEDOID, MIX_MEDOID> be the tuple of medoids
Let V = [S->CPI, S->TPI, S->GFLOPS, S->MEM_GBS]
Standardize vector V
Let D1 = euclidean_distance(V, CPU_MEDOID)
Let D2 = euclidean_distance(V, MEM_MEDOID)
Let D3 = euclidean_distance(V, MIX_MEDOID)

IF (D1 < D2 && D1 < D3)
    Mark app as CPU-bound
ELSE IF (D2 < D1 && D2 < D3)
    Mark app as MEMORY-bound
ELSE
    Mark app as MIX
```
Let us go over these steps:
1. The strategy begins by selecting the targeted subset of signature metrics and standardizing them. This step implicitly involves EAR preloading the extremes (i.e., the mean and standard deviation of each metric) during the initialization of the Library, so that loading happens just once per execution.
2. Once the metrics have been standardized, we compute the Euclidean distance from this vector to each of the medoids, which are also preloaded during the initialization of the Library (just like the extremes). We name these distances D1, D2 and D3.
3. With these distances computed, we identify the current execution phase of the app by checking which medoid is closest to the signature.
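These three steps can be sketched as follows. This is illustrative Python, not EAR's code, and it assumes the stored medoids are already expressed in the standardized space:

```python
import math

def classify_kmedoids(signature, means, stds, medoids):
    """signature: raw [CPI, TPI, GFLOPS, MEM_GBS] values;
    means/stds: preloaded extremes, one per metric;
    medoids: dict mapping phase label -> standardized 4-metric medoid."""
    # Step 1: standardize the signature with the preloaded extremes.
    v = [(x - mean) / std for x, mean, std in zip(signature, means, stds)]
    # Step 2: Euclidean distance from the standardized vector to a medoid.
    def dist(medoid):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, medoid)))
    # Step 3: pick the phase whose medoid is closest.
    return min(medoids, key=lambda label: dist(medoids[label]))
```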
With this strategy, EAR's classification becomes more flexible and adaptive while maintaining its robustness, which helps attain the maximum performance expected from this Library's module.
---
As a final note, the Library selects the classification strategy by prioritizing *K-medoids* over *Roofline*, and the latter over the *Default* strategy. It can be thought of in the following way:
```
IF K-medoids strategy is available:
    Load K-medoids strategy
ELSE IF Roofline strategy is available:
    Load Roofline strategy
ELSE:
    Load Default strategy
```
Thus, the user must be aware of the files available and accessible to EAR, which will determine how EAR approaches signature classification.
## References
<a name="stream">[1]</a> McCalpin, John. “STREAM: Sustainable memory bandwidth in high performance computers.” _http://www.cs.virginia.edu/stream/_ (2006).
<a name="roofline">[2]</a> Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: an insightful visual performance model for multicore architectures. _Communications of the ACM_, _52_(4), 65-76.
<a name="peak-flops">[3]</a> Andreolli, C., Thierry, P., Borges, L., Skinner, G., Yount, C., Jeffers, J., & Reinders, J. (2015). Characterization and optimization methodology applied to stencil computations. _High Performance Parallelism Pearls_, 377-396.
<a name="peak-gbs">[4]</a> Bakos, J. D. (2016). Multicore and data-level optimization. In Embedded Systems (pp. 49–103). Elsevier. https://doi.org/10.1016/b978-0-12-800342-8.00002-x
<a name="hpl">[5]</a> Petitet, Antoine. “HPL-a portable implementation of the high-performance Linpack benchmark for distributed-memory computers.” _http://www.netlib.org/benchmark/hpl/_ (2004).
<a name="k-means">[6]</a> Blömer, J., Lammersen, C., Schmidt, M., & Sohler, C. (2016). Theoretical analysis of the k-means algorithm–a survey. _Algorithm Engineering: Selected Results and Surveys_, 81-116.
<a name="k-medoids">[7]</a> Park, H. S., & Jun, C. H. (2009). A simple and fast algorithm for K-medoids clustering. _Expert systems with applications_, _36_(2), 3336-3341.