@@ -68,5 +68,13 @@ This way, we have transformed the outer loops and all collapsed inner loops into
NB: Currently, you can set the collapse depth and grainsize in _ompss-2_wrappers.h_. However, this grainsize is not as powerful as an OmpSs-2 grainsize, as it applies only to the innermost flattened loop.

# Implementation
To implement these task definitions, we created an abstract dependency-handler structure. It must be initialized with B, T, and the array of dependency sizes. You then define how many tokens each dependency should cover. This value can be changed multiple times during the model's computation, so you can define a different 'outer grainsize' for each layer (the current implementation does not use this, but it could be leveraged).

Then, for each call to a layer, you create a wrapper: you define it in ompss-2_settings and implement it with a simple macro at the end of the ompss-2_wrappers.c file. This makes it easy to define new layers and to change the grainsize, length, and collapse depth of existing layers.
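The following is only a minimal sketch of how such a dependency handler and a macro-generated wrapper could fit together, not the code from this repository. All names (`dep_handler_t`, `dep_handler_init`, `dep_handler_set_tokens`, `DEFINE_LAYER_WRAPPER`, the toy `scale2` layer) are hypothetical, and the blocks run sequentially here, whereas the real wrappers would submit each block as an OmpSs-2 task.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 8 /* hypothetical upper bound on tracked dependency arrays */

/* Hypothetical dependency handler: describes how B*T tokens are split
 * into dependency blocks. */
typedef struct {
    size_t B, T;                /* batch size and sequence length */
    size_t num_deps;            /* number of dependency arrays */
    size_t dep_sizes[MAX_DEPS]; /* elements per token, per dependency */
    size_t tokens_per_dep;      /* the "outer grainsize" */
} dep_handler_t;

void dep_handler_init(dep_handler_t *h, size_t B, size_t T,
                      const size_t *dep_sizes, size_t num_deps)
{
    assert(h != NULL && dep_sizes != NULL && num_deps <= MAX_DEPS);
    h->B = B;
    h->T = T;
    h->num_deps = num_deps;
    for (size_t i = 0; i < num_deps; i++)
        h->dep_sizes[i] = dep_sizes[i];
    h->tokens_per_dep = 1; /* default: one token per dependency */
}

/* May be called again between layers to change the outer grainsize. */
void dep_handler_set_tokens(dep_handler_t *h, size_t tokens_per_dep)
{
    assert(tokens_per_dep > 0);
    h->tokens_per_dep = tokens_per_dep;
}

/* Number of blocks needed to cover all B*T tokens (ceiling division). */
size_t dep_handler_num_blocks(const dep_handler_t *h)
{
    return (h->B * h->T + h->tokens_per_dep - 1) / h->tokens_per_dep;
}

/* Number of array elements spanned by one block of dependency `dep`. */
size_t dep_handler_block_elems(const dep_handler_t *h, size_t dep)
{
    assert(dep < h->num_deps);
    return h->tokens_per_dep * h->dep_sizes[dep];
}

/* Toy layer kernel, assumed for illustration: out[i] = 2 * in[i]. */
void scale2(float *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0f * in[i];
}

/* Generates `name_wrapper`, which calls the kernel `name` once per block.
 * In the real code, each call would be wrapped in a task instead. */
#define DEFINE_LAYER_WRAPPER(name)                                      \
    void name##_wrapper(const dep_handler_t *h, float *out,             \
                        const float *in, size_t dep)                    \
    {                                                                   \
        size_t block = dep_handler_block_elems(h, dep);                 \
        size_t total = h->B * h->T * h->dep_sizes[dep];                 \
        for (size_t start = 0; start < total; start += block) {         \
            size_t n = (total - start < block) ? total - start : block; \
            name(out + start, in + start, n);                           \
        }                                                               \
    }

DEFINE_LAYER_WRAPPER(scale2)
```

In this sketch, adding a layer only takes one kernel function and one `DEFINE_LAYER_WRAPPER` line, and calling `dep_handler_set_tokens` before a layer changes how coarse its blocks are.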
# Performance

The taskiter version performs better than the nested-tasks one: we measure a speedup of approximately 1.15x, although runtimes vary considerably from one execution to another and sometimes seem to depend on how the model starts.

However, although taskiter is more efficient, if `B*T` is too high, it will take seconds or even minutes to compute the cyclic graph of tasks. For example, with B=8 and T=1024, taskiter takes 54 seconds to compute the graph. One way to reduce this time would be to increase the taskloop grainsize defined in ompss-2_settings.h.
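To see why a larger grainsize helps, here is a rough back-of-the-envelope model rather than a measurement. It assumes that each taskloop task takes exactly `grainsize` iterations (the real runtime may chunk differently) and that every layer's taskloop spans `B*T` iterations. The function names and the `num_layers` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated number of tasks created by one taskloop, assuming each task
 * takes exactly `grainsize` iterations (ceiling division). */
size_t estimate_taskloop_tasks(size_t iterations, size_t grainsize)
{
    assert(grainsize > 0);
    return (iterations + grainsize - 1) / grainsize;
}

/* Estimated number of tasks in the graph for `num_layers` taskloops,
 * each assumed to span B*T iterations. */
size_t estimate_graph_tasks(size_t B, size_t T, size_t num_layers,
                            size_t grainsize)
{
    return num_layers * estimate_taskloop_tasks(B * T, grainsize);
}
```

Under this model, B=8 and T=1024 give 8192 iterations per loop, so moving from a grainsize of 1 to 64 shrinks each loop from 8192 tasks to 128, which is the kind of reduction that should shorten graph construction.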