With everything that has been stated, we can now create the following data flow:
# First implementation
To begin with, we will put a taskwait between each call of the main training loop; the rest of the program, however, will be task-based, as shown by the following sketch:
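A minimal sketch of one such iteration, assuming OpenMP-style tasks with data dependencies; the kernel names `layer_forward`/`layer_backward` and the layer count are illustrative placeholders, not the actual implementation:

```c
#define N_LAYERS 3

/* Hypothetical per-layer kernels; the real forward/backward
 * computations of the network are assumed, not shown here. */
static void layer_forward(int l, float *acts)   { acts[l] += 1.0f; }
static void layer_backward(int l, float *grads) { grads[l] += 1.0f; }

/* One iteration of the main training loop: each layer becomes a task,
 * data dependencies order each backward task after its forward task,
 * and a taskwait closes the iteration, as described in the text. */
void train_step(float acts[N_LAYERS], float grads[N_LAYERS]) {
    for (int l = 0; l < N_LAYERS; l++) {
        #pragma omp task depend(inout: acts[l])
        layer_forward(l, acts);
    }
    for (int l = N_LAYERS - 1; l >= 0; l--) {
        #pragma omp task depend(in: acts[l]) depend(inout: grads[l])
        layer_backward(l, grads);
    }
    /* taskwait between each call of the main training loop */
    #pragma omp taskwait
}
```

The taskwait drains all tasks of the iteration before the next one starts, which is what later lets the passes be timed in isolation.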
This leads us to the following performance results:
Here we tried slicing along the token sequence only (BATCH_SUBSIZE=4), or along both the token sequence and the batch dimension (BATCH_SUBSIZE=1). The slope obtained for the second version seems better, so we will keep it for the following tests.
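The second strategy can be sketched as a pair of tiling loops with one task per tile; the sizes `B` and `T`, the kernel, and the element counting are illustrative assumptions, with only `BATCH_SUBSIZE` mirroring the parameter named above:

```c
#define B 8              /* batch size (assumed value) */
#define T 16             /* token sequence length (assumed value) */
#define BATCH_SUBSIZE 1  /* 1 => slice over both batch and tokens */

/* Hypothetical kernel operating on one (batch slice, token slice)
 * tile; here it just counts the elements it was given. */
static void process_tile(int b0, int b1, int t0, int t1, int *count) {
    int n = (b1 - b0) * (t1 - t0);
    #pragma omp atomic
    *count += n;
}

/* Create one task per tile: the token dimension is sliced by
 * t_slice, the batch dimension down to BATCH_SUBSIZE rows. */
int run_sliced(int t_slice) {
    int count = 0;
    for (int b = 0; b < B; b += BATCH_SUBSIZE)
        for (int t = 0; t < T; t += t_slice) {
            #pragma omp task firstprivate(b, t) shared(count)
            process_tile(b, b + BATCH_SUBSIZE, t, t + t_slice, &count);
        }
    #pragma omp taskwait
    return count;  /* every (b, t) element visited exactly once */
}
```

Slicing both dimensions yields more, finer-grained tasks (here B/BATCH_SUBSIZE times as many), which is consistent with the better scaling slope observed.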
Because we have a taskwait between each call of the main training loop, it is possible to measure the runtime, speedup, and efficiency of the forward and backward passes independently.
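The derived metrics are the standard ones; a small helper (hypothetical, not from the source) makes the definitions explicit, applied separately to the forward-pass and backward-pass timings that the taskwait lets us isolate:

```c
/* Speedup from measured runtimes: S(p) = T(1) / T(p). */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Parallel efficiency: E(p) = S(p) / p, i.e. the fraction of
 * ideal linear scaling achieved on p cores. */
double efficiency(double t_serial, double t_parallel, int n_cores) {
    return speedup(t_serial, t_parallel) / (double)n_cores;
}
```

For example, a pass that takes 8 s serially and 2 s on 4 cores has a speedup of 4 and an efficiency of 1 (perfect scaling).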
We can now compare the efficiency of this version of the task-based model against the fork-join model.
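For contrast, a fork-join pass can be sketched as one parallel loop per layer, each closed by an implicit barrier; all names and sizes here are illustrative:

```c
#define N_LAYERS 3
#define N_UNITS 8

/* Hypothetical per-unit kernel. */
static void unit_forward(int l, int u, float out[N_LAYERS][N_UNITS]) {
    out[l][u] += 1.0f;
}

/* Fork-join model: one parallel loop per layer, each ending in an
 * implicit barrier, so all threads synchronize N_LAYERS times per
 * pass -- the global synchronization that the task-based version
 * replaces with point-to-point data dependencies. */
void forward_fork_join(float out[N_LAYERS][N_UNITS]) {
    for (int l = 0; l < N_LAYERS; l++) {
        #pragma omp parallel for
        for (int u = 0; u < N_UNITS; u++)
            unit_forward(l, u, out);
    }
}
```

The repeated barriers are why one would expect the fork-join version to lose efficiency as the core count grows, which is what the comparison below examines.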