# Model performances

## Sequential

For the sequential version, every iteration (model forward + model backward + weights update) takes the same amount of time, as expected. The sequential version was run 4 times, and the results shown are the mean of these 4 runs.

![seq2](uploads/2aac9034c025ab01d24e32f162b6543c/seq2.png)

![seq](uploads/2562f07ab3d29743f8ef78b2d71d8515/seq.png)

Each iteration takes approximately 35 s. Running the model for 40 iterations gives a total runtime of 24 min 06 s (1446 s, i.e. about 36 s per iteration on average).

## OpenMP

The OpenMP version was run 40 times, with 40 iterations, using 112 CPUs. The average runtime per iteration is 1430 ms, and the total runtime is 58 seconds.

Speedup = 1446 s / 58 s ≈ 24.9\
Efficiency = 24.9 / 112 ≈ 0.22
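
The speedup and efficiency figures can be recomputed from the reported runtimes. The snippet below is a minimal sketch, assuming the usual definitions (speedup = sequential time / parallel time, efficiency = speedup / number of CPUs); the function and constant names are illustrative and not taken from the project. It works in seconds rather than rounded minutes, which avoids rounding drift in the last digit.

```python
def speedup(t_sequential_s: float, t_parallel_s: float) -> float:
    """Ratio of sequential to parallel wall-clock time."""
    return t_sequential_s / t_parallel_s


def efficiency(speedup_value: float, n_cpus: int) -> float:
    """Speedup divided by the number of CPUs used."""
    return speedup_value / n_cpus


# Values reported on this page.
T_SEQ_S = 24 * 60 + 6  # sequential total runtime: 24 min 06 s = 1446 s
T_OMP_S = 58           # OpenMP total runtime in seconds
N_CPUS = 112           # CPUs used for the OpenMP run

s = speedup(T_SEQ_S, T_OMP_S)
e = efficiency(s, N_CPUS)
print(f"speedup = {s:.1f}, efficiency = {e:.2f}")
# prints: speedup = 24.9, efficiency = 0.22
```
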

![openmp2](uploads/aa9985dcfd64e916ad808b7845b42024/openmp2.png)

![openmp1](uploads/390fb74ac96bed684a40eef47f49672f/openmp1.png)

## OpenMP/nOS-V

The OpenMP/nOS-V version was run 40 times, with 40 iterations, using 112 CPUs. The average runtime per iteration is 1616 ms, and the total runtime is 66 seconds.

Speedup = 1446 s / 66 s ≈ 21.9\
Efficiency = 21.9 / 112 ≈ 0.20

Unexpectedly, the OpenMP/nOS-V version is slower than the plain OpenMP version. For now, I do not know whether this is caused by a misconfiguration of OpenMP/nOS-V.

![openmpv1](uploads/07cb4e4d83ce40cb9b7b8804d64f0e2f/openmpv1.png)

![openmpv2](uploads/c01dc0a441e6854fdc4319ec2861be08/openmpv2.png)

![thread-state](uploads/ce8ff83559d65cc90c78d67c4cc32a29/thread-state.png)

![thread-state-legend](uploads/4626cbde639c30905d6b003db5f8d6d6/thread-state-legend.png)

The Paraver trace shows that only about half of the threads are working at the same time. A first step towards better efficiency would be to reduce the number of threads used by the application.
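
One way to test the effect of using fewer threads is to time the same run at several thread counts and compare the resulting efficiency. The snippet below is a rough sketch under two assumptions: the program can be launched as a single command, and its runtime honours the standard OpenMP `OMP_NUM_THREADS` environment variable (worth checking for the nOS-V build). The helper names `run_with_threads` and `sweep` are hypothetical.

```python
import os
import subprocess
import time


def run_with_threads(cmd, n_threads):
    """Run `cmd` once with OMP_NUM_THREADS=n_threads; return wall time in seconds."""
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)  # raises if the command fails
    return time.perf_counter() - start


def sweep(cmd, thread_counts, t_sequential_s):
    """Return {n_threads: (runtime_s, speedup, efficiency)} for each thread count."""
    results = {}
    for n in thread_counts:
        runtime_s = run_with_threads(cmd, n)
        speedup = t_sequential_s / runtime_s
        results[n] = (runtime_s, speedup, speedup / n)
    return results
```

A call such as `sweep(["./model"], [28, 56, 112], t_sequential_s=1446)` would then give one row per thread count, where `./model` is a placeholder for the actual executable and 1446 s is the sequential runtime reported above.
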