![legend](uploads/1fbc249856438649a150b86717133a6a/legend.png)
A test was run with OpenMP and Extrae, using 2 threads on 2 CPUs of a single NUMA node (**numactl -N 1 -m 1**), to produce a Paraver trace of the execution. The trace shows that most of the runtime is spent in matmul_backward (shown in green and yellow, because this layer contains two `#pragma omp` regions). The graph below shows the time taken per iteration for the same test:
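To make the two parallel regions more concrete, here is a minimal sketch of a `matmul_backward` containing two `#pragma omp parallel for` loops. It assumes the layer is a standard fully connected backward pass: the input gradient is parallelized over the (batch, time) rows, and the weight and bias gradients are parallelized over output channels. The signature, the row-major layout, and the names `dinp`, `dweight`, `dbias`, `B`, `T`, `C` and `OC` are hypothetical and chosen for illustration; they are not taken from the traced code.

```c
#include <stddef.h>

/* Hypothetical sketch of the backward pass of
 *   out[b,t,o] = bias[o] + sum_i inp[b,t,i] * weight[o,i]
 * Row-major shapes: inp (B,T,C), weight (OC,C), dout (B,T,OC).
 * Gradients are accumulated (+=) into dinp, dweight and dbias. */
void matmul_backward(float *dinp, float *dweight, float *dbias,
                     const float *dout, const float *inp, const float *weight,
                     int B, int T, int C, int OC) {
    /* Parallel region 1: gradient w.r.t. the input.
     * Each (b, t) iteration writes to its own row of dinp, so there is no race. */
    #pragma omp parallel for collapse(2)
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            const float *dout_bt = dout + ((size_t)b * T + t) * OC;
            float *dinp_bt = dinp + ((size_t)b * T + t) * C;
            for (int o = 0; o < OC; o++) {
                const float *wrow = weight + (size_t)o * C;
                float d = dout_bt[o];
                for (int i = 0; i < C; i++) {
                    dinp_bt[i] += wrow[i] * d;
                }
            }
        }
    }

    /* Parallel region 2: gradients w.r.t. weight and bias.
     * Each output channel o owns one row of dweight and one entry of dbias. */
    #pragma omp parallel for
    for (int o = 0; o < OC; o++) {
        float *dwrow = dweight + (size_t)o * C;
        for (int b = 0; b < B; b++) {
            for (int t = 0; t < T; t++) {
                const float *inp_bt = inp + ((size_t)b * T + t) * C;
                float d = dout[((size_t)b * T + t) * OC + o];
                if (dbias != NULL) {
                    dbias[o] += d;
                }
                for (int i = 0; i < C; i++) {
                    dwrow[i] += inp_bt[i] * d;
                }
            }
        }
    }
}
```

Because the two loops are separate parallel regions, each one would plausibly appear as a distinct region in a Paraver trace. The exact colors depend on how Extrae labels the outlined OpenMP functions. When the file is compiled without `-fopenmp`, the pragmas are ignored and the same code runs serially.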
![test-2-openmp-tinyshakespeare-mean](uploads/80003c0cba5f8759a46e445151d8fcd4/test-2-openmp-tinyshakespeare-mean.png)
As we see, each iteration takes 7 seconds on average. The same test was also run under the same conditions but without restricting the application to a single NUMA node. In that case the average runtime per iteration was 140 ms higher than when the application was confined to one node. This difference (about 2% of a 7 s iteration) is too small to indicate a meaningful performance gain.
However, something else stands out. In another test, whose runtime is shown below, a single CPU is used (still with OpenMP), and the average runtime per iteration rises to 41 seconds. This is unexpected: in theory, the runtime with 1 CPU should be at most twice the runtime with 2 CPUs, i.e. about 14 seconds.
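The expectation can be written with the usual speedup bound. This assumes the work divides perfectly between the CPUs and ignores cache or memory effects, which can occasionally produce superlinear speedup. Writing T1 and T2 for the mean time per iteration on 1 and 2 CPUs:

```math
S(2) = \frac{T_1}{T_2} \le 2
\quad\Longrightarrow\quad
T_1 \le 2\,T_2 = 2 \times 7\,\text{s} = 14\,\text{s},
\qquad\text{whereas measured}\qquad
\frac{T_1}{T_2} = \frac{41\,\text{s}}{7\,\text{s}} \approx 5.9
```

The measured ratio is roughly three times the bound. This suggests the single-CPU run pays an additional cost beyond simply losing one CPU.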
![test-5-openmp-tinyshakespeare-mean](uploads/d600df70538473235c9037630724a5a2/test-5-openmp-tinyshakespeare-mean.png)