![GPT2_Distributed_Diagram](uploads/2bb14c6b0a33fea35de81ffcdb5859ba/GPT2_Distributed_Diagram.png)

# First benchmarking and comparisons

We will focus here on the transfer of the gradients from the 'worker' ranks to the 'server' rank (i.e. rank 0). A minimal sketch of this communication pattern is given after the notes below.

Note that:

- Transfer of gradients from the workers to the server is done through `MPI_Reduce` (with `MPI_SUM`).
- Transfer of updated parameters from the server to the workers is done through `MPI_Bcast`.
- The data sent in each of these operations is about 250 000 floats, i.e. about 1 Gb.
- According to the [MN5 overview](https://www.bsc.es/supportkc/docs/MareNostrum5/overview), the transfer speed of a node is about 1 Gb/s.
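
Taking the figures above at face value (about 1 Gb per collective over a link of about 1 Gb/s), a single `MPI_Reduce` or `MPI_Bcast` would take on the order of one second, which is why these transfers are worth benchmarking. The following is a minimal sketch of the communication pattern, assuming flat float buffers of `n` elements and rank 0 acting as the server; the function and variable names are illustrative, not taken from the actual code.

```c
#include <mpi.h>
#include <stdlib.h>

/* Illustrative sketch of the worker/server exchange described above.
 * Assumptions (not taken from the real code): flat float buffers `params`
 * and `grads` of `n` elements, rank 0 = server, all other ranks = workers. */
static void exchange_step(float *params, float *grads, int n, int rank, MPI_Comm comm)
{
    if (rank == 0) {
        /* Server: receive the sum of all gradients in place (its own grads
         * buffer is assumed to be zero-initialised, since it does no compute). */
        MPI_Reduce(MPI_IN_PLACE, grads, n, MPI_FLOAT, MPI_SUM, 0, comm);
        /* ... the optimizer update of `params` would happen here ... */
    } else {
        /* Workers: contribute their local gradients; only rank 0 needs the result. */
        MPI_Reduce(grads, NULL, n, MPI_FLOAT, MPI_SUM, 0, comm);
    }

    /* Server broadcasts the updated parameters back to every worker. */
    MPI_Bcast(params, n, MPI_FLOAT, 0, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 250000;                    /* placeholder size from the notes above */
    float *params = calloc(n, sizeof(float));
    float *grads  = calloc(n, sizeof(float));

    exchange_step(params, grads, n, rank, MPI_COMM_WORLD);

    free(params);
    free(grads);
    MPI_Finalize();
    return 0;
}
```

With this blocking pattern every rank waits inside the collectives; the benchmarks below look at how long these transfers actually take relative to the rest of the step.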

Using 8 workers and 1 server, data transfer is done through tasks with high priority.

Dt = 2.5 s

![base-9-priority](uploads/7193fd34722329b1c593279365ff7ef1/base-9-priority.png)
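
The "high priority" mentioned above refers to the priority given to the communication tasks relative to the compute tasks, so that a free thread picks up the transfer as early as possible. The sketch below uses OpenMP task priorities purely as an illustration; the actual runtime and task structure used in the project may differ, and OpenMP priorities only take effect if `OMP_MAX_TASK_PRIORITY` is set at least as high as the values used.

```c
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

/* Illustration only: marking the gradient-transfer task as high priority so
 * the runtime schedules it ahead of pending compute tasks. */
int main(int argc, char **argv)
{
    int provided, rank;
    /* MPI is called from inside tasks, so full thread support is requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 250000;                 /* placeholder buffer size */
    float *grads = calloc(n, sizeof(float));

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task priority(0)
        {
            /* ... ordinary compute task (e.g. forward/backward of one block) ... */
        }

        #pragma omp task priority(100)
        {
            /* Communication task: with a high priority it is picked ahead of
             * pending low-priority tasks when a thread becomes free. */
            if (rank == 0)
                MPI_Reduce(MPI_IN_PLACE, grads, n, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
            else
                MPI_Reduce(grads, NULL, n, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
        }
    }

    free(grads);
    MPI_Finalize();
    return 0;
}
```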

Same setup: 8 workers and 1 server, with data transfer done through high-priority tasks.

Dt = 2.5 s

![all-nine](uploads/2d471a1e2cf8dc0bf30f957a4554d9e5/all-nine.png)