... | ... | @@ -29,13 +29,14 @@ Data transfer is done all at once through one single blocking MPI instruction. |
|
|
Dt = 1.0 s
|
|
|
|
|
|
![all-at-once-5](uploads/afd3253f7a71005bf376d39c6e3d68c6/all-at-once-5.png)
|
|
|
![base-5.code_legend](uploads/56f3e0f7bc8eb219a295bd16106f9008/base-5.code_legend.png)
|
|
|
|
|
|
- Broadcast takes about 95 ms
|
|
|
- Reduce takes about 380 ms
|
|
|
- Forward + backward pass takes about 143 ms
|
|
|
- Time per iteration is 650 ms
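
As a quick consistency check on the list above, the three phase timings can be summed and compared with the reported time per iteration. This is only an arithmetic sketch: the variable names are hypothetical, and it assumes the three phases run back to back within one iteration, which the measurements do not state explicitly.

```python
# Phase timings in milliseconds, copied from the list above.
phases_ms = {"broadcast": 95, "reduce": 380, "forward_backward": 143}
reported_iteration_ms = 650

# Time explained by the three listed phases (assumes no overlap between them).
accounted_ms = sum(phases_ms.values())

# Remainder not attributed to any listed phase.
unaccounted_ms = reported_iteration_ms - accounted_ms

# Share of the reported iteration time taken by each phase.
shares = {name: ms / reported_iteration_ms for name, ms in phases_ms.items()}

print(f"accounted: {accounted_ms} ms, unaccounted: {unaccounted_ms} ms")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the listed phases account for 618 ms of the 650 ms, leaving about 32 ms unattributed, and the reduce alone takes more than half of the iteration.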
|
|
|
|
|
|
![base-5.code_legend](uploads/56f3e0f7bc8eb219a295bd16106f9008/base-5.code_legend.png)
|
|
|
|
|
|
## Using 8 workers and 1 server
|
|
|
|
|
|
**One worker is computing half a token sequence (32 tokens / worker)**
|
... | ... | @@ -53,8 +54,10 @@ Data transfer is done all at once through one single blocking MPI instruction. |
|
|
Dt = 2.5 s
|
|
|
|
|
|
![all-nine](uploads/2d471a1e2cf8dc0bf30f957a4554d9e5/all-nine.png)
|
|
|
![base-9-priority.code_legend](uploads/fd7dfb7130bb685aa20d6636fe20d339/base-9-priority.code_legend.png)
|
|
|
- Broadcast takes about 500 ms
|
|
|
- Reduce takes about 115 ms
|
|
|
|
|
|
- Reduce takes about 500 ms
|
|
|
- Broadcast takes about 115 ms
|
|
|
- Forward + backward pass takes about 100 ms
|
|
|
- Time per iteration is about 720 ms
|
|
\ No newline at end of file |
|
|
- Time per iteration is about 720 ms
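
To compare the two configurations, one option is to look at the fraction of each iteration spent in communication (broadcast plus reduce). The sketch below is an illustration with a hypothetical helper name, not part of the benchmark code. The 500 ms and 115 ms figures appear in this diff with both assignments to broadcast and reduce, so only their sum is used here.

```python
def communication_share(comm_ms: float, iteration_ms: float) -> float:
    """Fraction of one iteration spent in broadcast plus reduce."""
    return comm_ms / iteration_ms


# Configuration measured with Dt = 1.0 s: 95 ms + 380 ms out of 650 ms.
share_first = communication_share(95 + 380, 650)

# 8 workers and 1 server: 500 ms + 115 ms out of about 720 ms.
share_nine = communication_share(500 + 115, 720)

print(f"Dt = 1.0 s configuration: {share_first:.1%}")
print(f"8 workers and 1 server:   {share_nine:.1%}")
```

With the numbers reported here, communication takes roughly 73% of the iteration in the first case and roughly 85% with 8 workers and 1 server, while the forward + backward pass drops from 143 ms to 100 ms.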
|
|
|
|
|
|
![base-9-priority.code_legend](uploads/fd7dfb7130bb685aa20d6636fe20d339/base-9-priority.code_legend.png)
|
|
\ No newline at end of file |