@@ -29,13 +29,14 @@ Data transfer is done all at once through one single blocking MPI instruction.
Dt = 1.0 s

- Broadcast takes about 95 ms
- Reduce takes about 380 ms
- Forward + backward pass takes about 143 ms
- Time per iteration is about 650 ms
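
The three phase timings listed above add up to slightly less than the reported time per iteration. A minimal sketch of that arithmetic follows, using only the figures from the list; the names `phases_ms`, `measured_iteration_ms`, and `iteration_breakdown` are hypothetical and not part of the benchmarked code.

```python
# Timings copied from the list above, in milliseconds.
phases_ms = {"broadcast": 95, "reduce": 380, "forward_backward": 143}
measured_iteration_ms = 650


def iteration_breakdown(phases_ms, measured_ms):
    """Split one iteration into accounted phases, remainder, and communication share."""
    accounted = sum(phases_ms.values())
    communication = phases_ms["broadcast"] + phases_ms["reduce"]
    return {
        "accounted_ms": accounted,
        # Time not covered by the three listed phases (assumed to be other overhead).
        "unaccounted_ms": measured_ms - accounted,
        # Fraction of the measured iteration spent in the two MPI collectives.
        "communication_share": communication / measured_ms,
    }


breakdown = iteration_breakdown(phases_ms, measured_iteration_ms)
print(breakdown)
```

Under these numbers the two collectives account for roughly three quarters of the iteration, which suggests the blocking transfer, not the forward and backward pass, dominates the step time.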

## Using 8 workers and 1 server
**Each worker computes half a token sequence (32 tokens per worker)**
@@ -53,8 +54,10 @@ Data transfer is done all at once through one single blocking MPI instruction.
Dt = 2.5 s

- Broadcast takes about 500 ms
- Reduce takes about 115 ms
- Reduce takes about 500 ms
- Broadcast takes about 115 ms
- Forward + backward pass takes about 100 ms
- Time per iteration is about 720 ms
\ No newline at end of file
- Time per iteration is about 720 ms
\ No newline at end of file