... | ... | @@ -104,8 +104,10 @@ Comparison of the update phase, without considering task overlapping. We will me |
|
|
| Version | Number of ranks | Update phase time |
|
|
|
| ---------- | --------------- | ----------------- |
|
|
|
| Strategy 1 | 5 | 510 ms |
|
|
|
| Strategy 2 | 4 | 210-230 ms |
|
|
|
| Strategy 3 | 4 | |
|
|
|
| Strategy 2 | 4 | 70-100 ms |
|
|
|
| Strategy 3 | Not tested | Not tested |
|
|
|
|
|
|
Note that strategy 3 is the one used in the GPU version. However, strategy 2 is much simpler for us to use with the task-based implementation.
|
|
|
|
|
|
# Multi-reduce + broadcast communications
|
|
|
|
... | ... | |