@@ -140,11 +140,23 @@ As you can see, we have a better communication time with 4 ranks. This is becaus

 With B=16, T=1024, 1 rank/socket

-| Number of ranks | time/iteration |
-| ------ | ------ |
-| 1 | 67000 ms |
-| 2 | 31500 ms |
-| 4 | 15000 ms |
+| Number of ranks | time/iteration | tokens/s | tokens/(s.cpus) |
+| ------ | ------ | ------ | ------ |
+| 1 | 67000 ms | | |
+| 2 | 31739 ms | 516 | 4.6 |
+| 4 | 15084 ms | 1086 | 4.9 |
+| 8 | 7624 ms | 2149 | 4.8 |
+| 16 | 4084 ms | 4011 | 4.48 |
+
+# Increasing the batch size per rank
+
+4 ranks, 1 rank/socket, T=1024
+
+| Batch size (B) | time/iteration | tokens/s | tokens/(s.cpus) |
+| ------ | ------ | ------ | ------ |
+| 4 | | | |
+| 8 | | | |
+| 16 | | | |
+| 32 | | | |
+| 64 | | | |
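
The two derived columns of the rank-scaling table can be cross-checked from time/iteration. The sketch below assumes that tokens/s is B*T divided by the iteration time in seconds (this reproduces the listed values), and that each rank owns about 56 CPU cores. That core count is not stated in the source; it is back-solved from the table, so `cores_per_rank` should be read as a hypothetical parameter. The function name `throughput` is illustrative as well.

```python
def throughput(ranks, ms_per_iter, batch_size=16, seq_len=1024, cores_per_rank=56):
    """Return (tokens/s, tokens/(s.cpu)) for one measured iteration time.

    cores_per_rank=56 is an ASSUMPTION inferred from the table, not a value
    given in the source. Adjust it to the actual socket size.
    """
    seconds = ms_per_iter / 1000.0
    tokens_per_s = batch_size * seq_len / seconds          # tokens per iteration / time
    tokens_per_s_per_cpu = tokens_per_s / (ranks * cores_per_rank)
    return tokens_per_s, tokens_per_s_per_cpu


# Measured time/iteration (ms) from the table, keyed by number of ranks.
measurements = {2: 31739, 4: 15084, 8: 7624, 16: 4084}

for ranks, ms in measurements.items():
    tps, per_cpu = throughput(ranks, ms)
    print(f"{ranks:>2} ranks: {tps:8.1f} tokens/s, {per_cpu:.2f} tokens/(s.cpu)")
```

Under these assumptions the script matches the tokens/s column after truncation and the tokens/(s.cpus) column to within about 0.05. The same function could fill the batch-size table once timings are available, by passing the corresponding `batch_size`.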