... | @@ -153,5 +153,5 @@ With B=16, T=1024, 1 rank/socket |
... | @@ -153,5 +153,5 @@ With B=16, T=1024, 1 rank/socket |
|
## Trace
|
|
## Trace
|
|
|
|
|
|
Here we are using B=8, T=1024, 4 ranks, 1 rank/socket
|
|
Here we are using B=8, T=1024, 4 ranks, 1 rank/socket
|
|
|
|
The kernel taking most of the runtime is Attention Key Value, as its runtime is proportionnal to T*T
|
|
 |
|
 |