... | ... | @@ -81,9 +81,12 @@ In order to calculate our gradients for our backpropagation, we have to keep an |
- **encoded** : Output of the positional_encoding layer (B, T, C)
- **ln1** : Output of the first layernorm inside the transformer block (L, B, T, C)
- **ln1_mean** :
- **ln1_rstd** :
- **qkv** :
- **atty** :
- **preatt** :
- **att** :
... | ... | |
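
The shapes listed above determine how much memory each stored activation needs. As a rough sketch only, the snippet below computes the element count and offset of each activation, assuming they are packed back to back in one flat buffer. That packing is an assumption made for illustration, not something this README states. The values of `B`, `T`, `C`, and `L` are placeholders, `plan_activations` is a hypothetical helper, and only the two activations whose shapes are documented above are included.

```python
from math import prod

# Illustrative sizes only (assumption); real values come from the model config.
B, T, C, L = 4, 64, 768, 12

# Only the two shapes stated in the list above are included here.
activation_shapes = {
    "encoded": (B, T, C),
    "ln1": (L, B, T, C),
}


def plan_activations(shapes):
    """Hypothetical helper: return ({name: (offset, size)}, total_elements).

    Assumes activations are stored contiguously, in dict order,
    inside a single flat buffer.
    """
    layout = {}
    offset = 0
    for name, shape in shapes.items():
        size = prod(shape)  # number of elements in this activation
        layout[name] = (offset, size)
        offset += size
    return layout, offset


layout, total = plan_activations(activation_shapes)
for name, (offset, size) in layout.items():
    print(f"{name}: offset={offset}, size={size}")
print("total elements:", total)
```

Adding another activation would only require a new entry in `activation_shapes` once its shape is known.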