This wiki's goal is to link the GPT-2 model with its implementation in C.

# Description of the model

## Introduction

The Generative Pre-trained Transformer 2 (GPT-2) is a Large Language Model (LLM) introduced by OpenAI. Its particularity is that it is built from a stack of many identical Transformer decoder layers, as shown below.

|
|
|
|
|
|
![Basic structure of GPT-2](https://www.researchgate.net/publication/373352176/figure/fig1/AS:11431281202501967@1698856108167/GPT-2-model-architecture-The-GPT-2-model-contains-N-Transformer-decoder-blocks-as-shown.ppm)

However, the model implemented in [our reference code](https://github.com/karpat) differs from this sketch on two points:

+ The second residual forward connects to the output of the multi-head masked attention, not to the output of the normalization layer.
+ The attention layer's function does not include the two linear layers that the sketch suggests; those two layers are computed with separate matmul functions.

Therefore, here is a rectified sketch of the model implemented:

![Adapted structure of GPT-2](uploads/9055498e3cf0d4059418eab6fa8b1133/gpt2-cleaned.png)
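
To make the two corrections above concrete, here is a minimal, self-contained C sketch of the two building blocks they mention: a residual forward, which simply adds two activation buffers, and a plain matmul used for the linear projections that sit outside the attention function. The function names and the toy wiring in `main` are illustrative only and are not copied from the reference code.

```c
#include <stdio.h>

/* Residual forward: out = inp1 + inp2, element-wise. This is all a residual
 * connection does in the forward pass. */
void residual_forward(float* out, const float* inp1, const float* inp2, int n) {
    for (int i = 0; i < n; i++) out[i] = inp1[i] + inp2[i];
}

/* Naive linear layer: out[t][oc] = bias[oc] + sum_c inp[t][c] * weight[oc][c].
 * The two projections around the attention are separate calls to a function
 * of this kind rather than being part of the attention function itself. */
void matmul_forward(float* out, const float* inp, const float* weight,
                    const float* bias, int T, int C, int OC) {
    for (int t = 0; t < T; t++) {
        for (int oc = 0; oc < OC; oc++) {
            float val = bias ? bias[oc] : 0.0f;
            for (int c = 0; c < C; c++) val += inp[t * C + c] * weight[oc * C + c];
            out[t * OC + oc] = val;
        }
    }
}

int main(void) {
    /* Toy sizes: T = 2 positions, C = 3 channels. */
    enum { T = 2, C = 3 };
    float block_input[T * C] = {1, 2, 3, 4, 5, 6};
    float att_out[T * C];                 /* stand-in for the attention output */
    float att_proj[T * C];                /* after the output projection       */
    float residual[T * C];                /* what flows on to the next sublayer */
    float w[C * C] = {1, 0, 0, 0, 1, 0, 0, 0, 1};   /* identity weights */
    float b[C] = {0, 0, 0};

    for (int i = 0; i < T * C; i++) att_out[i] = block_input[i];

    matmul_forward(att_proj, att_out, w, b, T, C, C);          /* projection as matmul */
    residual_forward(residual, block_input, att_proj, T * C);  /* add the branch back  */

    for (int i = 0; i < T * C; i++) printf("%.1f ", residual[i]);
    printf("\n");
    return 0;
}
```

With the identity weights above, the program simply prints the element-wise sum of the block input and its (pretend) attention output, which is exactly what the residual connection contributes.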

## Tokens

Describe the tokens -> words and words -> tokens conversions
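
Until this section is filled in, here is a minimal, hypothetical C sketch of the tokens -> text direction: each token id is looked up in a vocabulary table and the pieces are concatenated. The tiny vocabulary and the function name are made up for the example; the real GPT-2 tokenizer is a byte-level BPE with V = 50257 entries whose pieces are byte strings rather than whole words.

```c
#include <stdio.h>

/* Toy vocabulary for illustration only; the real one has V = 50257 entries
 * and is loaded from a file. */
static const char* vocab[] = { "Hello", ",", " world", "!" };

/* tokens -> text: look every id up and concatenate the pieces. */
void decode(const int* tokens, int n) {
    for (int i = 0; i < n; i++) printf("%s", vocab[tokens[i]]);
    printf("\n");
}

int main(void) {
    int tokens[] = {0, 1, 2, 3};
    decode(tokens, 4);          /* prints: Hello, world! */
    return 0;
}
```

The text -> tokens direction (encoding) is the harder half, since BPE has to choose among overlapping pieces; a training loop that reads already-tokenized data only ever needs the decode direction, to print generated text.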

## Data formatting

Introducing the main variables (V, NH, L ...)

Describing the matrices

Short
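
One possible way to present these variables is as the hyper-parameter struct that a C implementation would carry around. The field names below are this sketch's own (including maxT for the context length and C for the channels) and may not match the reference code; the example values are those of the smallest (124M-parameter) GPT-2.

```c
#include <stdio.h>

/* Main dimensions of the model; names are illustrative. */
typedef struct {
    int maxT; /* maximum sequence length (context window), e.g. 1024 */
    int V;    /* vocabulary size, e.g. 50257                         */
    int L;    /* number of Transformer layers, e.g. 12               */
    int NH;   /* number of attention heads, e.g. 12                  */
    int C;    /* channels, i.e. embedding dimension, e.g. 768        */
} GPT2Config;

int main(void) {
    GPT2Config cfg = { .maxT = 1024, .V = 50257, .L = 12, .NH = 12, .C = 768 };
    /* Every weight and activation shape can be written with these numbers,
     * e.g. token embedding (V, C), positional embedding (maxT, C),
     * per-layer activations (B, T, C) for B sequences of length T <= maxT. */
    printf("L=%d NH=%d C=%d V=%d maxT=%d\n", cfg.L, cfg.NH, cfg.C, cfg.V, cfg.maxT);
    return 0;
}
```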

## Advanced description

Explain the functions in detail: inputs, outputs and variables
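
As an example of the level of detail this section could aim for, here is a hedged sketch of one of the simpler forward functions, layer normalization, with its inputs, outputs and variables spelled out. The function in the reference code may have a different signature (for instance keeping batch and time dimensions separate, or caching the mean and reciprocal standard deviation for the backward pass), so treat this as an illustration of the documentation style rather than a copy of the code.

```c
/* build: gcc layernorm.c -lm */
#include <math.h>
#include <stdio.h>

/* layernorm_forward: normalize each row of `inp` over its C channels, then
 * scale and shift with the learned parameters.
 *
 * Inputs : inp    (N, C)  activations, N rows (e.g. B*T positions)
 *          weight (C,)    learned scale (gamma)
 *          bias   (C,)    learned shift (beta)
 * Output : out    (N, C)  normalized activations
 */
void layernorm_forward(float* out, const float* inp,
                       const float* weight, const float* bias, int N, int C) {
    const float eps = 1e-5f;
    for (int n = 0; n < N; n++) {
        const float* x = inp + n * C;
        /* mean over the channels */
        float m = 0.0f;
        for (int c = 0; c < C; c++) m += x[c];
        m /= C;
        /* variance over the channels */
        float v = 0.0f;
        for (int c = 0; c < C; c++) { float d = x[c] - m; v += d * d; }
        v /= C;
        float rstd = 1.0f / sqrtf(v + eps);
        /* normalize, then scale and shift */
        for (int c = 0; c < C; c++)
            out[n * C + c] = (x[c] - m) * rstd * weight[c] + bias[c];
    }
}

int main(void) {
    float inp[2 * 3] = {1, 2, 3, 4, 6, 8};
    float w[3] = {1, 1, 1}, b[3] = {0, 0, 0};
    float out[2 * 3];
    layernorm_forward(out, inp, w, b, 2, 3);
    for (int i = 0; i < 6; i++) printf("%.3f ", out[i]);
    printf("\n");
    return 0;
}
```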

## Variable dictionary

Dictionary of the model's parameters
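
A natural backbone for this dictionary is the list of parameter tensors a GPT-2 checkpoint contains. The sketch below writes their shapes with the variables from the data-formatting section (V, maxT, L, C); the field names are illustrative and may not match the reference code exactly.

```c
/* Parameter tensors of GPT-2 and their shapes (names are illustrative).
 * Per-layer tensors are stored once per layer, hence the leading L. */
typedef struct {
    float* wte;      /* (V, C)      token embeddings (also reused for the output logits) */
    float* wpe;      /* (maxT, C)   positional embeddings                                */
    float* ln1w;     /* (L, C)      layernorm 1 scale                                    */
    float* ln1b;     /* (L, C)      layernorm 1 shift                                    */
    float* qkvw;     /* (L, 3*C, C) query/key/value projection weights                   */
    float* qkvb;     /* (L, 3*C)    query/key/value projection biases                    */
    float* attprojw; /* (L, C, C)   attention output projection weights                  */
    float* attprojb; /* (L, C)      attention output projection biases                   */
    float* ln2w;     /* (L, C)      layernorm 2 scale                                    */
    float* ln2b;     /* (L, C)      layernorm 2 shift                                    */
    float* fcw;      /* (L, 4*C, C) MLP expansion weights                                */
    float* fcb;      /* (L, 4*C)    MLP expansion biases                                 */
    float* fcprojw;  /* (L, C, 4*C) MLP projection weights                               */
    float* fcprojb;  /* (L, C)      MLP projection biases                                */
    float* lnfw;     /* (C,)        final layernorm scale                                */
    float* lnfb;     /* (C,)        final layernorm shift                                */
} ParameterTensors;
```

GPT-2 ties the output projection to the token embedding wte, which is why no separate language-model head appears in the list.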

# Model performance

## Sequential

Performance of the sequential model

## OpenMP

Performance of the model with OpenMP
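
The obvious candidates for OpenMP are the large, embarrassingly parallel loops, above all the matmuls. As a hedged sketch of the kind of change this involves, here is the naive matmul sketched earlier on this page with a `parallel for` pragma over the independent output elements; the exact pragmas and the set of loops parallelized in the reference code may differ. Build with `-fopenmp` (e.g. `gcc -O2 -fopenmp matmul_omp.c`).

```c
#include <omp.h>
#include <stdio.h>

/* Same naive linear layer as before, parallelized with OpenMP. Every (t, oc)
 * output element is independent, so the two outer loops are collapsed into
 * one iteration space and split across threads. */
void matmul_forward(float* out, const float* inp, const float* weight,
                    const float* bias, int T, int C, int OC) {
    #pragma omp parallel for collapse(2)
    for (int t = 0; t < T; t++) {
        for (int oc = 0; oc < OC; oc++) {
            float val = bias ? bias[oc] : 0.0f;
            for (int c = 0; c < C; c++) val += inp[t * C + c] * weight[oc * C + c];
            out[t * OC + oc] = val;
        }
    }
}

int main(void) {
    float inp[2 * 3] = {1, 2, 3, 4, 5, 6};
    float w[2 * 3]   = {1, 0, 0, 0, 0, 1};  /* (OC = 2, C = 3) */
    float out[2 * 2];
    matmul_forward(out, inp, w, NULL, 2, 3, 2);
    printf("threads available: %d\n", omp_get_max_threads());
    for (int i = 0; i < 4; i++) printf("%.1f ", out[i]);
    printf("\n");
    return 0;
}
```

The thread count is controlled with OMP_NUM_THREADS; since the work per output element is uniform, the default static schedule is usually a reasonable choice.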

## OpenMP/n-OS-V

Performance of the model with OpenMP/n-OS-V