GPT2 Parallelization and porting
Page updated Jun 28, 2024 by tchatela
The optimization function used for model update is [AdamW](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html)
* beta2 : 0.999
* epsilon : 1e-8
* weight decay : 0
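For reference, here is a minimal sketch in plain C of what one decoupled AdamW update step looks like, using the beta2, epsilon and weight-decay values listed above. The learning rate and beta1 do not appear in this excerpt, so they are left as placeholder arguments; this illustrates the update rule only, not the exact optimizer code in llm.c.

```c
#include <math.h>

// One AdamW step for a single parameter tensor (illustrative sketch).
// beta1 and learning_rate are placeholders; beta2, eps and weight_decay
// match the hyperparameters listed above. m and v are the first and
// second moment buffers, t is the 1-based step counter.
void adamw_step(float *param, const float *grad, float *m, float *v,
                int n, int t, float learning_rate, float beta1) {
    const float beta2 = 0.999f;
    const float eps = 1e-8f;
    const float weight_decay = 0.0f;
    for (int i = 0; i < n; i++) {
        // update biased first and second moment estimates
        m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];
        v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i];
        // bias correction
        float m_hat = m[i] / (1.0f - powf(beta1, (float)t));
        float v_hat = v[i] / (1.0f - powf(beta2, (float)t));
        // decoupled weight decay plus the Adam update
        param[i] -= learning_rate * (m_hat / (sqrtf(v_hat) + eps)
                                     + weight_decay * param[i]);
    }
}
```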
# Model performance
## Sequential
For the sequential version, as expected, every iteration (model forward + model backward + weight update) takes roughly the same amount of time. The sequential version was run 4 times, and the results displayed are the mean of these 4 runs.
![seq2](uploads/2aac9034c025ab01d24e32f162b6543c/seq2.png)
![seq](uploads/2562f07ab3d29743f8ef78b2d71d8515/seq.png)
Each iteration takes approximately 35 s, so running the model for 40 iterations gives a total runtime of 24 min 06 s.
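For context, the per-iteration time is measured around the three phases mentioned above. A timing loop behind such numbers could look roughly as follows; the `GPT2` struct and the `gpt2_forward` / `gpt2_zero_grad` / `gpt2_backward` / `gpt2_update` names follow the llm.c convention, but their exact signatures are assumptions here.

```c
#include <stdio.h>
#include <time.h>

// Sketch of timing one training iteration (forward + backward + update).
// GPT2 and the gpt2_* routines stand for the corresponding llm.c code;
// the hyperparameter values passed to gpt2_update are placeholders.
double time_iteration(GPT2 *model, int *inputs, int *targets,
                      int B, int T, int step) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    gpt2_forward(model, inputs, targets, B, T);
    gpt2_zero_grad(model);
    gpt2_backward(model);
    gpt2_update(model, 1e-4f, 0.9f, 0.999f, 1e-8f, 0.0f, step + 1);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("iteration %d took %.3f s\n", step, elapsed);
    return elapsed;
}
```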
## OpenMP
For the OpenMP version, the model was run 40 times, each run consisting of 40 iterations on 112 CPUs. The average runtime per iteration is 1430 ms, giving a total runtime of about 58 seconds per run.
Speedup = 24.1 / 0.98 = 24.7\
Efficiency = 0.22
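For clarity, these figures follow the usual definitions, where T_seq and T_par are the total sequential and parallel runtimes and p = 112 is the number of CPUs used:

```math
S = \frac{T_{seq}}{T_{par}} \approx 24.7, \qquad E = \frac{S}{p} = \frac{24.7}{112} \approx 0.22
```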
![openmp2](uploads/aa9985dcfd64e916ad808b7845b42024/openmp2.png)
![openmp1](uploads/390fb74ac96bed684a40eef47f49672f/openmp1.png)
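As a reminder of where the OpenMP parallelism comes from, the heavy layers are data-parallel over the batch and sequence dimensions. Below is a minimal sketch in the style of llm.c's matmul forward pass; the exact pragmas and loop structure in the ported code may differ.

```c
#include <omp.h>
#include <stddef.h>

// Sketch of the loop-level parallelism used in the OpenMP port: the
// (batch, time) iterations of a matmul forward pass are independent,
// so they can be distributed across threads with a collapsed parallel for.
// out is (B,T,OC), inp is (B,T,C), weight is (OC,C), bias is (OC) or NULL.
void matmul_forward(float *out, const float *inp, const float *weight,
                    const float *bias, int B, int T, int C, int OC) {
    #pragma omp parallel for collapse(2)
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            const float *inp_bt = inp + (size_t)b * T * C + (size_t)t * C;
            float *out_bt = out + (size_t)b * T * OC + (size_t)t * OC;
            for (int o = 0; o < OC; o++) {
                float val = (bias != NULL) ? bias[o] : 0.0f;
                const float *wrow = weight + (size_t)o * C;
                for (int i = 0; i < C; i++) {
                    val += inp_bt[i] * wrow[i];
                }
                out_bt[o] = val;
            }
        }
    }
}
```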
## OpenMP/nOS-V
For the OpenMP/nOS-V version, the model was also run 40 times, each run consisting of 40 iterations on 112 CPUs. The average runtime per iteration is 1616 ms, giving a total runtime of about 66 seconds per run.
Speedup = 24.1 / 1.10 = 21.8\
Efficiency = 0.19
Unexpectedly, the OpenMP/nOS-V version is slower than the OpenMP version. For now, I do not know whether this is caused by a configuration mistake on the OpenMP/nOS-V side or not.
![openmpv1](uploads/07cb4e4d83ce40cb9b7b8804d64f0e2f/openmpv1.png)
![openmpv2](uploads/c01dc0a441e6854fdc4319ec2861be08/openmpv2.png)
![thread-state](uploads/ce8ff83559d65cc90c78d67c4cc32a29/thread-state.png)
![thread-state-legend](uploads/4626cbde639c30905d6b003db5f8d6d6/thread-state-legend.png)
The Paraver trace shows that only about half of the threads are working simultaneously. A first way to increase efficiency would therefore be to decrease the number of threads used by the application, for example by sweeping over smaller thread counts as sketched below.
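A simple first experiment along those lines would be to cap the OpenMP thread count (via `OMP_NUM_THREADS` or `omp_set_num_threads`) and re-measure the 40 iterations at each setting. This is a hypothetical sweep, not something already in the repository.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    // Try a few thread counts smaller than the 112 CPUs used so far and
    // report how many threads OpenMP actually spawns in a parallel region.
    int counts[] = {112, 56, 28, 14};
    for (int i = 0; i < 4; i++) {
        omp_set_num_threads(counts[i]);
        int active = 0;
        #pragma omp parallel
        {
            #pragma omp single
            active = omp_get_num_threads();
        }
        printf("requested %d threads, got %d\n", counts[i], active);
        // ...run the 40 training iterations here and record the runtime...
    }
    return 0;
}
```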
# Next steps
Here are the next steps I plan to work on:
* Analyze the Paraver trace obtained for the OpenMP version
* Configure OVNI for the OpenMP/nOS-V version in order to get a trace
* Investigate why the first iteration is faster for the sequential version but slower for the OpenMP versions
* Continue improving my tooling, so that a testing pipeline directly produces all the performance numbers and graphs for a given version of the program
* Read related work on GPT2 parallelization
* Investigate why tokenizing the TinyStories dataset crashes
These steps have been reported in the [issues section](https://gitlab.bsc.es/tchatela/llm.c-gpt2/-/issues).