# Task Based Model

Here we tried slicing along the token sequence only (BATCH_SUBSIZE=4), or along both the token sequence and the batch dimension (BATCH_SUBSIZE=1). The slope obtained for the second version is better, so we will keep it for the following tests; a minimal sketch of the two strategies follows the figure below.
![Comparison_BATCH_SUBSIZE](uploads/2680eec7bdf9875f9a7b79f21ea3f565/Comparison_BATCH_SUBSIZE.png)
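To make the two strategies concrete, here is a minimal sketch of the slicing, assuming OpenMP-style tasking. The helper `layer_forward_slice` and the constant `TOKEN_SUBSIZE` are illustrative names, not taken from the repository:

```c
#include <omp.h>

#define BATCH_SUBSIZE 1   // 4: token slicing only (whole batch, B = 4); 1: batch + token slicing
#define TOKEN_SUBSIZE 64  // illustrative width of one token slice

// Assumed helper computing one (batch, token) slice of a layer.
void layer_forward_slice(float *out, const float *inp,
                         int b0, int nb, int t0, int nt);

void layer_forward_sliced(float *out, const float *inp, int B, int T) {
    for (int b = 0; b < B; b += BATCH_SUBSIZE)
        for (int t = 0; t < T; t += TOKEN_SUBSIZE) {
            // One task per slice: with B = 4, BATCH_SUBSIZE = 1 creates
            // four times as many (and smaller) tasks as BATCH_SUBSIZE = 4.
            #pragma omp task firstprivate(b, t)
            layer_forward_slice(out, inp, b, BATCH_SUBSIZE, t, TOKEN_SUBSIZE);
        }
    #pragma omp taskwait  // all slices of this layer finish before the next call
}
```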
Because we have a taskwait after each call in the main training loop, we can measure the runtime, speedup, and efficiency of the forward and backward passes independently (a timing sketch is given after the figures below). The following diagrams suggest that the application is memory-bound.
![runtimeT](uploads/7e7c70a6bd6723d02927bebd614eb242/runtimeT.png)
![tbS](uploads/6f3833c1c74dbfa74eb3c6f78dd9cf2d/tbS.png)
![tbE](uploads/70660e5b4387ac75b7f78e79b8563ed2/tbE.png)
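As a rough illustration of why the taskwait makes per-phase measurement possible, here is a hedged sketch of one training step; the task-spawning `gpt2_forward`/`gpt2_backward` signatures are assumed from the serial llm.c code, and `omp_get_wtime` is standard OpenMP:

```c
#include <omp.h>
#include <stdio.h>

typedef struct GPT2 GPT2;  // opaque handle to the model state
// Assumed task-spawning variants of llm.c's forward/backward passes.
void gpt2_forward(GPT2 *model, int *inputs, int *targets, int B, int T);
void gpt2_backward(GPT2 *model);

void timed_step(GPT2 *model, int *x, int *y, int B, int T) {
    double t0 = omp_get_wtime();
    gpt2_forward(model, x, y, B, T);  // spawns the forward-pass tasks
    #pragma omp taskwait              // every forward task has finished here
    double t1 = omp_get_wtime();
    gpt2_backward(model);             // spawns the backward-pass tasks
    #pragma omp taskwait              // every backward task has finished here
    double t2 = omp_get_wtime();
    printf("forward: %.3f s  backward: %.3f s\n", t1 - t0, t2 - t1);
}
```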
We can now compare the efficiency of this version of the task-based model with that of the fork-join model:
![cmptf](uploads/7a0dd4169473042b29639600f4c28095/cmptf.png)
# Final implementation
Now we want to remove the taskloops placed across the main training loop, to improve task management and to allow the use of taskiter; a sketch of the intended structure follows the figure below.
![Task_based_implementation_final](uploads/3551b24d36cbae27f861414cd2063322/Task_based_implementation_final.png)
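As a sketch of the structure this change is aiming for, assuming OmpSs-2's `taskiter` construct (which records the task graph of the loop body once and re-executes it each iteration); the `gpt2_update` name follows llm.c, and the simplified signature is illustrative:

```c
// With no taskwait inside the loop body, data dependencies alone order
// the forward, backward and update tasks of consecutive iterations.
#pragma oss taskiter
for (int step = 0; step < num_steps; step++) {
    gpt2_forward(model, x, y, B, T);  // tasks declared with in/out deps
    gpt2_backward(model);             // ordered after forward by its deps
    gpt2_update(model, lr, step);     // simplified illustrative signature
}
#pragma oss taskwait  // wait for the whole training loop's task graph
```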