Task Based Model · Changes

Update Task Based Model authored Sep 20, 2024 by tchatela
Task-Based-Model.md
View page @ 038bd4d4
@@ -30,7 +30,7 @@ Moreover, the aim of the backward layer is to compute the gradients, which will
The particularity of this order is that a step does not have to complete fully before the next one starts. The gradient values can be set to 0 just before their new values are computed in the corresponding backward layer. This way we can, for example, overlap a backward layer with setting the gradients of the next backward layer to 0. The same idea applies to the update of the forward layers' parameters: the weights of one layer can be updated while the forward layer that precedes it is running. Keep in mind, however, that the backward pass is the forward pass mirrored, so the first backward layers impose data dependencies (through the update of the weights) on the last forward layer.
- With everything that has been stated, we can now create the following data flow diagram : ![GPT-2_task_based_model](uploads/10d03f7bd64d70881163ab0c15f7749b/GPT-2_task_based_model.png)
+ With everything that has been stated, we can now create the following data flow diagram : !![GPT-2_task_based_model](uploads/0b71cb63f70b983d90c46001bcc29702/GPT-2_task_based_model.png)
![GPT-2_task_based_model-legend](uploads/795fa5f34780b4860c5a2b3a050be977/GPT-2_task_based_model-legend.png)
......
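
The overlap described in the paragraph above can be sketched with task dependencies. The following is a minimal illustration, assuming OpenMP `task` constructs with `depend` clauses; the project's actual runtime, kernels and task granularity may differ. `ToyModel`, `train`, `NUM_LAYERS`, `N` and the element-wise "layers" are hypothetical stand-ins, not code from llm.c.

```c
#include <assert.h>

#define NUM_LAYERS 3  /* hypothetical toy depth, not GPT-2's layer count */
#define N 4           /* hypothetical number of values per layer */

typedef struct {
    float w[NUM_LAYERS][N];        /* parameters of each layer */
    float g[NUM_LAYERS][N];        /* gradients of each layer's parameters */
    float act[NUM_LAYERS + 1][N];  /* act[0] is the input */
    float dact[NUM_LAYERS + 1][N]; /* dact[NUM_LAYERS] is the loss gradient */
} ToyModel;

/* Toy stand-ins for the real kernels: element-wise scaling layers. */
static void forward_layer(const float *w, const float *x, float *y) {
    for (int i = 0; i < N; i++) y[i] = x[i] * w[i];
}
static void zero_grads(float *g) {
    for (int i = 0; i < N; i++) g[i] = 0.0f;
}
static void backward_layer(const float *w, const float *x, const float *dy,
                           float *dx, float *g) {
    for (int i = 0; i < N; i++) {
        g[i] += dy[i] * x[i];  /* gradient of the parameter */
        dx[i] = dy[i] * w[i];  /* gradient passed to the previous layer */
    }
}
static void update_layer(float *w, const float *g, float lr) {
    for (int i = 0; i < N; i++) w[i] -= lr * g[i];
}

/* One thread creates all tasks; the runtime orders them by their
 * declared data dependencies instead of by global barriers. */
void train(ToyModel *m, float lr, int steps) {
    assert(m != 0 && steps >= 0);
    #pragma omp parallel
    #pragma omp single
    for (int s = 0; s < steps; s++) {
        /* Forward pass: layer l only waits for its own weights and input,
         * so updates of later layers can still be running. */
        for (int l = 0; l < NUM_LAYERS; l++) {
            float *w = m->w[l], *x = m->act[l], *y = m->act[l + 1];
            #pragma omp task depend(in: w[0:N], x[0:N]) depend(out: y[0:N])
            forward_layer(w, x, y);
        }
        /* Backward pass, mirrored: zeroing a layer's gradients has no
         * dependency on the other backward layers, so it can overlap them. */
        for (int l = NUM_LAYERS - 1; l >= 0; l--) {
            float *w = m->w[l], *g = m->g[l], *x = m->act[l];
            float *dy = m->dact[l + 1], *dx = m->dact[l];
            #pragma omp task depend(out: g[0:N])
            zero_grads(g);
            #pragma omp task depend(in: w[0:N], x[0:N], dy[0:N]) \
                             depend(inout: g[0:N]) depend(out: dx[0:N])
            backward_layer(w, x, dy, dx, g);
            #pragma omp task depend(in: g[0:N]) depend(inout: w[0:N])
            update_layer(w, g, lr);
        }
    }
}
```

With GCC or Clang this would be compiled with `-fopenmp`; without that flag the pragmas are ignored and the same code runs serially with the same result. In this sketch the only barrier is the implicit one at the end of the parallel region, so within a call to `train` the weight update of layer `l + 1` may run concurrently with forward layer `l` of the following step.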