# Distributed Model

- Forward + backward pass takes about 100 ms
- Time per iteration is about 720 ms
![base-9-priority.code_legend](uploads/fd7dfb7130bb685aa20d6636fe20d339/base-9-priority.code_legend.png)
# Bottleneck
At this point, the bottleneck of this distributed version is the update phase. An update phase typically consists of two or three steps (see the sketch after this list):

- Sum the gradients computed by all ranks
- Update the parameters using the summed gradient array
- If necessary, share the updated parameters with all ranks
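
Below is a minimal MPI sketch of these three steps. It is an illustration under assumptions, not the repository's actual code: `params`, `grads`, `num_params` and `lr` are hypothetical names, and a plain SGD step stands in for the real optimizer.

```c
#include <mpi.h>

// Sketch of the update phase. Assumes every rank holds its local gradients
// in a contiguous float array; all names and the SGD step are placeholders.
void update_phase(float *params, float *grads, int num_params, float lr,
                  int rank) {
    // Step 1: sum the gradients of all ranks into rank 0's buffer.
    if (rank == 0) {
        MPI_Reduce(MPI_IN_PLACE, grads, num_params, MPI_FLOAT, MPI_SUM,
                   0, MPI_COMM_WORLD);
    } else {
        MPI_Reduce(grads, NULL, num_params, MPI_FLOAT, MPI_SUM,
                   0, MPI_COMM_WORLD);
    }

    // Step 2: rank 0 updates the parameters with the summed gradients.
    if (rank == 0) {
        for (int i = 0; i < num_params; i++) {
            params[i] -= lr * grads[i];
        }
    }

    // Step 3: share the updated parameters with all ranks.
    MPI_Bcast(params, num_params, MPI_FLOAT, 0, MPI_COMM_WORLD);
}
```

Note that steps 1 and 3 can also be fused into a single `MPI_Allreduce`, with every rank then applying the update itself; this changes the communication pattern discussed below.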
As the transfer speed in and out of a node is limited to 10 Gb/s, we need to minimize the communications that concentrate on a single node to avoid congestion. This is exactly what happened in the run with 9 ranks. Let's investigate:
With 9 ranks, we are using 5 nodes. At the end of the backward pass, we use a Reduce collective operation to sum all the gradients on one node. With a basic reduction strategy, this means that 4 nodes each send 1 Gb to a single node. Node 0 therefore receives a total of 4 Gb, which creates congestion. With an incoming transfer speed of 10 Gb/s per node, we can expect the transfer to take 400 ms.
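
As a quick check of that figure (assuming the 1 Gb per-node gradient size quoted above and the full 10 Gb/s available for incoming traffic):

```math
t_{\text{reduce}} = \frac{(5 - 1) \times 1\ \text{Gb}}{10\ \text{Gb/s}} = \frac{4\ \text{Gb}}{10\ \text{Gb/s}} = 400\ \text{ms}
```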