
Distributed Model · Changes

Update Distributed Model authored Aug 08, 2024 by tchatela
Distributed-Model.md
View page @ 18e6bd29
...@@ -20,6 +20,9 @@ Dt = 1.0 seconds
![base-5](uploads/89bf03de5326dbbd56ab78c0548284bc/base-5.png)
- Broadcast takes about 60 to 230 ms
- Reduce takes from 180 ms to 380 ms
- Time per iteration is 650 ms
Data transfer is done all at once, in a single blocking MPI call.
Dt = 1.0 s
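
Since the single blocking call is the whole mechanism, a minimal sketch may help. It assumes one server rank that owns the parameters and worker ranks that compute gradients; the function and variable names are hypothetical, not taken from the actual llm.c code:

```c
#include <mpi.h>

// Sketch of the "all at once" variant: the full parameter and gradient
// blocks move in one blocking MPI call each, so every rank stalls until
// the whole transfer completes. Names and buffer layout are assumptions.
void sync_step(float *params, float *grads, float *grads_sum,
               int num_params, int server_rank) {
    // Server broadcasts every parameter to all workers in one call.
    MPI_Bcast(params, num_params, MPI_FLOAT, server_rank, MPI_COMM_WORLD);

    // ... forward and backward pass on each worker ...

    // All gradients are summed onto the server in one call.
    MPI_Reduce(grads, grads_sum, num_params, MPI_FLOAT, MPI_SUM,
               server_rank, MPI_COMM_WORLD);
}
```

Because both calls are blocking, the transfer time lands directly on the iteration's critical path, which is what the broadcast and reduce timings listed above and below measure.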
...@@ -27,6 +30,10 @@ Dt = 1.0 s
![all-at-once-5](uploads/afd3253f7a71005bf376d39c6e3d68c6/all-at-once-5.png)
![base-5.code_legend](uploads/56f3e0f7bc8eb219a295bd16106f9008/base-5.code_legend.png)
- Broadcast takes about 95 ms
- Reduce takes about 380 ms
- Time per iteration is 650 ms
## Using 8 workers and 1 server
**One worker computes half a token sequence (32 tokens per worker)**
......
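
For the 32-tokens-per-worker split above, here is a hypothetical sketch of the rank-to-slice mapping. The pairing of slices to batch rows is an assumption (the wiki only states 32 tokens per worker), and the extra server rank is ignored for clarity:

```c
#include <mpi.h>
#include <stdio.h>

#define T 64                 // sequence length in tokens (assumed)
#define TOKENS_PER_WORKER 32 // half a sequence, as stated above

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Hypothetical mapping: the two 32-token slices of each sequence go
    // to consecutive ranks, so 8 workers cover 4 sequences of the batch.
    int slices = T / TOKENS_PER_WORKER;             // 2 slices per sequence
    int seq    = rank / slices;                     // which batch row
    int start  = (rank % slices) * TOKENS_PER_WORKER;
    printf("worker %d: sequence %d, tokens [%d, %d)\n",
           rank, seq, start, start + TOKENS_PER_WORKER);

    MPI_Finalize();
    return 0;
}
```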