Various informations

Update Various informations, authored Sep 20, 2024 by tchatela (page version 156094eb).
- For the dataloader, my strategy is to load the tokens on one rank and scatter them to the others. In the GPU version, each rank loads its own tokens. I chose this scattering approach because, for now, it makes the scattering shapes simpler to change.
- When B*T is large, you must use TinyStories as the dataset.
- I have not implemented mini-batches yet, and this is probably one of the first things to do next. It should be simple to do, though.
- I will continue benchmarking the application over the coming weeks for my report; the current results are here: https://docs.google.com/spreadsheets/d/1uS5uPAVtFLoj4BvirT4mke5_pVDsOm85ciNP-0iIO5M/edit?usp=sharing
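A minimal single-process sketch of the "load on one rank, then scatter" dataloader strategy, written in Python for readability rather than as the project's actual code. It assumes (this page does not confirm it) a data-parallel split along the batch dimension, with B divisible by the number of ranks. `load_batch`, `simulate_scatter` and `scatter_batch` are hypothetical names, and `simulate_scatter` only mimics what an `MPI_Scatter` call from the root would deliver to each rank, assuming the distributed version uses MPI. As in llm.c, the root reads B*T+1 contiguous tokens because the targets are the inputs shifted by one position, so a dataset that is too small for the requested B*T raises an error here.

```python
from typing import List, Tuple


def load_batch(tokens: List[int], B: int, T: int, pos: int = 0) -> Tuple[List[int], List[int]]:
    """Root rank: read B*T+1 contiguous tokens and build inputs and next-token targets."""
    needed = B * T + 1  # one extra token because targets are inputs shifted by one
    if pos + needed > len(tokens):
        raise ValueError(f"need {needed} tokens from position {pos}, dataset has {len(tokens)}")
    window = tokens[pos:pos + needed]
    return window[:-1], window[1:]


def simulate_scatter(buf: List[int], world_size: int) -> List[List[int]]:
    """Stand-in for MPI_Scatter from the root: equal contiguous chunks, chunk r goes to rank r."""
    if len(buf) % world_size != 0:
        raise ValueError("buffer length must be divisible by world_size")
    chunk = len(buf) // world_size
    return [buf[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def scatter_batch(tokens: List[int], B: int, T: int, world_size: int,
                  pos: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (local_inputs, local_targets) per rank; each rank gets B/world_size rows of T tokens."""
    if B % world_size != 0:
        raise ValueError("B must be divisible by world_size")
    inputs, targets = load_batch(tokens, B, T, pos)
    return list(zip(simulate_scatter(inputs, world_size),
                    simulate_scatter(targets, world_size)))
```

With tokens 0..99, B=4, T=3 and two ranks, rank 0 receives inputs 0..5 with targets 1..6, and rank 1 receives inputs 6..11 with targets 7..12. Changing the split (for example, uneven shapes) would only touch `simulate_scatter`, which in MPI terms corresponds to switching to `MPI_Scatterv`.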
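If "mini-batches" here means accumulating gradients over several B×T forward/backward passes before each optimizer update (an assumption; llm.c's GPU trainer calls this gradient accumulation), the bookkeeping is small. The sketch below uses a hypothetical one-parameter mean-squared-error model instead of GPT-2 to check the key property: scaling each micro-batch gradient by 1/steps and summing gives the same gradient as a single pass over the whole batch. `grad_accum_steps`, `mse_grad` and `accumulated_grad` are illustrative names, not functions from this project.

```python
from typing import Sequence


def grad_accum_steps(total_batch_tokens: int, B: int, T: int, world_size: int) -> int:
    """Number of B*T micro-batches each rank must process per optimizer step."""
    tokens_per_pass = B * T * world_size  # tokens consumed by one forward/backward on all ranks
    if total_batch_tokens % tokens_per_pass != 0:
        raise ValueError("total batch size must be a multiple of B*T*world_size")
    return total_batch_tokens // tokens_per_pass


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float], steps: int) -> float:
    """Split the batch into `steps` equal micro-batches and accumulate their gradients."""
    if len(xs) % steps != 0:
        raise ValueError("batch size must be divisible by the number of steps")
    size = len(xs) // steps
    grad = 0.0  # plays the role of the zeroed gradient buffer
    for s in range(steps):
        lo, hi = s * size, (s + 1) * size
        # Scale by 1/steps so the accumulated sum equals the full-batch mean gradient.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return grad
```

For example, a total batch of 524288 tokens with B=4, T=64 and 2 ranks needs 1024 accumulation steps per optimizer update; the optimizer step itself would run only once, after the last micro-batch.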