Metrics

Authored Sep 20, 2024 by tchatela.

We use the following five metrics to evaluate the model:
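- Runtime per iteration: time taken by one training iteration (forward pass + gradient reset + backward pass + parameter update).
- tokens/s: number of tokens processed per second, computed as B*T / runtime_per_iteration, where B is the batch size and T the sequence length.
- tokens/(s·CPU): number of tokens processed per second per CPU, computed as (tokens/s) / num_cpus.
- Loss: training loss of the model.
- MFU: model FLOP utilization, i.e. the GFLOPS achieved by the application divided by the peak GFLOPS of MN5 (MareNostrum 5).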
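
The formulas above can be gathered into a small post-processing helper. This is a minimal Python sketch under stated assumptions: the names `IterationMetrics`, `compute_metrics` and `approx_app_gflops` are hypothetical, the peak GFLOPS of the machine must be supplied by the user, and `approx_app_gflops` uses the common rule of thumb of about 6·N FLOPs per token for transformer training (roughly 2N forward and 4N backward, ignoring attention), which is not necessarily how the application GFLOPS are measured in this project.

```python
from dataclasses import dataclass


@dataclass
class IterationMetrics:
    runtime_s: float             # forward + gradient reset + backward + update, in seconds
    tokens_per_s: float          # B * T / runtime_s
    tokens_per_s_per_cpu: float  # tokens_per_s / num_cpus
    mfu: float                   # application GFLOPS / peak GFLOPS of the machine


def approx_app_gflops(num_params: float, tokens_per_s: float) -> float:
    """Estimate achieved GFLOPS with the ~6*N FLOPs-per-token rule of thumb (assumption)."""
    return 6.0 * num_params * tokens_per_s / 1e9


def compute_metrics(B: int, T: int, runtime_s: float, num_cpus: int,
                    app_gflops: float, peak_gflops: float) -> IterationMetrics:
    """Derive the per-iteration metrics from one measured iteration runtime."""
    if runtime_s <= 0 or num_cpus <= 0 or peak_gflops <= 0:
        raise ValueError("runtime_s, num_cpus and peak_gflops must be positive")
    tokens_per_s = B * T / runtime_s  # tokens processed in one iteration / its runtime
    return IterationMetrics(
        runtime_s=runtime_s,
        tokens_per_s=tokens_per_s,
        tokens_per_s_per_cpu=tokens_per_s / num_cpus,
        mfu=app_gflops / peak_gflops,
    )
```

For example, with hypothetical values B = 4, T = 64 and a 0.5 s iteration on 32 CPUs, `compute_metrics` gives 512 tokens/s and 16 tokens/(s·CPU). Keeping the runtime as the single measured input makes the throughput metrics directly comparable across the CPU and GPU versions.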

Because the model is deterministic, parallelizing it should not change the training losses: every parallel version should reproduce exactly the same losses as the sequential one.
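
One way to check this is to compare the training-loss lines printed by a sequential run and a parallel run. This Python sketch assumes the log lines follow the format printed by upstream llm.c's `train_gpt2.c` (`step N: train loss X (took Y ms)`); this fork may print something different, in which case the pattern must be adjusted. The helper names `parse_losses` and `first_loss_mismatch` are hypothetical.

```python
import re

# Assumed log format (modeled on upstream llm.c train_gpt2.c):
#   "step 3: train loss 4.123456 (took 512.000000 ms)"
STEP_LOSS_RE = re.compile(r"step\s+(\d+):\s+train loss\s+([-+0-9.eE]+)")


def parse_losses(log_text: str) -> dict:
    """Return {step: loss_as_printed} for every training-loss line in a log."""
    return {int(m.group(1)): m.group(2) for m in STEP_LOSS_RE.finditer(log_text)}


def first_loss_mismatch(reference_log: str, candidate_log: str):
    """Return (step, reference_loss, candidate_loss) for the first differing step,
    or None if both logs report identical losses for the same steps.
    A step present in only one log is reported with None for the missing side."""
    ref = parse_losses(reference_log)
    cand = parse_losses(candidate_log)
    if ref.keys() != cand.keys():
        step = min(ref.keys() ^ cand.keys())
        return (step, ref.get(step), cand.get(step))
    for step in sorted(ref):
        if ref[step] != cand[step]:  # compare the printed digits exactly
            return (step, ref[step], cand[step])
    return None
```

Because the losses are compared as printed (six decimal places with C's `%f`), this catches any divergence visible in the logs, but not differences below the printed precision.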

The GPU version reports the same metrics, so its tokens/s can be compared directly with the CPU versions.

For now, it is unclear whether the MFU value can be relied upon.