We use the following metrics to evaluate the model:
- Runtime per iteration : Time needed for one training iteration (forward + reset grads + backward + update)
- tokens/s : Number of computed tokens per second (= B*T / runtime_per_iteration, where B is the batch size and T the sequence length)
- tokens/(s.cpus) : Number of computed tokens per second per CPU (= (tokens/s) / num_cpus)
- Loss : Training loss of the model
- MFU : Model FLOPs utilization (achieved application GFLOPS / MN5 peak GFLOPS)
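
To make the definitions above concrete, here is a minimal sketch of how these quantities can be derived from one measured iteration. All names (`batch_size`, `seq_len`, `flops_per_iter`, `mn5_peak_gflops`, ...) are illustrative assumptions, not identifiers from the actual training code.

```python
# Minimal sketch of the metric computations (illustrative names only).
def compute_metrics(runtime_per_iter, loss, batch_size, seq_len, num_cpus,
                    flops_per_iter, mn5_peak_gflops):
    """runtime_per_iter: seconds for forward + grad reset + backward + update.
    flops_per_iter: estimated FLOPs the application performs per iteration.
    mn5_peak_gflops: assumed peak GFLOPS of the MN5 resources used."""
    tokens_per_s = batch_size * seq_len / runtime_per_iter       # tokens/s = B*T / runtime
    tokens_per_s_cpu = tokens_per_s / num_cpus                   # tokens/(s.cpus)
    achieved_gflops = flops_per_iter / runtime_per_iter / 1e9    # application GFLOPS
    mfu = achieved_gflops / mn5_peak_gflops                      # model FLOPs utilization
    return {
        "runtime_per_iteration": runtime_per_iter,
        "tokens/s": tokens_per_s,
        "tokens/(s.cpus)": tokens_per_s_cpu,
        "loss": loss,
        "MFU": mfu,
    }
```
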
No matter how we parallelize the model, we should always get the same training losses, since the model is deterministic.
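
A quick way to check this, assuming each run writes one loss value per line to a text file (the file names below are hypothetical), is to compare the loss curves of two runs with different CPU counts:

```python
import numpy as np

# Hypothetical loss logs (one loss per line, one file per run).
losses_run_a = np.loadtxt("losses_1cpu.txt")
losses_run_b = np.loadtxt("losses_64cpu.txt")

# With a deterministic model, both curves should match up to floating-point noise.
assert losses_run_a.shape == losses_run_b.shape
assert np.allclose(losses_run_a, losses_run_b, rtol=1e-6, atol=1e-8), \
    "Training losses differ between the two parallel configurations"
print("Loss curves match across configurations.")
```
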
The GPU version reports the same metrics, so we can compare the tokens/s values directly.
For now, it is not clear whether we can rely on the MFU value.