tchatela · 156094eb
--- a/Various-informations.md
+++ b/Various-informations.md
@@ -53,3 +53,4 @@ In the end, I managed to get this slicing to give correct results, but I had sti
 - For the dataloader, my strategy is to load the tokens on one rank and scatter them. In the GPU version, each process loads its own tokens. I chose this scattering approach because, for now, it makes it simpler to change the scattering shapes.
 - When B*T is large, you must use TinyStories as the dataset.
 - I have not implemented mini-batches yet; this is probably one of the first things to do next, and it should be straightforward.
+- I have benchmarked the application here: https://docs.google.com/spreadsheets/d/1uS5uPAVtFLoj4BvirT4mke5_pVDsOm85ciNP-0iIO5M/edit?usp=sharing. I will continue benchmarking it over the next few weeks for my report.
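The "load on one rank, then scatter" dataloader strategy can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `split_for_ranks` is hypothetical, no MPI calls are made, and it assumes each rank gets the same layout as when every process loads its own tokens (a window of B*T+1 tokens starting at offset rank*B*T, so the targets are the inputs shifted by one). With real MPI, rank 0 would build this list and hand it to a scatter (for example mpi4py's `comm.scatter(chunks, root=0)`), so that each rank receives only its own pair.

```python
from typing import List, Tuple


def split_for_ranks(
    tokens: List[int], B: int, T: int, num_ranks: int
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split one contiguous token window read on rank 0
    into one (inputs, targets) pair per rank, ready to be scattered."""
    # One training step consumes B*T tokens per rank, plus one extra token
    # so that the last rank can form its shifted targets.
    needed = B * T * num_ranks + 1
    if len(tokens) < needed:
        raise ValueError(f"need at least {needed} tokens, got {len(tokens)}")

    chunks = []
    for rank in range(num_ranks):
        start = rank * B * T
        # B*T + 1 tokens: inputs are the first B*T, targets are shifted by one.
        window = tokens[start : start + B * T + 1]
        chunks.append((window[:-1], window[1:]))
    return chunks


if __name__ == "__main__":
    # Toy example: B=2, T=3, 2 ranks -> 13 tokens are needed.
    for rank, (inputs, targets) in enumerate(split_for_ranks(list(range(13)), 2, 3, 2)):
        print(rank, inputs, targets)
```

The length check also makes explicit that each step consumes B*T*num_ranks + 1 tokens, so a larger B*T requires a dataset with more tokens available.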