Distributed Model · Changes

Update Distributed Model authored Sep 20, 2024 by tchatela
Distributed-Model.md @ 6d99a987
@@ -108,23 +108,12 @@ Comparison of the update phase, without considering task overlapping. We will me…
| Strategy 3 | Not tested | Not tested |
Note that strategy 3 is the one used in the GPU version. However, it is much simpler for us to use strategy 2 with the task-based implementation.
+ From now on, we will use strategy 2.
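The strategies themselves are not spelled out in this excerpt, so the following is only an illustrative sketch of the "multi-reduce + broadcast" scheme named by the old heading below, not necessarily strategy 2 as implemented in the repository. All identifiers (`num_params`, `grads`, `params`, `sgd_update`, `BLOCK_SIZE`) are assumptions, not names taken from llm.c: the gradient buffer is cut into blocks, each block is reduced onto a round-robin owner rank, that rank applies the optimizer step for its slice, and the updated slice is broadcast back.

```c
// Hedged sketch of a "multi-reduce + broadcast" update phase with MPI.
// Not the repository's code; all names are illustrative assumptions.
#include <mpi.h>
#include <stddef.h>

#define BLOCK_SIZE (1 << 20)   /* tunable "communication block size" (floats) */

/* hypothetical optimizer step on a contiguous slice of parameters */
static void sgd_update(float *params, const float *grads, size_t n, float lr) {
    for (size_t i = 0; i < n; i++) params[i] -= lr * grads[i];
}

void update_multireduce_broadcast(float *params, float *grads,
                                  size_t num_params, float lr,
                                  int rank, int world_size) {
    size_t num_blocks = (num_params + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (size_t b = 0; b < num_blocks; b++) {
        size_t offset = b * BLOCK_SIZE;
        size_t count  = (offset + BLOCK_SIZE <= num_params)
                            ? BLOCK_SIZE : num_params - offset;
        int root = (int)(b % (size_t)world_size);  /* round-robin block owner */

        /* reduce this block's gradients onto its owner rank */
        if (rank == root)
            MPI_Reduce(MPI_IN_PLACE, grads + offset, (int)count,
                       MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);
        else
            MPI_Reduce(grads + offset, NULL, (int)count,
                       MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);

        /* the owner updates its slice of the parameters */
        if (rank == root)
            sgd_update(params + offset, grads + offset, count, lr);

        /* broadcast the updated slice back to every rank */
        MPI_Bcast(params + offset, (int)count, MPI_FLOAT, root, MPI_COMM_WORLD);
    }
}
```

Splitting the buffer into blocks is what makes the "communication block size" mentioned further down a tunable parameter.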
- # Multi-reduce + broadcast communications
+ # Benchmarking
- Some traces, with 4 ranks, 1 rank/socket, B=4, T=1024
- ![RC_legend](uploads/74660146f17bc0d52b76d2d48dd95455/RC_legend.png)
- ![RC_implementation](uploads/12ae0ed35cb016331ce4da4a3f7d9aac/RC_implementation.png)
- ![iteration_mpi](uploads/7202d2550d704ad38596b59207ea9c92/iteration_mpi.png)
- ![iteration_mpi.code_legend](uploads/f5c599b62dfea7cfd3060f857889d9d1/iteration_mpi.code_legend.png)
- ![mpi_comms](uploads/cc578008512fec4a6c810fd4936d588f/mpi_comms.png)
- ![mpi_comms.code_legend](uploads/b2a0c1e1c7f98529751cbe14084768aa/mpi_comms.code_legend.png)
+ ## Increasing the number of ranks
Communication cost with number of ranks (1 rank/socket, B=4*worldsize, T=1024):
| Number of ranks | Communication time from last to first | Communication time from first to last |
@@ -137,8 +126,6 @@ Communication cost with number of ranks (1rank/socket, B=4*worldsize, T=1024) :
As you can see, the communication time is best with 4 ranks. This is because the sizes of the communication blocks can be adjusted, and they are currently tuned by hand for 4 ranks, which explains why 4 ranks gives the best communication time.
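As a hypothetical refinement of the sketch above (again, not code from the repository), the block size could be derived from the world size instead of being a hand-tuned constant, so that the communication granularity follows the number of ranks:

```c
#include <stddef.h>

/* Hypothetical helper: choose a block size so that each of the world_size
 * ranks owns roughly blocks_per_rank blocks of the parameter buffer.
 * Ceil-division guarantees the blocks cover all num_params parameters. */
size_t pick_block_size(size_t num_params, int world_size, int blocks_per_rank) {
    size_t num_blocks = (size_t)world_size * (size_t)blocks_per_rank;
    return (num_params + num_blocks - 1) / num_blocks;
}
```

A hand-tuned constant can still beat such a formula for the one configuration it was tuned on, which is consistent with the 4-rank results above.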
- # Increasing the number of ranks
With B=16, T=1024, 1 rank/socket
| Number of ranks | time/iteration | tokens/s | tokens/(s.cpus) |
@@ -150,7 +137,7 @@ With B=16, T=1024, 1 rank/socket
| 16 | 4084 ms | 4011 | 4.48 |
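For readability, here is how the derived columns of this row appear to be computed; this is an inference from the numbers shown (reading B=16 as the global batch size), not a formula stated in the wiki:

```math
\text{tokens/s} \approx \frac{B \cdot T}{t_{\text{iter}}} = \frac{16 \times 1024}{4.084\,\mathrm{s}} \approx 4011
\qquad
\text{tokens/(s.cpus)} = \frac{\text{tokens/s}}{\#\text{cpus}} \approx \frac{4011}{896} \approx 4.48
```

The total of roughly 896 cores is itself inferred (4011 / 4.48), which would correspond to about 56 cores per rank with one rank per socket; the per-socket core count is not given in this excerpt.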
- # Increasing the batch size
+ ## Increasing the batch size
4 ranks, 1 rank/socket, T=1024
...