tchatela · 1cc0d749
--- a/Task-Based-Model.md
+++ b/Task-Based-Model.md
@@ -45,10 +45,17 @@ This lead us to the following performances :
 Here we tried slicing along the token sequence only (BATCH_SUBSIZE=4), or along both the token sequence and the batch (BATCH_SUBSIZE=1). The slope obtained with the second version looks better, so we keep it for the following tests.
 ![Comparison_BATCH_SUBSIZE](uploads/2680eec7bdf9875f9a7b79f21ea3f565/Comparison_BATCH_SUBSIZE.png)
-Because we have a taskwait between each call of the main training loop, it is possible to get the runtime, speedup and efficiency for the forward and backward pass independently.
+Because we have a taskwait between each call of the main training loop, it is possible to get the runtime, speedup and efficiency for the forward and backward pass independently. The following diagrams suggest that the application is memory-bound.
 ![runtimeT](uploads/7e7c70a6bd6723d02927bebd614eb242/runtimeT.png)
 ![tbS](uploads/6f3833c1c74dbfa74eb3c6f78dd9cf2d/tbS.png)
 ![tbE](uploads/70660e5b4387ac75b7f78e79b8563ed2/tbE.png)
 We can now compare the efficiency of this version of the task-based model with that of the fork-join model:
 ![cmptf](uploads/7a0dd4169473042b29639600f4c28095/cmptf.png)
+# Final implementation
+Now we want to remove the taskloops placed across the main training loop, both to improve task management and to allow the use of taskiter.  
+![Task_based_implementation_final](uploads/3551b24d36cbae27f861414cd2063322/Task_based_implementation_final.png)
\ No newline at end of file