djurado · d64b8daf
--- a/Home.md
+++ b/Home.md
@@ -70,7 +70,23 @@ In this section we discuss the implementation of the optimized `PairLJCharmmCoul

 The end of `PairLJCharmmCoulLong::compute` contains calls to `ev_tally` or `virial_fdotr_compute`, which are executed depending on the values of the `evflag` and `vflag_fdotr` variables.
 An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.
-This paraver trace 
+This Paraver trace shows the relative cost of the first and last timesteps compared to the rest.
+
+TODO: Paraver trace with two levels of events, one for `compute` and another for the inner function calls.
+
+Having function calls inside the function to be optimized is troublesome because:
+1. The compiler generally cannot autovectorize a loop that contains calls to functions that are not inlined.
+2. If the loop is not autovectorized, it has to be vectorized manually with intrinsics, which requires considerable additional work.
+3. If the calls are kept serial, a mechanism to unpack the data from the vector registers before each call is still needed.
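
To illustrate the first point, the sketch below contrasts two simplified versions of a pair loop: one with a conditional call to an out-of-line tally routine, in the spirit of `ev_tally`, and one without it. The names `tally_energy`, `loop_with_call` and `loop_without_call`, and the toy force term, are hypothetical and are not LAMMPS code. Whether a given compiler actually vectorizes either loop depends on the compiler and flags; GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize` reports can be used to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tally routine such as ev_tally. It is kept
// out of line (GCC/Clang extension) so it cannot be inlined into the loop.
__attribute__((noinline)) void tally_energy(double &acc, double e) {
    acc += e;
}

// Loop with a call in its body: the out-of-line call generally
// prevents the compiler from autovectorizing this loop.
double loop_with_call(const std::vector<double> &rsq, std::vector<double> &fpair,
                      bool evflag) {
    assert(fpair.size() == rsq.size());
    double energy = 0.0;
    for (std::size_t k = 0; k < rsq.size(); ++k) {
        double r2inv = 1.0 / rsq[k];
        fpair[k] = r2inv * r2inv;  // toy force term, not the real CHARMM expression
        if (evflag) tally_energy(energy, r2inv);
    }
    return energy;
}

// Specialized loop without the call: a straight-line body that is a
// candidate for autovectorization.
void loop_without_call(const std::vector<double> &rsq, std::vector<double> &fpair) {
    assert(fpair.size() == rsq.size());
    for (std::size_t k = 0; k < rsq.size(); ++k) {
        double r2inv = 1.0 / rsq[k];
        fpair[k] = r2inv * r2inv;  // same toy force term
    }
}
```

Both loops produce the same forces; only the first one can also accumulate the tallied quantity, which is why it needs the call.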
+
+After considering this, we decided that the specialized function should only target the case in which `ev_tally` and `virial_fdotr_compute` are not called.
+There are now two functions, `compute_loopi_original` and `compute_loopi_special`.
+The specialized function is called whenever possible; otherwise, the execution falls back to the original function.
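
A minimal sketch of this dispatch follows, assuming that the specialized path is valid exactly when both `evflag` and `vflag_fdotr` are zero. `PairSketch` is a hypothetical, heavily simplified stand-in for the pair style, and the two loop functions only return a tag so that the chosen path can be checked.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the pair style: only the flags that decide
// which code path is taken are modeled.
struct PairSketch {
    int evflag = 0;       // nonzero when energy/virial tallying is requested
    int vflag_fdotr = 0;  // nonzero when the virial is computed via fdotr

    // Placeholders for the two loop implementations.
    std::string compute_loopi_original() { return "original"; }
    std::string compute_loopi_special() { return "special"; }

    // Use the specialized loop only when neither ev_tally nor
    // virial_fdotr_compute would be called; otherwise fall back.
    std::string compute() {
        if (!evflag && !vflag_fdotr) return compute_loopi_special();
        return compute_loopi_original();
    }
};
```

For the protein input, this sends every timestep except the first and last to the specialized loop.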
+
+Another factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters. When the Coulombic cutoff is omitted, it defaults to the outer LJ cutoff, which implies `cut_ljsq = cut_coulsq`.
+Targeting only this case in the specialization can lead to simpler code.
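
The sketch below shows why equal cutoffs simplify the inner loop. It assumes a simplified version of the argument handling (`inner outer [coul]`, with the Coulombic cutoff defaulting to `outer`) and replaces the real Coulomb and CHARMM LJ expressions with toy placeholder terms. `Cutoffs`, `parse_settings`, `coul_term`, `lj_term`, `fpair_general` and `fpair_special` are all hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical holder for the pair_style arguments: inner outer [coul].
struct Cutoffs {
    double cut_lj_inner, cut_lj, cut_coul;
};

// With two arguments, the Coulombic cutoff defaults to the outer LJ cutoff.
Cutoffs parse_settings(const std::vector<double> &args) {
    assert(args.size() == 2 || args.size() == 3);
    return Cutoffs{args[0], args[1], args.size() == 2 ? args[1] : args[2]};
}

// Toy placeholders, not the real long-range Coulomb or CHARMM LJ terms.
double coul_term(double rsq) { return 1.0 / rsq; }
double lj_term(double rsq) {
    double r6inv = 1.0 / (rsq * rsq * rsq);
    return r6inv * (r6inv - 1.0);
}

// General structure: each term is guarded by its own cutoff test.
double fpair_general(double rsq, double cut_coulsq, double cut_ljsq) {
    double fcoul = 0.0, flj = 0.0;
    if (rsq < cut_coulsq) fcoul = coul_term(rsq);
    if (rsq < cut_ljsq) flj = lj_term(rsq);
    return fcoul + flj;
}

// Specialized structure for cut_ljsq == cut_coulsq: one test guards both terms,
// leaving fewer branches inside the loop body.
double fpair_special(double rsq, double cut_bothsq) {
    if (rsq >= cut_bothsq) return 0.0;
    return coul_term(rsq) + lj_term(rsq);
}
```

When the two cutoffs are equal, both versions give the same result for every `rsq`. The specialized one has a single branch, which is easier for the compiler to turn into a masked vector operation.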
+
+### Managing different code paths

 atom_vec.h -> contains `**x` and `**f` (3D)
 neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)