djurado · a7aacce3
--- a/Home.md
+++ b/Home.md
@@ -51,7 +51,7 @@ These values always follow:
 - `cut_bothsq = MIN(cut_ljsq, cut_coulsq)`

 In the code, `rsq` represents the distance between atoms `i,j`. It is saved in squared form to avoid computing an expensive `sqrt`.
- If `rsq` is larger than `cut_bothsq`, then, no computation is required because there is no short-range interaction between the two atoms. In that case, the inner loop iteration stops here.
+- If `rsq` is larger than `cut_bothsq`, then, no computation is required because there is no short-range interaction between the two atoms. In that case, the inner loop iteration stops here (*do nothing*).
 - If `rsq` is smaller than `tabinnersq`, then `forcecoul` is computed using a *fast* table method. If not, it  is computed using a *slow* method with calls to `sqrt` and `exp` functions.
 - If `rsq` is bigger than `cut_lj_innersq`, then `forcelj` needs a few additional computations.

@@ -86,8 +86,48 @@ The specialized function is called when possible, and if not the execution falls
 Another factor that has been taken into account with the specialization is the fact that the protein input spcript `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters, which implies that `cut_ljsq = cut_coulsq`.
 Targeting only this case for the specialization this case for the optimization can lead to simpler code.

+### Loop size
+
+When vectorizing a loop, it is important to consider the number of loop iterations.
+It may not be worth vectorizing a loop that features a very small iteration count with underusing of the vector registers, since it may prove slower than the serial version.
+
+`PairLJCharmmCoulLong::compute` deals with interactions of pair of atoms `i,j`.
+The outer loop that traverses through the 32000 `i` atoms in the protein input.
+For each `i` atom, the inner loop traverses through atoms `j` that are neighbors of `i`.
+Moreover, each `i` atom can have a different amount of neighbors (`numneigh[i]`).
+
+Our optimization targets the vectorization of the inner loop, so its iteration count will determine if the loop is worth vectorizing.
+After modifying the code to print the number of inner loop iterations, we found that on average, the inner loop contains 375 iterations.
+Considering that registers in the 0.7 vector unit can hold up to 256 64 bit elements, the loop is suitable for vectorizing iteration count wise.
+
+One can increase the number of iterations in the inner loop by increasing the neighbor distance threshold.
+This neighbor threshold is set automatically according to the interaction distance thresholds specified in the `pair_style lj/charmm/coul/long X Y` command.
+The largest interaction distance accepted by LAMMPS produced an average of 1290 inner loop iterations.
+It may be interesting to do some tests to see if the performance improves with higher inner  loop iteration counts.
+
+| input line | inner loop avg. iterations |
+| ---------- | -------------------------- |
+| `pair_style lj/charmm/coul/long 8.0 10.0` | 375 |
+| `pair_style lj/charmm/coul/long 8.0 16.1` | 1290|
+
+
 ### Managing different code paths

+In *Overview of Algorithm and Data structures* we presented the different code paths in the function.
+It is important to consider carefully the implications of this when vectorizing.
+Vectorization is based on SIMD processing (single instruction, multiple data), but different code paths require different instructions.
+With the RISC-V vector extension, this can be overcame with the help of masked instructions, which allows restricting writing the result of a vector instructions to only certain elements using a bitmask.
+
+For instance, which proportion of the atom pairs belong to the *do nothing* group?
+Even when using masked instructions to avoid updating *do nothing* atoms, instructions take some time to execute.
+So, as opposed to the serial version, a *do nothing* atom has the same cost in time as any other atom in the vectorized version with masked instructions.
+
+Before starting working on the vectorization, the code de was modified to count the number of atoms that belong to each category.
+The flowchart shows in black, the average number number of 
+
+
+### Managing 32 bit and 64 data types
+
 atom_vec.h -> contains `**x` and `**f` (3D)
 neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
 PairLJCharmmCoulLong::settings ->