... | @@ -70,7 +70,23 @@ In this section we discuss the implementation of the optimized `PairLJCharmmCoul |
The end of `PairLJCharmmCoulLong::compute` contains calls to `ev_tally` or `virial_fdotr_compute`, which run depending on the values of the `evflag` and `vflag_fdotr` variables.
An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.
This Paraver trace shows the weight of the first and last iterations compared to the rest.

TODO: Paraver trace with two levels of events, one for `compute` and another for the inner function calls.

Having function calls inside the function to be optimized can be troublesome because:
1. The compiler does not autovectorize code that calls non-inlined functions.
2. If not autovectorized, the function would need to be vectorized with intrinsics, with all the additional work that entails.
3. If kept serial, a mechanism to unpack data from the vector registers would still be needed.

After considering this, we decided that the specialized function should only target the case in which these functions are not called.
Now, there are two functions, `compute_loopi_original` and `compute_loopi_special`.
The specialized function is called when possible; otherwise, execution falls back on the original function.

Another factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y`, with only two parameters, which implies that `cut_ljsq = cut_coulsq`.
Targeting only this case for the specialization can lead to simpler code.
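The dispatch between the two functions can be pictured with a minimal sketch. It assumes that the specialized path is eligible exactly when neither `evflag` nor `vflag_fdotr` is set and the two squared cutoffs are equal; the struct `PairSketch`, the helper `can_use_special`, the call counters, and the parameterless `compute` are hypothetical names introduced for illustration and are not taken from the LAMMPS sources.

```cpp
// Simplified, hypothetical stand-in for the pair style; not the actual LAMMPS class.
struct PairSketch {
  int evflag = 0;           // nonzero: ev_tally would have to be called
  int vflag_fdotr = 0;      // nonzero: virial_fdotr_compute would have to be called
  double cut_ljsq = 0.0;    // squared LJ cutoff
  double cut_coulsq = 0.0;  // squared Coulomb cutoff
  int special_calls = 0;    // counters exist only to make the sketch observable
  int original_calls = 0;

  // Assumed eligibility test: no tally calls needed and a single shared cutoff.
  // Exact comparison is used on the assumption that both values derive from
  // the same input parameter.
  bool can_use_special() const {
    return evflag == 0 && vflag_fdotr == 0 && cut_ljsq == cut_coulsq;
  }

  void compute_loopi_original() { ++original_calls; }  // placeholder body
  void compute_loopi_special() { ++special_calls; }    // placeholder body

  // Use the specialized loop when eligible, otherwise fall back on the original.
  void compute() {
    if (can_use_special()) {
      compute_loopi_special();
    } else {
      compute_loopi_original();
    }
  }
};
```

Keeping the eligibility test in one predicate means the specialized loop body itself contains no branches on `evflag`, `vflag_fdotr`, or the cutoffs, which is the property the specialization is after.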
### Managing different code paths

atom_vec.h -> contains `**x` and `**f` (3D)
neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
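To make the roles of these arrays concrete, here is a minimal sketch of a loop over atoms `i` and their neighbors `j`. It uses standard containers as hypothetical stand-ins for the `double **` and `int **` arrays, and the per-pair update is a placeholder (the plain displacement between the two atoms), not the LJ/Coulomb force. The names `Vec3` and `accumulate_pairs` are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;  // three components per atom, as in x[i][0..2]

// Visits every pair (i, j) where j is listed in firstneigh[i].
// Placeholder update: adds x[i] - x[j] to f[i] and subtracts it from f[j].
// Returns the number of pairs visited.
std::size_t accumulate_pairs(const std::vector<Vec3> &x,
                             std::vector<Vec3> &f,
                             const std::vector<std::vector<int>> &firstneigh) {
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < firstneigh.size(); ++i) {
    for (int j : firstneigh[i]) {
      const std::size_t ju = static_cast<std::size_t>(j);
      for (std::size_t d = 0; d < 3; ++d) {
        const double del = x[i][d] - x[ju][d];
        f[i][d] += del;   // contribution to atom i
        f[ju][d] -= del;  // equal and opposite contribution to atom j
      }
      ++pairs;
    }
  }
  return pairs;
}
```

The inner loop over `j` reads `x` and writes `f` at indices taken from the neighbor list, so its memory accesses are indirect; this is the access pattern any vectorized version of the loop has to deal with.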