The end of `PairLJCharmmCoulLong::compute` contains calls to `ev_tally` and `virial_fdotr_compute`. Whether they run depends on the values of the `evflag` and `vflag_fdotr` variables.

An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.

This Paraver trace shows the weight of the first and last iterations compared to the rest.
|
|
|
|
|
|
TODO: Paraver trace with two levels of events, one for `compute` and another for the inner function calls.
|
|
|
|
|
|
Having function calls inside the function to be optimized can be troublesome because:

1. The compiler cannot autovectorize a loop that calls non-inlined functions.
2. If the functions are not autovectorized, they would need to be vectorized by hand with intrinsics, with all the additional work that entails.
3. If they are kept serial, a mechanism to unpack data from the vector registers would still be needed.
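The first point can be illustrated with a toy loop. This is only a sketch under stated assumptions: `Tally`, `accumulate_general` and `accumulate_special` are hypothetical names, the arithmetic is a placeholder rather than the CHARMM force expression, and `Tally::add` stands in for a routine such as `ev_tally` that the compiler cannot inline (it is defined inline here only to keep the example self-contained).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tally routine such as ev_tally.
// This is NOT the LAMMPS signature; it only models "a call inside the loop".
struct Tally {
  double energy = 0.0;
  void add(double e) { energy += e; }
};

// General form: the loop body contains a branch and a call.
// When the callee cannot be inlined, the call is an obstacle to autovectorization.
void accumulate_general(std::vector<double>& f, const std::vector<double>& r2,
                        bool evflag, Tally& tally) {
  for (std::size_t i = 0; i < f.size(); ++i) {
    const double fpair = 1.0 / r2[i];  // placeholder arithmetic
    f[i] += fpair;
    if (evflag) tally.add(0.5 * fpair);
  }
}

// Specialized form: valid only when no tally is requested.
// The body is straight-line arithmetic, which is much easier to vectorize.
void accumulate_special(std::vector<double>& f, const std::vector<double>& r2) {
  for (std::size_t i = 0; i < f.size(); ++i) {
    f[i] += 1.0 / r2[i];
  }
}
```

When `evflag` is false, both functions produce the same forces, so the specialized one can replace the general one for those timesteps.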
|
|
|
|
|
|
After considering this analysis, we decided to write a specialized routine that targets only the case in which these functions are not called.

There are now two routines, `compute_loopi_original` and `compute_loopi_special`.

The specialized routine is called whenever possible; otherwise, execution falls back on the original function.
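A minimal sketch of how such a dispatch could look is given below. It assumes that "possible" means neither `evflag` nor `vflag_fdotr` is set; the `ComputeFlags` struct, the `Routine` enum and `select_routine` are hypothetical names introduced for illustration and are not taken from the actual implementation.

```cpp
// Hypothetical container for the two flags named in the text.
struct ComputeFlags {
  bool evflag;       // energy/virial tally requested (would trigger ev_tally)
  bool vflag_fdotr;  // virial via F dot r requested (would trigger virial_fdotr_compute)
};

// Which of the two routines a timestep would be routed to.
enum class Routine { Original, Special };

// Assumption: the specialized routine is valid only when neither
// function would be called, i.e. both flags are off.
Routine select_routine(const ComputeFlags& flags) {
  const bool needs_tally = flags.evflag || flags.vflag_fdotr;
  return needs_tally ? Routine::Original : Routine::Special;
}
```

With the behaviour observed for the protein input, a selector like this would pick `Routine::Original` on the first and last timesteps and `Routine::Special` on all the others.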
|
|
|
|
|
|
The last factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters, which implies that `cut_ljsq = cut_coulsq`.

Targeting only this case can lead to simpler code, although execution would fall back on the original function when the pair style is used with three parameters.

For more information about the two- and three-parameter invocations, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
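The relation between the number of parameters and the cutoffs can be sketched as follows. The helper names (`Cutoffs`, `cutoffs_from_args`, `cutoffs_allow_special`) are hypothetical, and the parsing is a simplified model of the documented behaviour (with two values, the Coulombic cutoff takes the outer LJ cutoff), not the LAMMPS source.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical container for the squared cutoffs referred to in the text.
struct Cutoffs {
  double cut_lj_innersq;
  double cut_ljsq;
  double cut_coulsq;
};

// Simplified model of the two- and three-parameter forms:
//   X Y   -> inner LJ cutoff X, outer LJ cutoff Y, Coulombic cutoff Y
//   X Y Z -> inner LJ cutoff X, outer LJ cutoff Y, Coulombic cutoff Z
Cutoffs cutoffs_from_args(const std::vector<double>& args) {
  if (args.size() != 2 && args.size() != 3) {
    throw std::invalid_argument("expected 2 or 3 cutoff values");
  }
  const double cut_lj_inner = args[0];
  const double cut_lj = args[1];
  const double cut_coul = (args.size() == 2) ? cut_lj : args[2];
  return {cut_lj_inner * cut_lj_inner, cut_lj * cut_lj, cut_coul * cut_coul};
}

// Assumption: the specialized routine requires cut_ljsq == cut_coulsq.
bool cutoffs_allow_special(const Cutoffs& c) {
  return c.cut_ljsq == c.cut_coulsq;
}
```

For example, `cutoffs_from_args({8.0, 10.0})` (illustrative values) satisfies the condition, while `cutoffs_from_args({8.0, 10.0, 12.0})` does not. Note that a check on the values would also accept a three-parameter form whose last two values coincide; checking the parameter count instead would be the stricter alternative.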