... | @@ -16,27 +16,6 @@ The input contains two files: |
|
- `pair_style lj/charmm/coul/long 8.0 10.0` This is the line that selects the *lj/charmm/coul/long* *pair_style*. If a different *pair_style* were selected, `PairLJCharmmCoulLong::compute` would not be executed and the optimizations would have no effect. Moreover, `8.0` and `10.0` are the *cutoff distances*, which can affect the execution of the function. For more information, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
|
|
- `data.protein`: contains the initial data for the atoms in the simulation and their properties
|
|
|
|
|
|
### Specialization
|
|
|
|
|
|
|
|
The end of `PairLJCharmmCoulLong::compute` contains function calls to `ev_tally` or `virial_fdotr_compute`. Whether these are run depends on the values of the `evflag` and `vflag_fdotr` variables.
|
|
|
|
An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.
|
|
|
|
This Paraver trace shows the weight of the first and last iterations compared to the rest.
|
|
|
|
|
|
|
|
TODO Paraver trace with two levels of events, one for compute and other for inner function calls.
|
|
|
|
|
|
|
|
Having function calls inside the function to be optimized can be troublesome because:
|
|
|
|
1. The compiler cannot autovectorize a loop that calls non-inlined functions.
|
|
|
|
2. If not autovectorized, the function would need to be vectorized by hand with intrinsics, with all the additional work that entails.
|
|
|
|
3. If kept serial, then a mechanism to unpack data from the vector registers would still be needed.
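
As a rough illustration of the three points above, here is a minimal sketch in C++. It is not the LAMMPS code: `tally`, `accumulate_generic`, and `accumulate_special` are hypothetical names, and the `1.0 / r2[j]` expression only stands in for the real force computation. The generic loop keeps a conditional call in its body, while the specialized loop assumes the flag is false and drops it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tally routine such as ev_tally.
// In the real code the routine is defined elsewhere and is not inlined.
void tally(double &energy, double f) { energy += 0.5 * f; }

// Generic loop: the conditional function call stays inside the loop body.
double accumulate_generic(const std::vector<double> &r2, bool evflag,
                          double &energy) {
    double fsum = 0.0;
    for (std::size_t j = 0; j < r2.size(); ++j) {
        const double f = 1.0 / r2[j];  // placeholder for the force term
        fsum += f;
        if (evflag) tally(energy, f);
    }
    return fsum;
}

// Specialized loop: only valid when evflag is false, so the call is gone
// and the body is plain arithmetic.
double accumulate_special(const std::vector<double> &r2) {
    double fsum = 0.0;
    for (std::size_t j = 0; j < r2.size(); ++j) {
        fsum += 1.0 / r2[j];
    }
    return fsum;
}
```

When the flag is false, both functions return the same value. Whether the compiler actually vectorizes the second loop still depends on the compiler and its flags.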
|
|
|
|
|
|
|
|
After considering this analysis, we decided to write a specialized routine that only targets the case in which the functions are not called.
|
|
|
|
Now, there are two routines, `compute_loopi_original` and `compute_loopi_special`.
|
|
|
|
The specialized routine is called when possible; otherwise, execution falls back on the original function.
|
|
|
|
|
|
|
|
The last factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters, which implies that `cut_ljsq = cut_coulsq`.
|
|
|
|
Targeting only this case for the specialization can lead to simpler code, although execution falls back on the original function when three parameters are used.
|
|
|
|
For more information about the two- and three-parameter invocations, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
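
The choice between the two routines could be sketched as follows. This is a simplified assumption about the dispatch, not the actual implementation: `PairState`, `Routine`, and `select_routine` are hypothetical names, and the sketch assumes the specialized path is taken only when both `evflag` and `vflag_fdotr` are zero and the two squared cutoffs are equal.

```cpp
// Hypothetical, reduced view of the state consulted by the dispatch.
struct PairState {
    int evflag;         // nonzero when energy/virial tallying is requested
    int vflag_fdotr;    // nonzero when virial_fdotr_compute would be called
    double cut_ljsq;    // squared LJ cutoff
    double cut_coulsq;  // squared Coulomb cutoff
};

enum class Routine { Original, Special };

// Pick the specialized routine only when none of the tally functions would
// run and the two-parameter form of pair_style was used (equal cutoffs).
// Any other case falls back on the original routine.
Routine select_routine(const PairState &s) {
    const bool no_tally = (s.evflag == 0) && (s.vflag_fdotr == 0);
    const bool same_cutoff = (s.cut_ljsq == s.cut_coulsq);
    return (no_tally && same_cutoff) ? Routine::Special : Routine::Original;
}
```

Keeping the check in one place means the fallback to `compute_loopi_original` stays the default, and `compute_loopi_special` is reached only when every assumption it relies on holds.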
|
|
|
|
|
|
|
|
### Loop size
|
|
|
|
|
|
When vectorizing a loop, it is important to consider the number of loop iterations.
|
... | | ... | |