When vectorizing a loop, it is important to consider the number of loop iterations.
A loop with a very small iteration count may not be worth vectorizing, since it underuses the vector registers and may prove slower than the serial version.

`PairLJCharmmCoulLong::compute` computes the interactions between pairs of atoms `i,j`.
The outer loop traverses the 32000 `i` atoms in the protein input.
For each `i` atom, the inner loop traverses the atoms `j` that are neighbors of `i`.
Moreover, each `i` atom can have a different number of neighbors (`numneigh[i]`).
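The loop structure described above can be sketched as follows. This is a simplified illustration, not the actual LAMMPS source: the `NeighborList` struct, its `neighbors` field and the `visit_pairs` function are hypothetical names, and the interaction computation is replaced by a simple pair counter. Only the idea of a per-atom `numneigh` value is taken from the text.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified neighbor list (not the LAMMPS data structure):
// neighbors[i] holds the indices j of the atoms that are neighbors of atom i.
struct NeighborList {
    std::vector<std::vector<int>> neighbors;

    // Number of neighbors of atom i; plays the role of numneigh[i] in the text.
    std::size_t numneigh(std::size_t i) const { return neighbors[i].size(); }
};

// Visits every (i, j) pair and returns how many pairs were processed.
// A real kernel would compute the pair interaction in the inner loop body.
std::size_t visit_pairs(const NeighborList& list) {
    std::size_t pairs = 0;
    // Outer loop: one iteration per atom i.
    for (std::size_t i = 0; i < list.neighbors.size(); ++i) {
        const std::vector<int>& jlist = list.neighbors[i];
        const std::size_t jnum = list.numneigh(i);  // can differ for every i
        // Inner loop: one iteration per neighbor j (the vectorization target).
        for (std::size_t jj = 0; jj < jnum; ++jj) {
            const int j = jlist[jj];
            (void)j;  // placeholder for the interaction between atoms i and j
            ++pairs;
        }
    }
    return pairs;
}
```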
Our optimization targets the vectorization of the inner loop, so its iteration count determines whether the loop is worth vectorizing.
After modifying the code to print the number of inner loop iterations, we found that the inner loop runs 375 iterations on average.
Considering that registers in the 0.7 vector unit can hold up to 256 64-bit elements, the loop can be considered suitable for vectorization as far as iteration count is concerned.
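The measurement and the comparison against the register capacity can be illustrated with a small sketch. It assumes the per-atom neighbor counts are available as a plain array (called `numneigh` here, after the text). The function names `average_inner_iterations` and `iterations_per_register` and the constant `kMaxVectorLength` are hypothetical, and this is not the instrumentation that was actually added to the code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Taken from the text: a vector register holds up to 256 64-bit elements.
constexpr double kMaxVectorLength = 256.0;

// Average inner loop iteration count: the mean of numneigh[i] over all atoms i.
double average_inner_iterations(const std::vector<int>& numneigh) {
    if (numneigh.empty()) return 0.0;
    const long long total = std::accumulate(numneigh.begin(), numneigh.end(), 0LL);
    return static_cast<double>(total) / static_cast<double>(numneigh.size());
}

// Ratio between the average iteration count and the register capacity.
// A value well below 1 would mean that most vector lanes stay idle.
double iterations_per_register(double avg_iterations) {
    return avg_iterations / kMaxVectorLength;
}
```

With the measured average of 375 iterations, `iterations_per_register(375.0)` is about 1.46, so an average inner loop fills more than one full register.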
One can increase the number of iterations in the inner loop by increasing the neighbor distance threshold.
This neighbor threshold is set automatically according to the interaction distance thresholds specified in the `pair_style lj/charmm/coul/long X Y` command.
The largest interaction distance accepted by LAMMPS produced an average of 1290 inner loop iterations.
It may be interesting to run some tests to see how higher inner loop iteration counts affect performance compared with the serial version.

| input line | inner loop avg. iterations |
| ---------- | -------------------------- |
| `pair_style lj/charmm/coul/long 8.0 10.0` | 375 |
| `pair_style lj/charmm/coul/long 8.0 16.1` | 1290 |
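To get a rough feeling for how the two averages in the table map onto the vector registers, one can count how many register-sized chunks a loop needs and how full those chunks are on average. The sketch below assumes the loop is processed in chunks of at most 256 elements (the register capacity quoted earlier) and treats each average as if it were the iteration count of a single loop. `vector_chunks` and `lane_occupancy` are hypothetical helper names, and occupancy alone does not predict the speedup over the serial version.

```cpp
constexpr int kMaxVectorLength = 256;  // elements per vector register, from the text

// Number of register-sized chunks needed to cover `iterations` loop iterations.
int vector_chunks(int iterations) {
    if (iterations <= 0) return 0;
    return (iterations + kMaxVectorLength - 1) / kMaxVectorLength;  // ceiling division
}

// Fraction of vector lanes doing useful work, averaged over all chunks.
double lane_occupancy(int iterations) {
    const int chunks = vector_chunks(iterations);
    if (chunks == 0) return 0.0;
    return static_cast<double>(iterations) /
           static_cast<double>(chunks * kMaxVectorLength);
}
```

Under these assumptions, 375 iterations need 2 chunks at roughly 73% occupancy, while 1290 iterations need 6 chunks at roughly 84% occupancy.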