@@ -103,7 +103,7 @@ Considering that registers in the 0.7 vector unit can hold up to 256 64 bit elem

One can increase the number of iterations in the inner loop by increasing the neighbor distance threshold.
This neighbor threshold is set automatically according to the interaction distance thresholds specified in the `pair_style lj/charmm/coul/long X Y` command.
The largest interaction distance accepted by LAMMPS produced an average of 1290 inner loop iterations.
It may be interesting to run some tests to see how higher inner loop iteration counts affect performance compared with the serial version.

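
As a rough illustration of what these iteration counts mean for the vector unit, the sketch below assumes a per-atom neighbor count array in the style of LAMMPS' `numneigh`, and takes the 256 64-bit elements per register stated for the 0.7 vector unit as the maximum vector length (`kVlmax` and the helper names are hypothetical). It computes the average inner loop length and the number of strip-mined vector iterations needed to cover a given count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed maximum vector length: 256 64-bit elements per register,
// as stated for the 0.7 vector unit. Hypothetical constant name.
constexpr std::size_t kVlmax = 256;

// Average number of inner loop iterations (neighbors j) per i atom,
// from a per-atom neighbor count array similar to LAMMPS' numneigh.
double avg_inner_iterations(const std::vector<int>& numneigh) {
  if (numneigh.empty()) return 0.0;
  long long total = 0;
  for (int n : numneigh) total += n;
  return static_cast<double>(total) / static_cast<double>(numneigh.size());
}

// Number of strip-mined vector iterations needed to process n inner loop
// iterations when each vector instruction handles at most vlmax elements.
std::size_t vector_strips(std::size_t n, std::size_t vlmax = kVlmax) {
  return (n + vlmax - 1) / vlmax;  // ceiling division
}

// Number of active elements in the last (possibly partial) strip.
std::size_t last_strip_length(std::size_t n, std::size_t vlmax = kVlmax) {
  if (n == 0) return 0;
  const std::size_t rem = n % vlmax;
  return rem == 0 ? vlmax : rem;
}
```

With the average of 1290 iterations reported above, `vector_strips(1290)` is 6 and `last_strip_length(1290)` is 10 (1290 = 5 * 256 + 10), so the final strip fills only 10 of its 256 lanes. Since per-atom neighbor counts vary around the average, this is only an estimate.
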
| input line | inner loop avg. iterations |
| ---------- | -------------------------- |

@@ -119,17 +119,20 @@ Vectorization is based on SIMD processing (single instruction, multiple data), b

With the RISC-V vector extension, this can be overcome with masked instructions, which restrict writing the result of a vector instruction to only the elements selected by a bitmask.

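
A scalar C++ model can illustrate these semantics. In the sketch below, the hypothetical `masked_strip` function stands for one strip of vector work: a compare builds a per-lane mask from a squared-distance cutoff (assuming, for illustration only, that lanes beyond the cutoff are the ones to skip), and a masked operation writes results only where the mask is set, leaving the other lanes untouched. It does not use RISC-V vector intrinsics; on the hardware, each loop would correspond to a single vector instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation of one strip of masked vector execution.
// rsq:    squared distances for the pairs in this strip (one per lane)
// cutsq:  squared cutoff; lanes with rsq >= cutsq are masked off
// result: per-lane output; masked-off lanes keep their previous value
// Returns the number of active (unmasked) lanes.
std::size_t masked_strip(const std::vector<double>& rsq, double cutsq,
                         std::vector<double>& result) {
  const std::size_t vl = rsq.size();
  result.resize(vl, 0.0);

  // "Vector compare": build the mask for every lane.
  std::vector<bool> mask(vl);
  for (std::size_t k = 0; k < vl; ++k) mask[k] = rsq[k] < cutsq;

  // "Masked vector operation": every lane is visited, because the
  // instruction runs over the whole vector, but only active lanes are written.
  std::size_t active = 0;
  for (std::size_t k = 0; k < vl; ++k) {
    if (mask[k]) {
      result[k] = 1.0 / rsq[k];  // placeholder computation (r^-2)
      ++active;
    }
  }
  return active;
}
```

The second loop still visits every lane: the mask prevents the write, not the work.
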
For instance, what proportion of the atom pair interactions (or inner loop iterations) belongs to the *do nothing* group?
Even when masked instructions are used to avoid updating *do nothing* interactions, those instructions still take time to execute.
So, unlike in the serial version, a *do nothing* interaction costs as much time as any other interaction in the vectorized version with masked instructions.

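
The cost argument can be made concrete with a deliberately simple model in which all unit costs are hypothetical parameters, not measurements: in the serial version every pair pays for the cutoff test and only pairs outside the *do nothing* group also pay for the force computation, while in the masked vectorized version every strip of up to `vlmax` pairs takes the same time regardless of how many of its lanes are active.

```cpp
#include <cassert>
#include <cstddef>

// Serial estimate: every pair pays check_cost for the cutoff test;
// only pairs outside the "do nothing" group also pay work_cost.
double serial_cost(std::size_t n_pairs, std::size_t n_do_nothing,
                   double check_cost, double work_cost) {
  assert(n_do_nothing <= n_pairs);
  return static_cast<double>(n_pairs) * check_cost +
         static_cast<double>(n_pairs - n_do_nothing) * work_cost;
}

// Masked vector estimate: each strip of up to vlmax pairs takes
// strip_cost, no matter how many of its lanes are masked off.
double vector_cost(std::size_t n_pairs, std::size_t vlmax, double strip_cost) {
  const std::size_t strips = (n_pairs + vlmax - 1) / vlmax;  // ceiling division
  return static_cast<double>(strips) * strip_cost;
}
```

`vector_cost` does not take the *do nothing* count as an input at all: in this model, a larger *do nothing* share lowers the serial estimate but leaves the vector estimate unchanged.
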
Before starting work on the vectorization, the code was modified to count the number of interactions that belong to each category.
The flowchart shows the average number of interactions (for a single `i` atom in a timestep) that belong to each category, and the arrows show the same information as percentages.
Black values show data for the default protein input, while red values correspond to the modified input described in section *Loop size*.

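
A minimal sketch of this kind of instrumentation is shown below. The `Category` enum is hypothetical: only *do nothing* and *slow* come from the text, and `Other` stands in for the remaining categories of the flowchart. Counts are accumulated over all `i` atoms and timesteps, then reported as an average per `i` atom per timestep and as a percentage of all interactions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical categories; the real flowchart may split them differently.
enum Category : std::size_t { DoNothing = 0, Slow = 1, Other = 2, NumCategories = 3 };

struct InteractionCounter {
  std::array<unsigned long long, NumCategories> counts{};  // zero-initialized
  unsigned long long i_atom_steps = 0;  // number of (i atom, timestep) visits

  // Call once per inner loop iteration (per neighbor j).
  void record(Category c) { ++counts[c]; }

  // Call once per outer loop iteration (per i atom, per timestep).
  void next_i_atom() { ++i_atom_steps; }

  // Average number of interactions of category c for a single i atom in a timestep.
  double average(Category c) const {
    return i_atom_steps == 0 ? 0.0
                             : static_cast<double>(counts[c]) / i_atom_steps;
  }

  // Share of all counted interactions that fall into category c, in percent.
  double percent(Category c) const {
    unsigned long long total = 0;
    for (auto n : counts) total += n;
    return total == 0 ? 0.0 : 100.0 * counts[c] / total;
  }
};
```
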
The modified input reduces the proportion of interactions that belong to the *do nothing* and *slow* categories.

It may be interesting to test how the modified input affects performance compared with the serial version.

### Managing 32-bit and 64-bit data types

- `atom_vec.h` -> contains `**x` and `**f` (3D)
- `neigh_list.h` -> contains `**firstneigh` (for each `i`, stores the array of its neighbors `j`)
- `PairLJCharmmCoulLong::settings` ->

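
In LAMMPS, the neighbor indices in `firstneigh` are 32-bit `int` values, while `x` and `f` hold 64-bit `double`s. A minimal sketch of one way to bridge the two follows. It assumes the vectorized code gathers coordinates with an indexed load whose byte offsets must have the same 64-bit width as the data (an assumption about the 0.7 vector unit that should be checked against the specification), and that `x` is stored contiguously so that `x[j][d]` sits at `x[0][3*j + d]`. The 32-bit indices are masked and widened to 64-bit byte offsets before the gather. The helper names are hypothetical, and the loops are a scalar stand-in for vector instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors LAMMPS' NEIGHMASK: the two upper bits of a neighbor index
// encode special-bond information and must be cleared before use.
constexpr std::int32_t kNeighMask = 0x3FFFFFFF;

// Widen 32-bit neighbor indices into 64-bit byte offsets into a flat,
// contiguous xyz array of doubles (x[0][3*j + d] layout assumed).
std::vector<std::int64_t> widen_to_offsets(const std::vector<std::int32_t>& jlist) {
  std::vector<std::int64_t> offsets(jlist.size());
  for (std::size_t k = 0; k < jlist.size(); ++k) {
    const std::int64_t j = jlist[k] & kNeighMask;  // clear special-bond bits
    offsets[k] = j * 3 * static_cast<std::int64_t>(sizeof(double));
  }
  return offsets;
}

// Scalar stand-in for an indexed (gather) load of coordinate component
// d (0 = x, 1 = y, 2 = z) using 64-bit byte offsets.
std::vector<double> gather_component(const double* xflat,
                                     const std::vector<std::int64_t>& offsets,
                                     int d) {
  std::vector<double> out(offsets.size());
  const char* base = reinterpret_cast<const char*>(xflat);
  const std::int64_t comp = static_cast<std::int64_t>(d) *
                            static_cast<std::int64_t>(sizeof(double));
  for (std::size_t k = 0; k < offsets.size(); ++k) {
    std::memcpy(&out[k], base + offsets[k] + comp, sizeof(double));
  }
  return out;
}
```

Widening once per strip keeps the 32-bit neighbor list unchanged in memory while giving the 64-bit gathers offsets of the width they are assumed to need.
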