... | @@ -30,15 +30,13 @@ In other words, `int **firstneigh` is a pointer to an array of pointers to array |
... | @@ -30,15 +30,13 @@ In other words, `int **firstneigh` is a pointer to an array of pointers to array |
|
`firstneigh[i]` contains an array of the atoms `j` that are neighbors of atom `i`.
|
|
`firstneigh[i]` contains an array of the atoms `j` that are neighbors of atom `i`.
|
|
In this protein input, there are 32,000 `i` atoms.
|
|
In this protein input, there are 32,000 `i` atoms.
|
|
For example, `firstneigh[i][0]` would be the first neighbor of atom `i`.
|
|
For example, `firstneigh[i][0]` would be the first neighbor of atom `i`.
|
|
Both `i` and `firstneigh[i][0]` are atoms, and are represented using a 32-bit `int`.
|
|
Both `i` and `j` are atom identifiers, and are represented using a 32-bit `int`.
|
|
To retrieve the properties of an atom, this `int` value needs to be used as an index for the arrays found in file `atom_vec.h` or `atom.h` such as `double **x` (position), `**f` (force) or `*q` (charge).
|
|
To retrieve the properties of an atom, this `int` value needs to be used as an index for the arrays found in file `atom_vec.h` or `atom.h` such as `double **x` (position), `**f` (force) or `*q` (charge).
|
|
`x` and `f` are pointers to pointers because a second index is needed to select the X, Y, or Z component of each quantity.
|
|
`x` and `f` are pointers to pointers because a second index is needed to select the X, Y, or Z component of each quantity.
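
The two-level indexing described above can be sketched as follows. This is a minimal illustration, not LAMMPS code: `neighbor_coord` is a hypothetical helper, and the arrays are assumed to have the layout described in the text. Upstream LAMMPS also packs extra bits into the stored neighbor index and masks them off before use; the sketch ignores that step.

```cpp
#include <cassert>

// Hypothetical helper (not part of LAMMPS): returns component `dim`
// (0 = X, 1 = Y, 2 = Z) of the position of the k-th neighbor of atom i.
double neighbor_coord(double **x, int **firstneigh, int i, int k, int dim) {
    assert(dim >= 0 && dim < 3);
    const int j = firstneigh[i][k];  // 32-bit index of the neighbor atom
    return x[j][dim];                // second index selects the XYZ component
}
```

The same pattern applies to `f`; a per-atom scalar such as `q` needs only the single index `q[j]`.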
|
|
|
|
|
|
Function `PairLJCharmmCoulLong::compute` has an inner and an outer for loop.
|
|
Function `PairLJCharmmCoulLong::compute` has an inner and an outer for loop.
|
|
The outer loop (i-loop) iterates through all 32,000 atoms in the protein simulation, while the inner loop (j-loop) iterates through each atom `j` that is a neighbor of `i`.
|
|
The outer loop (i-loop) iterates through all 32,000 atoms in the protein simulation, while the inner loop (j-loop) iterates through each atom `j` that is a neighbor of `i`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
For each pair of atoms `i, j`, the algorithm first computes the distance between the two atoms.
|
|
For each pair of atoms `i, j`, the algorithm first computes the distance between the two atoms.
|
|
Then, the distance is compared against several threshold values.
|
|
Then, the distance is compared against several threshold values.
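
The loop structure and distance test described above can be sketched as follows. This is a simplified illustration under stated assumptions: `count_pairs_within` and the single cutoff `cutsq` are hypothetical, and `inum` and `numneigh` (the number of `i` atoms and the neighbor count per atom) are assumed to be available alongside `firstneigh`. The real function applies several thresholds and computes forces rather than counting pairs.

```cpp
#include <cassert>

// Hypothetical helper (not LAMMPS code): counts the (i, j) pairs whose
// squared distance is below one squared cutoff `cutsq`.
int count_pairs_within(double **x, int inum, const int *numneigh,
                       int **firstneigh, double cutsq) {
    assert(cutsq >= 0.0);
    int count = 0;
    for (int i = 0; i < inum; ++i) {                // outer i-loop over atoms
        const int *jlist = firstneigh[i];
        for (int jj = 0; jj < numneigh[i]; ++jj) {  // inner j-loop over neighbors of i
            const int j = jlist[jj];
            const double delx = x[i][0] - x[j][0];
            const double dely = x[i][1] - x[j][1];
            const double delz = x[i][2] - x[j][2];
            // Squared distance: comparing squares avoids a sqrt per pair.
            const double rsq = delx * delx + dely * dely + delz * delz;
            if (rsq < cutsq) ++count;
        }
    }
    return count;
}
```

Because both sides of the comparison are squared, the result matches a comparison of the plain distances for non-negative values.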
|
|
|
|
|
... | @@ -49,6 +47,7 @@ The alternative form with `X Y Z` parameters ([see](https://docs.lammps.org/pair |
... | @@ -49,6 +47,7 @@ The alternative form with `X Y Z` parameters ([see](https://docs.lammps.org/pair |
|
These values always satisfy the following relations:
|
|
These values always satisfy the following relations:
|
|
- `cut_lj_innersq < cut_ljsq`
|
|
- `cut_lj_innersq < cut_ljsq`
|
|
- `tabinnersq < cut_coulsq`
|
|
- `tabinnersq < cut_coulsq`
|
|
|
|
- `cut_ljsq = cut_coulsq` (only in optimized function)
|
|
- `cut_bothsq = MAX(cut_ljsq, cut_coulsq)`
|
|
- `cut_bothsq = MAX(cut_ljsq, cut_coulsq)`
|
|
|
|
|
|
In the code, `rsq` holds the squared distance between atoms `i` and `j`. It is kept in squared form to avoid computing an expensive `sqrt`.
|
|
In the code, `rsq` holds the squared distance between atoms `i` and `j`. It is kept in squared form to avoid computing an expensive `sqrt`.
|
... | @@ -56,11 +55,22 @@ In the code, `rsq` represents the distance between atoms `i,j`. It is saved in s |
... | @@ -56,11 +55,22 @@ In the code, `rsq` represents the distance between atoms `i,j`. It is saved in s |
|
- If `rsq` is larger than `tabinnersq`, then `forcecoul` is computed using a *fast* table method. If not, it is computed using a *slow* method with calls to the `sqrt` and `exp` functions.
|
|
- If `rsq` is larger than `tabinnersq`, then `forcecoul` is computed using a *fast* table method. If not, it is computed using a *slow* method with calls to the `sqrt` and `exp` functions.
|
|
- If `rsq` is bigger than `cut_lj_innersq`, then `forcelj` needs a few additional computations.
|
|
- If `rsq` is bigger than `cut_lj_innersq`, then `forcelj` needs a few additional computations.
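
The text does not spell out the additional `forcelj` computations. As an illustration only, the sketch below shows a smooth switching factor of the kind used between an inner and an outer cutoff. `lj_switch1` is a hypothetical name, and the formula is an assumption modelled on the CHARMM-style switching term rather than a copy of the implementation; the full correction also involves a second term that is omitted here.

```cpp
#include <cassert>

// Hypothetical sketch: scaling factor applied to the Lennard-Jones force.
// It equals 1 up to the inner cutoff and decays smoothly to 0 at the outer one.
double lj_switch1(double rsq, double cut_lj_innersq, double cut_ljsq) {
    assert(cut_lj_innersq < cut_ljsq);      // invariant listed in the text
    if (rsq <= cut_lj_innersq) return 1.0;  // inner region: no extra work needed
    const double span = cut_ljsq - cut_lj_innersq;
    const double denom_lj = span * span * span;
    const double d = cut_ljsq - rsq;
    return d * d * (cut_ljsq + 2.0 * rsq - 3.0 * cut_lj_innersq) / denom_lj;
}
```

With `cut_lj_innersq = 64` and `cut_ljsq = 100`, the factor is 1 at `rsq = 64`, 0.5 at `rsq = 82`, and 0 at `rsq = 100`.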
|
|
|
|
|
|
|
|
The following flowchart shows the different cases based on `rsq`.
|
|
|
|
The condition on `forcelj` is not shown, in order to focus on the `tabinnersq` condition.
|
|
|
|
|
|
![flowchart](uploads/2c10a18b3cda9496b9eaed9e54e001db/flowchart.png)
|
|
![flowchart](uploads/2c10a18b3cda9496b9eaed9e54e001db/flowchart.png)
|
|
|
|
|
|
This flowchart shows the different cases based on `rsq`.
|
|
At the end, the code uses the computed `forcelj` and `forcecoul` values to update the `f` (force) values for both atoms `i` and `j`.
|
|
The condition on `forcelj` is not shown, in order to focus on the `tabinnersq` condition.
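
The final force update on atoms `i` and `j` mentioned above could look roughly like the sketch below. `apply_pair_force` is a hypothetical helper; it assumes both atoms are always updated and omits any per-pair scaling factors that the real code may apply. `delx`, `dely` and `delz` are assumed to be the components of the vector from `j` to `i`.

```cpp
#include <cassert>

// Hypothetical helper (not LAMMPS code): adds the pair force to atom i and
// subtracts it from atom j, so the two contributions cancel (Newton's third law).
void apply_pair_force(double **f, int i, int j,
                      double delx, double dely, double delz,
                      double forcecoul, double forcelj, double rsq) {
    assert(rsq > 0.0);
    // Combined magnitude divided by rsq, so that multiplying by the
    // separation components yields the force components.
    const double fpair = (forcecoul + forcelj) / rsq;
    f[i][0] += delx * fpair;
    f[i][1] += dely * fpair;
    f[i][2] += delz * fpair;
    f[j][0] -= delx * fpair;
    f[j][1] -= dely * fpair;
    f[j][2] -= delz * fpair;
}
```

Updating both atoms in one step means each pair only has to be visited once.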
|
|
|
|
|
|
# Implementation
|
|
|
|
|
|
|
|
In this section we discuss the implementation of the optimized `PairLJCharmmCoulLong::compute` function, with a focus on the issues that arose when adapting the code for the RISC-V 0.7 scalable vector extension and how they were overcome.
|
|
|
|
|
|
|
|
### Specialization
|
|
|
|
|
|
|
|
The end of `PairLJCharmmCoulLong::compute` contains calls to `ev_tally` or `virial_fdotr_compute`, which are run depending on the values of the `evflag` and `vflag_fdotr` variables.
|
|
|
|
An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.
|
|
|
|
This paraver trace
|
|
|
|
|
|
atom_vec.h -> contains `**x` and `**f` (3D)
|
|
atom_vec.h -> contains `**x` and `**f` (3D)
|
|
neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
|
|
neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
|
... | | ... | |