Home · Changes

Update Home authored Jan 10, 2023 by djurado
Home.md
View page @ a7aacce3
......@@ -51,7 +51,7 @@ These values always follow:
- `cut_bothsq = MAX(cut_ljsq, cut_coulsq)`
In the code, `rsq` represents the squared distance between atoms `i,j`. Keeping the distance in squared form avoids computing an expensive `sqrt`.
- If `rsq` is larger than `cut_bothsq`, no computation is required because there is no short-range interaction between the two atoms. In that case, the inner loop iteration stops here (*do nothing*).
- If `rsq` is larger than `tabinnersq`, then `forcecoul` is computed using a *fast* table method. If not, it is computed using a *slow* method with calls to `sqrt` and `exp` functions.
- If `rsq` is larger than `cut_lj_innersq`, then `forcelj` needs a few additional computations.
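The branches listed above can be sketched as a small classification routine. This is only an illustration: `PairPath`, `Cutoffs` and `classify` are hypothetical names that do not appear in the code base, and the sketch assumes Coulomb tables are enabled and follows the branch order of upstream LAMMPS, where the table lookup is used for `rsq` above `tabinnersq`.

```cpp
#include <cassert>

// Hypothetical record of which branches one (i, j) pair takes.
struct PairPath {
    bool interacts;     // false: the "do nothing" case
    bool coul_table;    // true: table lookup, false: sqrt/exp evaluation
    bool lj_switching;  // true: forcelj needs the additional computations
};

// Squared cutoffs, named as in the text above (values are up to the caller).
struct Cutoffs {
    double cut_bothsq, cut_coulsq, cut_ljsq, cut_lj_innersq, tabinnersq;
};

PairPath classify(double rsq, const Cutoffs &c) {
    assert(rsq >= 0.0);
    PairPath p{false, false, false};
    if (rsq >= c.cut_bothsq) return p;  // beyond every cutoff: do nothing
    p.interacts = true;
    if (rsq < c.cut_coulsq) p.coul_table = rsq > c.tabinnersq;
    if (rsq < c.cut_ljsq) p.lj_switching = rsq > c.cut_lj_innersq;
    return p;
}
```

Each pair falls into exactly one combination of these flags, which is what makes the per-category counts discussed later on this page possible.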
......@@ -86,8 +86,48 @@ The specialized function is called when possible, and if not the execution falls
Another factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters, which implies that `cut_ljsq = cut_coulsq`.
Targeting only this case for the specialization can lead to simpler code.
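A minimal sketch of the dispatch on this one condition follows. `Kernel` and `select_kernel` are hypothetical names, and the other conditions required for the specialization are deliberately left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generic and the specialized code paths.
enum class Kernel { Generic, Specialized };

// Only the cutoff condition from the text is checked here.
Kernel select_kernel(double cut_ljsq, double cut_coulsq) {
    assert(cut_ljsq > 0.0 && cut_coulsq > 0.0);
    // With the two-parameter pair_style form both values derive from the
    // same input number, so an exact comparison is assumed to be safe.
    return (cut_ljsq == cut_coulsq) ? Kernel::Specialized : Kernel::Generic;
}
```

When the cutoffs differ, the generic path is selected.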
### Loop size
When vectorizing a loop, it is important to consider the number of loop iterations.
It may not be worth vectorizing a loop with a very small iteration count: the vector registers are underused, and the result may prove slower than the serial version.
`PairLJCharmmCoulLong::compute` deals with interactions between pairs of atoms `i,j`.
The outer loop traverses the 32000 `i` atoms in the protein input.
For each `i` atom, the inner loop traverses through atoms `j` that are neighbors of `i`.
Moreover, each `i` atom can have a different number of neighbors (`numneigh[i]`).
Our optimization targets the vectorization of the inner loop, so its iteration count will determine if the loop is worth vectorizing.
After modifying the code to print the number of inner loop iterations, we found that the inner loop runs 375 iterations on average.
Considering that registers in the 0.7 vector unit can hold up to 256 64 bit elements, the iteration count is high enough to make the loop worth vectorizing.
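The average can be obtained directly from the per-atom neighbor counts. The helper below is a sketch of that measurement, not the instrumentation actually used: `average_inner_iterations` is a hypothetical name, and a `std::vector<int>` stands in for the `numneigh` array.

```cpp
#include <cassert>
#include <vector>

// Average inner-loop trip count, assuming numneigh[i] holds the number of
// neighbors j of atom i (one inner-loop iteration per neighbor).
double average_inner_iterations(const std::vector<int> &numneigh) {
    assert(!numneigh.empty());
    long long total = 0;
    for (int n : numneigh) total += n;
    return static_cast<double>(total) / static_cast<double>(numneigh.size());
}
```

An average hides the spread: atoms with far fewer neighbors than the mean still fill only part of a vector register.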
One can increase the number of iterations in the inner loop by increasing the neighbor distance threshold.
This neighbor threshold is set automatically according to the interaction distance thresholds specified in the `pair_style lj/charmm/coul/long X Y` command.
The largest interaction distance accepted by LAMMPS produced an average of 1290 inner loop iterations.
It may be interesting to do some tests to see if the performance improves with higher inner loop iteration counts.
| input line | inner loop avg. iterations |
| ---------- | -------------------------- |
| `pair_style lj/charmm/coul/long 8.0 10.0` | 375 |
| `pair_style lj/charmm/coul/long 8.0 16.1` | 1290 |
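One way to reason about these two configurations before measuring them is to estimate how many vector strips an inner loop needs and how full they are. The sketch below assumes plain strip-mining with a fixed maximum vector length of 256 elements and uses the averages from the table. `StripStats` and `strip_stats` are hypothetical names, and real per-atom counts vary around the average.

```cpp
#include <cassert>

struct StripStats {
    int strips;        // number of vector strips needed for the loop
    double occupancy;  // fraction of the issued lanes doing useful work
};

// Estimate for a loop of `iterations` elements processed `vlmax` at a time.
StripStats strip_stats(int iterations, int vlmax) {
    assert(iterations > 0 && vlmax > 0);
    int strips = (iterations + vlmax - 1) / vlmax;  // ceiling division
    double occupancy = static_cast<double>(iterations) /
                       (static_cast<double>(strips) * vlmax);
    return {strips, occupancy};
}
```

Under these assumptions, 375 iterations need 2 strips at about 73% occupancy, while 1290 iterations need 6 strips at about 84%. Whether that translates into better performance still has to be tested.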
### Managing different code paths
In *Overview of Algorithm and Data structures* we presented the different code paths in the function.
It is important to consider carefully the implications of this when vectorizing.
Vectorization is based on SIMD processing (single instruction, multiple data), but different code paths require different instructions.
With the RISC-V vector extension, this can be overcome with the help of masked instructions, which restrict the writing of a vector instruction's result to certain elements selected by a bitmask.
The cost of masking depends on how the atom pairs are distributed among the code paths: for instance, what proportion of the atom pairs belongs to the *do nothing* group?
Even when using masked instructions to avoid updating *do nothing* atoms, instructions take some time to execute.
So, as opposed to the serial version, a *do nothing* atom has the same cost in time as any other atom in the vectorized version with masked instructions.
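The cost argument can be illustrated with a scalar emulation of one masked vector addition. This is not RISC-V vector code: `MaskedResult` and `masked_add` are hypothetical names, and the loop only models the fact that every lane passes through the vector unit while only unmasked lanes are written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MaskedResult {
    std::size_t lanes_processed;  // lanes that occupied the vector unit
    std::size_t lanes_written;    // lanes whose result was kept
};

// Emulates dst[k] += src[k] under the mask (rsq[k] < cut_bothsq).
MaskedResult masked_add(std::vector<double> &dst,
                        const std::vector<double> &src,
                        const std::vector<double> &rsq,
                        double cut_bothsq) {
    assert(dst.size() == src.size() && src.size() == rsq.size());
    MaskedResult r{0, 0};
    for (std::size_t k = 0; k < dst.size(); ++k) {
        bool active = rsq[k] < cut_bothsq;  // mask bit for this lane
        double sum = dst[k] + src[k];       // computed for every lane
        ++r.lanes_processed;
        if (active) {                       // only active lanes are updated
            dst[k] = sum;
            ++r.lanes_written;
        }
    }
    return r;
}
```

The gap between `lanes_processed` and `lanes_written` is the work spent on *do nothing* pairs, which the serial version skips entirely.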
Before starting work on the vectorization, the code was modified to count the number of atoms that belong to each category.
The flowchart shows, in black, the average number of
### Managing 32 bit and 64 bit data types
atom_vec.h -> contains `**x` and `**f` (3D)
neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
PairLJCharmmCoulLong::settings ->
......