# Introduction
|
|
|
|
|
|
LAMMPS is a classical molecular dynamics simulation code with a focus on materials modeling.
|
@@ -16,8 +14,21 @@ The input contains two files:
|
|
- `pair_style lj/charmm/coul/long 8.0 10.0`: this line specifies that we are using the *lj/charmm/coul/long* *pair_style*. If a different *pair_style* were selected, `PairLJCharmmCoulLong::compute` would not be executed, and the optimizations would have no impact. The `8.0` and `10.0` values are the inner and outer *cutoff distances*, which can affect the execution of the function. For more information, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
|
|
|
- `data.protein`: contains the initial data for the atoms in the simulation and their properties
|
|
|
|
|
|
# Algorithm
|
|
|
|
|
|
LAMMPS computes interactions between atoms.
|
|
|
If a pair of atoms is "close" (within the *pair_style* cutoff distance), their interaction is *short-range*: it is computed in "real space" using a *pair_style*.
|
|
|
If the atoms are not within the cutoff distance, their interaction is *long-range*: it is computed in "reciprocal space" (FFT domain).
|
|
|
The `PairLJCharmmCoulLong::compute` function computes the short-range interactions.
|
|
|
|
|
|
- Specialization: calls to subroutines inside `compute` only happen on the first and last iteration
- Data structures: which classes store the information about atoms (position, force), and how? As an array of pointers to arrays
- Structure of the code: present the flowchart and show the different code paths (do-nothing, fast, slow...)
- Abandoned idea, copied from the INTEL version: the "classify-loop". Store the elements that can be computed vectorially in a buffer (in serial because of 0.7) and then process them vectorially.
- Implemented idea: use a combination of masked operations for elements that do not need to be processed and a `vmfirst` loop to find the elements that need to be processed in serial
- Problems
  - pointer to pointer: often requires two indexed load operations
  - int32 to int64: tested approaches
    - fixed SEW load: not available in 0.7
    - widening instruction: cannot be used with intrinsics (it is compliant with the RISC-V specification, not the VPU)
    - widening instructions + inline asm: register overlapping restrictions and placement are not well implemented; the code can compile and place an instruction that will give an execution error.