If a pair of atoms is not within the cutoff distance, its interaction becomes a *long-range* interaction, computed in "reciprocal space" (the FFT domain).
|
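For the Coulomb term of a `coul/long` style, this split is the standard Ewald decomposition (charges and the Coulomb constant omitted). The `erfc` part decays quickly and is evaluated by the pair style up to the cutoff $r_c$. The smooth `erf` part is evaluated in reciprocal space for all pairs. $g_{\text{ewald}}$ is the splitting parameter. Since $\operatorname{erf}(x) + \operatorname{erfc}(x) = 1$, the two parts add up to the full $1/r$:

$$
\frac{1}{r} = \underbrace{\frac{\operatorname{erfc}(g_{\text{ewald}}\, r)}{r}}_{\text{real space: pair style, } r < r_c} + \underbrace{\frac{\operatorname{erf}(g_{\text{ewald}}\, r)}{r}}_{\text{reciprocal space (FFT)}}
$$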
|
`PairLJCharmmCoulLong::compute` computes the short-range interactions.
|
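A minimal sketch of what such a short-range loop looks like, loosely modelled on LAMMPS pair styles. Positions `x` and forces `f` are `double **` (an array of pointers to 3-element rows), and neighbors come from a half neighbor list. The `NeighList` struct, `short_range_loop` and the placeholder force are hypothetical simplifications. The real `compute` also handles special bonds (`sbmask`, `NEIGHMASK`), energy/virial tallies and the actual CHARMM LJ and real-space Coulomb formulas.

```cpp
#include <cassert>

// Hypothetical, simplified neighbor list (not the LAMMPS NeighList class).
struct NeighList {
  int inum;                      // number of owned ("i") atoms
  const int *ilist;              // indices of the i atoms
  const int *numneigh;           // neighbor count of each i atom
  const int *const *firstneigh;  // neighbor indices of each i atom
};

// Short-range part only: pairs with rsq >= cutsq are skipped here, their
// contribution is left to the long-range (reciprocal-space) solver.
// Returns the number of pairs that were inside the cutoff.
int short_range_loop(double **x, double **f, const NeighList &list, double cutsq) {
  assert(cutsq > 0.0);
  int computed = 0;
  for (int ii = 0; ii < list.inum; ++ii) {
    const int i = list.ilist[ii];
    const int *jlist = list.firstneigh[i];
    for (int jj = 0; jj < list.numneigh[i]; ++jj) {
      const int j = jlist[jj];
      // x[j] loads a row pointer, x[j][k] then loads the coordinate itself.
      const double delx = x[i][0] - x[j][0];
      const double dely = x[i][1] - x[j][1];
      const double delz = x[i][2] - x[j][2];
      const double rsq = delx * delx + dely * dely + delz * delz;
      if (rsq >= cutsq) continue;  // outside the cutoff: nothing to do here
      // Placeholder for F(r)/r; the real style evaluates LJ + Coulomb here.
      const double fpair = 1.0 / rsq;
      f[i][0] += delx * fpair;
      f[i][1] += dely * fpair;
      f[i][2] += delz * fpair;
      // Half neighbor list: each pair is visited once, so update j as well.
      f[j][0] -= delx * fpair;
      f[j][1] -= dely * fpair;
      f[j][2] -= delz * fpair;
      ++computed;
    }
  }
  return computed;
}
```

Vectorizing this loop means handling several `jj` at once. That is where the code paths, masks and indexed loads discussed in the notes come in.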
|
- **Specialization**: calls to subroutines inside `compute` only happen on the first and last iterations
|
|
- Data structures: which classes store the atom information (position, force), and how: as an array of pointers to arrays
|
|
- **Structure of the code**: present the flowchart and show the different code paths (do-nothing, fast, slow, ...)
|
|
- Abandoned idea: the "classify loop" copied from the INTEL version. Store the elements that can be computed vectorially in a buffer (filled serially because of 0.7), then process them vectorially.
|
|
- Implemented idea: combine masked operations, which skip the elements that do not need to be processed, with a `vmfirst` loop that finds the elements that need to be processed serially
|
|
- **Problems**
    - pointer to pointer: reading an atom's position or force often requires two indexed loads (one for the row pointer, one for the value)
    - int32 to int64: tested approaches
        - fixed-SEW loads: not available in 0.7
        - widening instructions: cannot be used with the intrinsics (the intrinsic complies with the RISC-V specification, not with the VPU)
        - widening instructions + inline asm: register-overlap restrictions and register placement are not handled well, so the compiler can emit and place an instruction that fails at execution time; not stable enough
        - bithack approach (the initial, suboptimal version)
            - it requires loading 32-bit elements into registers with SEW = 64 bits (two elements per SEW-wide element), which can produce an unaligned-access error (on fpga-sdv, but not on arriesgado+vehave)
            - to overcome this, and to perform the cast from int to float, a `vmv.v` in inline asm is needed (just to trick the compiler)
    - the 64-bit mask: one part of the code casts the **binary representation** of a 32-bit float to a 32-bit integer and applies bitmasks that are generated at run time. The mask generation has been ported to 64 bits, and a `vmv.v` is needed to trick the compiler into moving from float to int.
|
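To illustrate the data-structure and pointer-to-pointer points, here is a scalar emulation of the indexed loads needed to fetch one coordinate of several neighbors at once. With `double **x`, each lane first gathers the row pointer `x[j]` and then gathers through it. If all rows live in one contiguous block (an assumption here, although LAMMPS's `memory->create` allocates 2D arrays that way), a single gather with offset `3*j + k` from `x[0]` gives the same values. The function names and `kMaxVL` are hypothetical. Real RVV indexed loads take a vector of offsets instead of these loops.

```cpp
#include <cassert>

constexpr int kMaxVL = 64;  // hypothetical maximum vector length

// (1) Array of pointers to arrays: two indexed loads per coordinate.
void load_coord_indirect(double *out, double *const *x, const int *jlist,
                         int k, int vl) {
  assert(vl <= kMaxVL);
  double *rows[kMaxVL];
  for (int l = 0; l < vl; ++l) rows[l] = x[jlist[l]];  // 1st gather: 64-bit row pointers
  for (int l = 0; l < vl; ++l) out[l] = rows[l][k];    // 2nd gather: the coordinates
}

// (2) Contiguous rows (assumed layout): one indexed load with offset 3*j + k.
void load_coord_flat(double *out, const double *x0, const int *jlist,
                     int k, int vl) {
  for (int l = 0; l < vl; ++l) out[l] = x0[3 * jlist[l] + k];
}
```

In both variants the offsets come from 32-bit `int` neighbor indices, while a gather at SEW = 64 presumably needs 64-bit offset elements. This is where the int32-to-int64 conversion comes in.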
|
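The three code paths and the abandoned classify loop can be sketched in scalar C++ as follows. The `classify` criteria are hypothetical. The notes do not say which pairs take the slow path, so here a pair counts as "slow" when it carries a special-bond flag. The first pass runs serially and fills index buffers. In the INTEL-style scheme, a second pass would then run vectorially over `fast` only.

```cpp
#include <cassert>
#include <vector>

enum class Path { DoNothing, Fast, Slow };

// Hypothetical classification of one neighbor pair.
Path classify(double rsq, double cutsq, bool special) {
  if (rsq >= cutsq) return Path::DoNothing;  // outside the cutoff
  return special ? Path::Slow : Path::Fast;
}

struct Buffers {
  std::vector<int> fast;  // neighbor positions to process vectorially
  std::vector<int> slow;  // neighbor positions to process serially
};

// First (serial) pass of the classify loop: bucket neighbor positions by path.
Buffers classify_loop(const double *rsq, const bool *special, int n, double cutsq) {
  assert(n >= 0);
  Buffers b;
  for (int jj = 0; jj < n; ++jj) {
    switch (classify(rsq[jj], cutsq, special[jj])) {
      case Path::DoNothing: break;
      case Path::Fast: b.fast.push_back(jj); break;
      case Path::Slow: b.slow.push_back(jj); break;
    }
  }
  return b;
}
```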
|
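The implemented idea can be emulated on one strip of `VL` neighbors. A vector compare builds the masks. The fast lanes are computed under a mask, and inactive lanes are left untouched. The slow lanes are then visited one at a time with a `vmfirst`-style loop, which returns the index of the first set mask bit, or -1 when none is set. `VL`, the two kernels and the fast/slow criterion are hypothetical placeholders.

```cpp
#include <cassert>

constexpr int VL = 8;  // hypothetical vector length

// Emulates vmfirst.m: index of the first set bit in the mask, or -1 if none.
int vmfirst(const bool (&m)[VL]) {
  for (int l = 0; l < VL; ++l)
    if (m[l]) return l;
  return -1;
}

double fast_kernel(double rsq) { return 2.0 * rsq; }  // placeholder
double slow_kernel(double rsq) { return -rsq; }       // placeholder

// Processes one strip of VL neighbors; returns how many lanes went serial.
int process_strip(const double (&rsq)[VL], const bool (&special)[VL],
                  double cutsq, double (&out)[VL]) {
  assert(cutsq > 0.0);
  bool fast[VL], slow[VL];
  for (int l = 0; l < VL; ++l) {  // vector compare -> two masks
    const bool inside = rsq[l] < cutsq;
    fast[l] = inside && !special[l];
    slow[l] = inside && special[l];
  }
  for (int l = 0; l < VL; ++l)  // masked vector operation on fast lanes only
    if (fast[l]) out[l] = fast_kernel(rsq[l]);
  int serial = 0;
  for (int l = vmfirst(slow); l != -1; l = vmfirst(slow)) {
    out[l] = slow_kernel(rsq[l]);  // scalar fallback for this lane
    slow[l] = false;               // clear the bit, then look for the next one
    ++serial;
  }
  return serial;
}
```

Lanes outside the cutoff (the do-nothing path) cost only the compare. The serial loop runs once per slow lane, so a strip without slow lanes never leaves the vector path.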
|
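Here is a scalar sketch of the bithack widening, under stated assumptions. Pairs of 32-bit neighbor indices are read as one 64-bit element and split into two sign-extended 64-bit values. On a little-endian target such as RISC-V, the low half holds the even-indexed element. The helper names are hypothetical. The alignment check shows where the unaligned-access error comes from: an `int` array is only guaranteed 4-byte alignment, so a 64-bit element load from it can be misaligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if a 64-bit load from p would be misaligned (not 8-byte aligned).
bool misaligned_for_64bit_load(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % 8 != 0;
}

// Reads n_pairs 64-bit words from jlist and splits each into its two int32
// halves, sign-extended to int64 (assumes a little-endian target).
// The result is de-interleaved: even- and odd-indexed elements end up in
// separate arrays.
void widen_pairs(const std::int32_t *jlist, int n_pairs,
                 std::int64_t *even, std::int64_t *odd) {
  assert(n_pairs >= 0);
  for (int p = 0; p < n_pairs; ++p) {
    std::uint64_t w;
    std::memcpy(&w, jlist + 2 * p, sizeof w);  // one 64-bit "element"
    even[p] = static_cast<std::int32_t>(static_cast<std::uint32_t>(w));       // low half
    odd[p] = static_cast<std::int32_t>(static_cast<std::uint32_t>(w >> 32));  // high half
  }
}
```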
|
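The float-to-integer cast in the 64-bit-mask bullet presumably refers to the LAMMPS Coulomb table lookup. There, `rsq` is stored as a float in a `union_int_float_t`, and its bit pattern is masked with `ncoulmask` and shifted by `ncoulshiftbits` to obtain a table index. The sketch below reproduces that idea, using `memcpy` for the reinterpretation and hypothetical mask and shift values. It also shows that when the 32-bit pattern sits zero-extended in a 64-bit lane, applying the mask widened to 64 bits yields the same index.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the bits of a float as an integer (no numeric conversion).
std::uint32_t float_bits(float v) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof u);
  return u;
}

// 32-bit version: mask the bit pattern, then shift to get the table index.
std::uint32_t table_index32(float rsq, std::uint32_t mask, int shift) {
  assert(shift >= 0 && shift < 32);
  return (float_bits(rsq) & mask) >> shift;
}

// 64-bit lane version: the float bits are zero-extended into a 64-bit lane
// and the mask is widened to 64 bits.
std::uint64_t table_index64(float rsq, std::uint64_t mask, int shift) {
  const std::uint64_t lane = float_bits(rsq);  // upper 32 bits are zero
  return (lane & mask) >> shift;
}
```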