Home · Changes

Remove notes from Home, authored Jan 19, 2023 by djurado
Home.md
View page @ 80758ec2
@@ -15,28 +15,4 @@ The input contains two files:
- `in.protein`: the "script" file, which contains the simulation settings.
  - `run 100`: change 100 for *N* to run *N* timesteps.
  - `pair_style lj/charmm/coul/long 8.0 10.0`: this is the line that specifies that we are using the *lj/charmm/coul/long* *pair_style*. If a different *pair_style* were selected, `PairLJCharmmCoulLong::compute` would not be executed, and the optimizations would not have any impact. Moreover, `8.0` and `10.0` are the *cutoff distances*, which can have an impact on the execution of the function. For more information, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
- `data.protein`: contains the initial data for the atoms in the simulation and their properties.
\ No newline at end of file
# Notes
atom_vec.h -> contains `**x` and `**f` (3D)
neigh_list.h -> contains `**firstneigh` (for each i, store array of neighbors j)
PairLJCharmmCoulLong::settings ->
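
The pointer-to-pointer layout noted above for `**x`, `**f` and `**firstneigh` can be sketched as follows. This is a simplified illustration, not the LAMMPS code: the structs `AtomArrays` and `NeighborArrays`, the helper `row_pointers` and the toy kernel `accumulate_toy_forces` are hypothetical names, and the "force" is just the displacement between the two atoms. It only shows why reaching `x[j]` takes two dependent loads (the row pointer first, then the coordinates).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the arrays named in the notes.
struct AtomArrays {
    double **x;  // x[i][0..2]: position of atom i
    double **f;  // f[i][0..2]: force on atom i
};

struct NeighborArrays {
    int *numneigh;     // numneigh[i]: number of neighbors of atom i
    int **firstneigh;  // firstneigh[i]: array with the neighbor indices j of atom i
};

// Build one row pointer per atom on top of a flat [x0 y0 z0 x1 y1 z1 ...] buffer.
std::vector<double *> row_pointers(std::vector<double> &flat) {
    assert(flat.size() % 3 == 0);
    std::vector<double *> rows(flat.size() / 3);
    for (std::size_t i = 0; i < rows.size(); ++i) rows[i] = &flat[3 * i];
    return rows;
}

// Toy pair loop: adds (x[i] - x[j]) to f[i] for every neighbor j of i and
// returns the sum of the squared distances over all visited pairs.
double accumulate_toy_forces(const AtomArrays &atom, const NeighborArrays &list, int inum) {
    double rsq_sum = 0.0;
    for (int i = 0; i < inum; ++i) {
        const double *xi = atom.x[i];
        const int *jlist = list.firstneigh[i];  // load 1: pointer to the neighbor row
        for (int jj = 0; jj < list.numneigh[i]; ++jj) {
            const int j = jlist[jj];            // neighbor index
            const double *xj = atom.x[j];       // load 2a: row pointer of atom j
            const double delx = xi[0] - xj[0];  // load 2b: coordinates through that pointer
            const double dely = xi[1] - xj[1];
            const double delz = xi[2] - xj[2];
            rsq_sum += delx * delx + dely * dely + delz * delz;
            atom.f[i][0] += delx;
            atom.f[i][1] += dely;
            atom.f[i][2] += delz;
        }
    }
    return rsq_sum;
}
```

In a vectorized version, the `atom.x[j]` access is the part that would need an indexed load of the row pointers followed by a second indexed load of the coordinates.
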
- **Specialization**: calls to subroutines inside `compute` only happen on the first and last iteration.
- Data structures: in which classes is the information about atoms (position, force) stored, and how? As an array of pointers to arrays.
- **Structure of the code**: present the flowchart and show the different code paths: do-nothing, fast, slow...
  - Abandoned idea: copied from the INTEL version, the "classify-loop". Store the elements that can be computed vectorially in a buffer (in serial, because of 0.7) and then process them vectorially.
  - Implemented idea: use a combination of masked operations for the elements that do not need to be processed, and a `vmfirst` loop to find the elements that need to be processed in serial.
- **Problems**
  - pointer to pointer: often requires two indexed load operations
  - int32 to int64: tested approaches
    - not available in 0.7: fixed SEW load
    - widening instruction: cannot be used with intrinsics (it is compliant with the RISC-V specification, not with the VPU)
    - widening instructions + inline asm: register overlapping restrictions and placement are not well implemented. The compiler can compile and place an instruction that gives an execution error, so this is not stable enough.
    - bithack approach (the initial suboptimal version)
      - the bithack approach requires loading 32-bit elements with a register SEW of 64 bits (2 elements per SEW). This can produce an unaligned access error (in fpga-sdv, but not in arriesgado+vehave).
      - To overcome this, and to perform a cast from int to float, a `vmv.v` inline asm needs to be used (just to trick the compiler).
  - the 64-bit mask: there is a part in the code that casts the **binary representation** of a 32-bit float to a 32-bit integer and applies some bitmasks, which are generated during the execution. The generation has been ported to 64 bits, and a `vmv.v` is needed to trick the compiler into moving from float to int.
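
As a scalar analogue of the two bit-level notes above (two 32-bit integers read as one 64-bit element, and the float-to-integer reinterpretation followed by a bitmask), a minimal sketch could look like the code below. It is an assumption-laden illustration, not the vector code: the helper names `load_pair`, `widen_low`, `widen_high`, `float_bits` and `table_index` are hypothetical, the mask and shift in the usage are example values, and the low/high split assumes a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read two adjacent 32-bit integers as one 64-bit word. memcpy is used
// instead of a pointer cast so the read is valid even if p is not 8-byte aligned.
inline std::uint64_t load_pair(const std::int32_t *p) {
    std::uint64_t word;
    std::memcpy(&word, p, sizeof word);
    return word;
}

// Sign-extend each 32-bit half to 64 bits.
// Assumption: little-endian layout, so p[0] is the low half and p[1] the high half.
inline std::int64_t widen_low(std::uint64_t word) {
    return static_cast<std::int64_t>(static_cast<std::int32_t>(static_cast<std::uint32_t>(word)));
}

inline std::int64_t widen_high(std::uint64_t word) {
    return static_cast<std::int64_t>(static_cast<std::int32_t>(static_cast<std::uint32_t>(word >> 32)));
}

// Reinterpret the binary representation of a 32-bit float as a 32-bit integer.
// This copies the bits; it is not a numeric conversion.
inline std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Apply a bitmask to the float's bits and shift the result down to an index.
// In this sketch the mask and shift are plain parameters chosen by the caller.
inline std::uint32_t table_index(float value, std::uint32_t mask, unsigned shift) {
    assert(shift < 32);
    return (float_bits(value) & mask) >> shift;
}
```

For example, with `std::int32_t v[2] = {-5, 7}`, `widen_low(load_pair(v))` gives `-5` and `widen_high(load_pair(v))` gives `7`; with the example mask `0x7F800000` and shift `23`, `table_index(2.0f, ...)` extracts the biased exponent `128`.
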