Specialization · Changes

Create Specialization authored Jan 17, 2023 by djurado
Specialization.md 0 → 100644
View page @ 392ac1da
The end of `PairLJCharmmCoulLong::compute` contains function calls to `ev_tally` or `virial_fdotr_compute`. Whether these are run depends on the values of the `evflag` and `vflag_fdotr` variables.
An analysis using GDB breakpoints showed that, for the protein input, these functions are only called on the first and last timesteps of the execution.
This Paraver trace shows the weight of the first and last iterations compared to the rest.
TODO: Paraver trace with two levels of events, one for `compute` and another for the inner function calls.
Having function calls inside the function to be optimized can be troublesome because:
1. The compiler cannot autovectorize a loop that calls non-inlined functions.
2. If the called function is not autovectorized, it would need to be vectorized by hand with intrinsics, with all the additional work that entails.
3. If kept serial, a mechanism to unpack data from the vector registers would still be needed.
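
The difference between a loop with a guarded call and a call-free loop can be illustrated with a minimal, self-contained sketch. It is not LAMMPS code: `Tally`, `scale_general`, and `scale_special` are hypothetical names, and the arithmetic is a placeholder for the real force computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator standing in for a tally routine such as ev_tally.
struct Tally {
    double energy = 0.0;
    void add(double e) { energy += e; }
};

// General loop: the flag-guarded call sits inside the loop body, so the
// compiler has to handle the call before it can vectorize the loop.
void scale_general(const std::vector<double>& in, std::vector<double>& out,
                   double factor, bool tally_flag, Tally& tally) {
    assert(out.size() >= in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = factor * in[i];
        if (tally_flag) tally.add(out[i]);
    }
}

// Specialized loop: only valid when no tallying is requested.
// The body contains no calls and no flag checks.
void scale_special(const std::vector<double>& in, std::vector<double>& out,
                   double factor) {
    assert(out.size() >= in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = factor * in[i];
    }
}
```

When the flag is false, both loops write the same values to `out`; the specialized one simply gives the compiler a simpler body to work with.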
After considering this analysis, we decided to write a specialized routine that only targets the case in which the functions are not called.
Now there are two routines, `compute_loopi_original` and `compute_loopi_special`.
The specialized routine is called when possible; otherwise, execution falls back on the original function.
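
A minimal sketch of such a dispatch follows, assuming the decision depends only on `evflag` and `vflag_fdotr`. `PairSketch`, the call counters, and the empty routine bodies are hypothetical placeholders; the real routines take arguments that are not shown here.

```cpp
#include <cassert>

// Hypothetical stand-in for the pair style; only the dispatch is sketched.
struct PairSketch {
    int evflag = 0;       // nonzero when energy/virial tallying is requested
    int vflag_fdotr = 0;  // nonzero when virial_fdotr_compute would be called
    int calls_original = 0;
    int calls_special = 0;

    // General path, handles every flag combination (body omitted).
    void compute_loopi_original() { ++calls_original; }

    // Call-free path, only valid when neither flag is set (body omitted).
    void compute_loopi_special() {
        assert(!evflag && !vflag_fdotr);
        ++calls_special;
    }

    void compute() {
        if (!evflag && !vflag_fdotr) {
            compute_loopi_special();
        } else {
            compute_loopi_original();
        }
    }
};
```

Checking the flags once per `compute` call keeps the condition out of the inner loops, at the cost of maintaining two versions of the loop.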
The last factor taken into account in the specialization is that the protein input script `in.protein` uses the form `pair_style lj/charmm/coul/long X Y` with only two parameters, which implies that `cut_ljsq = cut_coulsq`.
Targeting only this case for the specialization can lead to simpler code, although it falls back on the original function when used with three parameters.
For more information about the two- and three-parameter invocations, check the LAMMPS [documentation](https://docs.lammps.org/pair_charmm.html#pair-style-lj-charmm-coul-long-command).
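
The cutoff condition can be sketched as follows, assuming (as the linked documentation describes) that the outer LJ cutoff doubles as the Coulombic cutoff when only two arguments are given. `Cutoffs`, `parse_cutoffs`, and `cutoffs_allow_special` are hypothetical names, not LAMMPS functions.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical holder for the cutoffs derived from the pair_style arguments.
struct Cutoffs {
    double cut_lj_inner;
    double cut_ljsq;    // squared outer LJ cutoff
    double cut_coulsq;  // squared Coulombic cutoff
};

// Two arguments: inner and outer, with outer reused as the Coulombic cutoff.
// Three arguments: inner, outer, and a separate Coulombic cutoff.
Cutoffs parse_cutoffs(int narg, const char* const* arg) {
    assert(narg == 2 || narg == 3);
    const double inner = std::strtod(arg[0], nullptr);
    const double outer = std::strtod(arg[1], nullptr);
    const double coul = (narg == 2) ? outer : std::strtod(arg[2], nullptr);
    return Cutoffs{inner, outer * outer, coul * coul};
}

// The specialized routine assumes a single squared cutoff for both terms.
bool cutoffs_allow_special(const Cutoffs& c) {
    return c.cut_ljsq == c.cut_coulsq;
}
```

Note that this value-based test would also accept a three-argument invocation that repeats the outer cutoff; whether the actual implementation checks the values or the argument count is not stated here.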