Managing code paths

Authored Jan 17, 2023 by djurado

In *Overview of Algorithm and Data structures* we presented the different code paths in the function.
It is important to consider carefully the implications of this when vectorizing.
Vectorization is based on SIMD processing (single instruction, multiple data), but different code paths require different instructions.
With the RISC-V vector extension, this can be overcome with the help of masked instructions, which restrict the write of a vector instruction's result to only certain elements, selected by a bitmask.
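To make the effect of a mask concrete, here is a minimal scalar emulation of a masked vector add. It is only a sketch: the vector length `VL`, the `Vec` alias and the function `masked_add` are hypothetical names chosen for illustration, and no real vector intrinsics are used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector length for this sketch; real hardware defines its own.
constexpr std::size_t VL = 8;
using Vec = std::array<double, VL>;

// Scalar emulation of a masked vector add (illustrative, not an intrinsic).
// Lane k of `dest` is overwritten with a[k] + b[k] only when bit k of `mask`
// is set; every other lane keeps its previous value.
Vec masked_add(Vec dest, const Vec& a, const Vec& b, std::uint32_t mask) {
    assert((mask >> VL) == 0);  // the mask must not address lanes beyond VL
    for (std::size_t k = 0; k < VL; ++k) {
        const double sum = a[k] + b[k];        // computed for every lane
        if ((mask >> k) & 1u) dest[k] = sum;   // written only for active lanes
    }
    return dest;
}
```

Note that the loop visits every lane regardless of the mask. The mask only decides which results are kept, not which lanes are processed.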
Masking has a cost, however, so it matters how the interactions are distributed among the code paths. For instance, which proportion of the atom pair interactions (or inner loop iterations) belong to the *do nothing* group?
With masked instructions we can avoid updating data for *do nothing* interactions, but we cannot avoid the execution time spent processing them.
So, as opposed to the serial version, a *do nothing* interaction costs as much time as any other interaction in the vectorized version with masked instructions.
Before starting work on the vectorization, the code was modified to count the number of interactions that belong to each category.
The flowchart shows the average number of interactions (for a single `i` atom in a timestep) that belong to each category, and the arrows show the same information as percentages.
Black values show data for the default protein input, while red values correspond to the modified input described in section *Loop size*.
We can see that the proportion of "do nothing" elements in the regular input is about 42%.
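The kind of instrumentation described above could be sketched as follows. This is not the code used in the project: the `classify` rule and its thresholds `cutoff_sq` and `table_inner_sq` are assumptions made only so the example is complete, and `PathCounters` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum Category : std::size_t { DoNothing = 0, Fast = 1, Slow = 2, NumCategories = 3 };

// Hypothetical classification rule, for illustration only: pairs beyond the
// cutoff do nothing, very close pairs take the slow path, the rest are fast.
Category classify(double rsq, double cutoff_sq, double table_inner_sq) {
    if (rsq >= cutoff_sq) return DoNothing;
    if (rsq <= table_inner_sq) return Slow;
    return Fast;
}

// Per-category counters that the inner loop would update once per interaction.
struct PathCounters {
    std::array<unsigned long long, NumCategories> count{};

    void record(Category c) {
        assert(c < NumCategories);
        ++count[c];
    }

    unsigned long long total() const {
        return count[DoNothing] + count[Fast] + count[Slow];
    }

    // Share of interactions in category c, as a percentage of all recorded ones.
    double percent(Category c) const {
        const unsigned long long t = total();
        return t == 0 ? 0.0
                      : 100.0 * static_cast<double>(count[c]) / static_cast<double>(t);
    }
};
```

In the inner loop one would call `counters.record(classify(rsq, ...))` for each `j` neighbour and report the percentages at the end of the run.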
We deemed that extracting the elements that are not "do nothing" would be too costly, since their proportion is too high, and the accelerator lacks the `vcompress` [instruction](https://github.com/riscv/riscv-v-spec/blob/0.7.1/v-spec.adoc#176-vector-compress-instruction) that implements this (see [ISA support](https://repo.hca.bsc.es/gitlab/EPI/RTL/Vector_Accelerator/-/wikis/VPU/ISA-support)).
For this reason, we decided to use the masking approach, even if it makes "do nothing" elements as slow as the rest.
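For reference, the extraction alternative that was rejected amounts to a compress operation: the elements selected by a mask are packed contiguously so that later vector instructions only see useful work. The scalar emulation below is a sketch of that idea, not of the accelerator's behaviour; the function name `compress` and the use of `std::vector` are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar emulation of a compress operation (illustrative only).
// Elements of `src` whose mask entry is non-zero are packed, in their
// original order, into a contiguous output; the others are dropped.
std::vector<int> compress(const std::vector<int>& src,
                          const std::vector<std::uint8_t>& mask) {
    assert(src.size() == mask.size());  // one mask entry per element
    std::vector<int> packed;
    packed.reserve(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) {
        if (mask[k] != 0) packed.push_back(src[k]);
    }
    return packed;
}
```

For example, compressing the neighbour indices `{10, 11, 12, 13, 14}` with mask `{1, 0, 0, 1, 1}` yields `{10, 13, 14}`. Without hardware support, this packing has to be done element by element, which is why it only pays off when few elements need to be moved.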
This type of "masking" approach is not suitable for the elements labeled as "slow" (the ones involving `sqrt` and `exp`), since every element would then take the combined computation time of the "slow" and "fast" paths.
Because there are so few "slow" elements (around 0.3%), it is feasible to try the "vextract" method for them.
Since the instruction is unavailable, we used a loop of `vmfirst` in order to mask the "slow" elements in the vector register and update them separately using the serial function `compute_iterj_special`.
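The `vmfirst` loop can be pictured with the scalar emulation below. In RVV 0.7.1, `vmfirst` returns the index of the first set mask bit, or -1 when no bit is set; `first_set` mimics that result. Everything else is an assumption for illustration: `VL`, `fix_up_slow_lanes`, and in particular `slow_path_stub`, which merely stands in for the serial `compute_iterj_special` whose real body is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical vector length for this sketch.
constexpr std::size_t VL = 8;

// Mimics the result of vmfirst: index of the lowest set bit, or -1 if none.
int first_set(std::uint32_t mask) {
    for (int k = 0; k < static_cast<int>(VL); ++k) {
        if ((mask >> k) & 1u) return k;
    }
    return -1;
}

// Hypothetical stand-in for the serial slow-path routine (uses sqrt and exp
// only because the "slow" path is described as involving them).
double slow_path_stub(double x) { return std::sqrt(x) * std::exp(-x); }

// Visits each lane flagged in `slow_mask`, recomputes it serially, and clears
// its bit so the next search finds the following flagged lane.
// Returns the number of lanes that were handled serially.
int fix_up_slow_lanes(std::array<double, VL>& result,
                      const std::array<double, VL>& input,
                      std::uint32_t slow_mask) {
    assert((slow_mask >> VL) == 0);  // no bits beyond the vector length
    int handled = 0;
    for (int k = first_set(slow_mask); k >= 0; k = first_set(slow_mask)) {
        result[static_cast<std::size_t>(k)] =
            slow_path_stub(input[static_cast<std::size_t>(k)]);
        slow_mask &= ~(1u << k);  // mark this lane as done
        ++handled;
    }
    return handled;
}
```

The loop runs once per flagged lane, so with around 0.3% of "slow" elements most vectors skip it entirely after a single failed search.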
The modified input reduces the proportion of interactions that belong to the *do nothing* and *slow* categories.
It may be interesting to test how the modified input affects performance in both serial and vectorized versions.