@@ -81,11 +81,11 @@ We deemed to extract the not "do-nothing" elements would be too costly, since th
...
For this reason, we decided to use the masking approach, even if it makes "do nothing" elements as slow as the rest.
This "masking" approach is not suitable for the elements labeled "slow" (the ones involving `sqrt` and `exp`), since every element would then incur the combined computation time of the "slow" and "fast" paths.
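To illustrate why masking charges every element the full cost, here is a minimal scalar sketch in C. The names `fast_update`, `masked_update`, and `active` are hypothetical, and the arithmetic is a placeholder rather than the actual kernel: the point is only that the candidate value is computed for every lane and the mask merely decides whether it is kept.

```c
#include <stddef.h>

/* Hypothetical placeholder for the "fast" per-element update;
   the real computation is not shown in the text. */
static double fast_update(double x) { return 2.0 * x + 1.0; }

/* Masked update: the new value is computed for EVERY lane, and the
   mask only selects whether the new or the old value is stored.
   "Do nothing" lanes (active[i] == 0) therefore cost as much as the rest. */
void masked_update(double *v, const unsigned char *active, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double candidate = fast_update(v[i]);  /* always computed */
        v[i] = active[i] ? candidate : v[i];   /* select, like a vector merge */
    }
}
```

If the "slow" path were handled the same way, each lane would evaluate both the fast and the slow candidate before the select, which is the combined cost described above.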
Because so few elements are "slow" (around 0.3%), it is feasible to try the "vextract" method.
Since the instruction is unavailable, we used a `vmfirst` loop to mask the "slow" elements in the vector register and update them separately using the serial function `compute_iterj_special`.
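The loop can be sketched in portable C as follows. This is a scalar emulation under stated assumptions, not the actual vector code: `mask_first` is a hypothetical stand-in for `vmfirst` (index of the first set mask bit, or -1 if none), the mask is modeled as a 64-bit integer, and `special_update` is a placeholder for `compute_iterj_special`, whose real signature is not given here.

```c
#include <stdint.h>

/* Scalar stand-in for `vmfirst` (assumption): index of the lowest
   set bit in the mask, or -1 when no bit is set. */
static int mask_first(uint64_t mask) {
    for (int i = 0; i < 64; ++i)
        if (mask & (UINT64_C(1) << i)) return i;
    return -1;
}

/* Hypothetical placeholder for the serial "slow" update; the real
   compute_iterj_special and its arguments are not shown in the text. */
static double special_update(double x) { return x * x; }

/* Handle each "slow" lane serially: find the first flagged lane,
   update it, clear its bit, and repeat until the mask is empty.
   Returns the number of lanes handled. */
int update_slow_lanes(double *lanes, uint64_t slow_mask) {
    int handled = 0;
    int idx;
    while ((idx = mask_first(slow_mask)) >= 0) {
        lanes[idx] = special_update(lanes[idx]);
        slow_mask &= ~(UINT64_C(1) << idx);  /* this lane is done */
        ++handled;
    }
    return handled;
}
```

The number of iterations equals the number of flagged lanes, so with around 0.3% "slow" elements most vectors would skip the loop entirely.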
The modified input reduces the proportion of interactions that belong to the *do nothing* and *slow* categories.
It may be interesting to test how the modified input affects performance in both the serial and vectorized versions.