For this reason, `vwadd` cannot be used through intrinsics on the 0.7 VPU. Merely declaring a variable of the `__epi_2xi64` data type triggers an error, because the VPU lacks LMUL support.

To work around this, the proposed solution was to issue `vwadd` through inline assembly. This avoids the previous conflict because inline assembly is not aware of register data types. However, the approach was deemed too unreliable.

In this particular VPU implementation, `vwadd` writes its output to two consecutive registers, although only the first one is specified as an operand.

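A scalar model helps show why the destination spans two registers. The C sketch below models a signed widening add with SEW=32. `vwadd_model`, `registers_needed` and the 512-bit `VLEN_BITS` are illustrative assumptions, not the VPU's actual parameters or any intrinsic API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed register width for illustration only; the real VLEN of the
   VPU is not stated in this document. */
#define VLEN_BITS 512
#define SEW 32
#define VL (VLEN_BITS / SEW) /* 32-bit elements that fit in one register */

/* Scalar model of a signed widening add (vwadd.vv) with SEW=32:
   each source element is sign-extended to 64 bits before the addition. */
static void vwadd_model(int64_t vd[VL], const int32_t vs2[VL],
                        const int32_t vs1[VL]) {
    for (size_t i = 0; i < VL; i++)
        vd[i] = (int64_t)vs2[i] + (int64_t)vs1[i];
}

/* Number of VLEN-bit registers needed to hold n elements of elem_bits each. */
static size_t registers_needed(size_t n, size_t elem_bits) {
    return (n * elem_bits + VLEN_BITS - 1) / VLEN_BITS;
}
```

Under these assumptions, 16 source elements fill one 512-bit register per operand, while their 64-bit sums need 1024 bits, i.e. the register named as `vd` plus the one after it.
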
This can lead to compilation errors. The compiler, which picks the registers for the inline assembly operands, is not aware of the implicit second output register, so it may choose input and output registers that overlap, and the assembler then reports an error for the instruction.

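The clash can be expressed as a simple predicate. The hypothetical helper below (not part of any compiler or assembler) encodes the rule as described above: the destination implicitly covers `vd` and `vd + 1`, so an operand choice is only safe if neither source uses those registers and `vd + 1` still exists among `v0` to `v31`. Whether this VPU imposes further constraints, such as an even-numbered `vd`, is not covered here.

```c
#include <stdbool.h>

/* Hypothetical model of the constraint described in the text: on this VPU,
   vwadd writes vd and vd + 1, so neither source may use either register. */
static bool widening_dest_overlaps(int vd, int vs1, int vs2) {
    int vd_hi = vd + 1; /* implicit second destination register */
    return vs1 == vd || vs1 == vd_hi || vs2 == vd || vs2 == vd_hi;
}

/* Valid only if both destination registers exist (v0..v31) and no source
   overlaps them. */
static bool widening_dest_valid(int vd, int vs1, int vs2) {
    return vd >= 0 && vd + 1 <= 31 && !widening_dest_overlaps(vd, vs1, vs2);
}
```

An allocator that is unaware of `vd + 1` can produce a choice such as `vd = v2`, `vs1 = v3`, which this predicate rejects.
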
Compiling with `-O0` produces this error, whereas another optimization level (for example `-O2`) can avoid it, simply because a different combination of registers that happens not to overlap may be chosen.

Due to its unreliability, this approach has been discarded.

In the end, a bit hack was used to zero-extend the array of 32-bit unsigned integers into a vector register with a SEW of 64 bits.

The trick is to load the 32-bit array into a register with a 64-bit SEW, and then use a `vand` operation to clear the most significant half of each element, mimicking a zero-extension.

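The trick can be emulated in scalar C to check the arithmetic. This sketch assumes a plain unit-stride load and a little-endian layout (as on RISC-V); `load_as_u64` and `mask_low_half` are hypothetical names standing in for the vector load and for the `vand` with `0x00000000FFFFFFFF`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models a unit-stride load of 32-bit data with a 64-bit element width,
   assuming a little-endian target: lane i receives src[2i] in its low half
   and src[2i + 1] in its high half. src must hold 2 * n_lanes values. */
static void load_as_u64(uint64_t *lanes, const uint32_t *src, size_t n_lanes) {
    /* memcpy reinterprets the bytes without violating strict aliasing. */
    memcpy(lanes, src, n_lanes * sizeof(uint64_t));
}

/* Models the vand step: clearing the upper 32 bits of each lane leaves the
   low-half value zero-extended to 64 bits. */
static void mask_low_half(uint64_t *lanes, size_t n_lanes) {
    for (size_t i = 0; i < n_lanes; i++)
        lanes[i] &= UINT64_C(0x00000000FFFFFFFF);
}
```

Under the unit-stride assumption, each 64-bit element holds two adjacent 32-bit values, so the mask recovers the even-indexed ones, zero-extended. The odd-indexed values remain in the upper halves and would need a separate logical right shift by 32 (for example with `vsrl`), a step the text does not describe.
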