For the implementation of `vwadd` in the VPU, the output is written to two consecutive registers, although only one is specified as an operand.

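As a rough illustration of why two registers are needed, the following C sketch models the arithmetic of a signed widening add such as `vwadd.vv` in scalar code. The function names `widening_add_model` and `regs_needed` are hypothetical, and the register count uses an illustrative VLEN rather than the actual one of the VPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a signed widening add such as vwadd.vv:
 * every 32-bit source element produces a 64-bit result, so the results
 * take twice as many bytes as either source operand. */
void widening_add_model(const int32_t *a, const int32_t *b,
                        int64_t *out, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        /* Sign-extend both operands before adding, so the sum cannot overflow. */
        out[i] = (int64_t)a[i] + (int64_t)b[i];
    }
}

/* Number of vector registers needed to hold vl elements of elem_bits bits,
 * for an assumed (hypothetical) VLEN of vlen_bits, rounded up. */
size_t regs_needed(size_t vl, size_t elem_bits, size_t vlen_bits)
{
    return (vl * elem_bits + vlen_bits - 1) / vlen_bits;
}
```

For example, assuming VLEN = 128 bits, four 32-bit inputs fit in one register (`regs_needed(4, 32, 128)` is 1), but their four 64-bit sums need two (`regs_needed(4, 64, 128)` is 2).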
This can lead to compilation errors: the compiler, which picks the registers for the inline assembly operands, is not aware of this, so it may choose a combination of input and output registers that overlap, and an error is then generated when the instructions are assembled.

Sometimes the compiler reports this error, but changing the optimization level (from -O0 to -O2) can make it disappear, because a different combination of registers, which happens not to overlap, may be chosen.

For this reason, this approach has been discarded.

In the end, a bithack was used to zero-extend the array of 32-bit unsigned integers into a vector register with a SEW of 64 bits.

The trick is to load the 32-bit array into a register with a 64-bit SEW, and then use a `vand` operation to clear the most significant half of each element, mimicking a zero extension.

To get the other half, a logical right shift (`vsrl`) by 32 bits has to be performed before applying the `vand`.

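This method is very low level and depends on the endianness of the system in order to work: it assumes a little-endian memory layout, in which two consecutive 32-bit elements read as a single 64-bit element place the lower-indexed one in the least significant half. With a big-endian layout the halves would be swapped, so the `vand` and the shift would select the opposite elements.
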
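A quick way to observe this dependence on a given machine is the following C sketch (the function name is hypothetical). It reads two consecutive 32-bit values as one 64-bit value, as a 64-bit load would, and checks which of them lands in the low half: it returns 1 on a little-endian machine and 0 on a big-endian one.

```c
#include <stdint.h>
#include <string.h>

/* Read two consecutive 32-bit values as one 64-bit value, the way a 64-bit
 * load sees them in memory, and report whether the first (lower-index)
 * value ends up in the least significant half. */
int first_element_in_low_half(void)
{
    const uint32_t pair[2] = {0x11111111u, 0x22222222u};
    uint64_t wide;

    /* Byte-wise copy: avoids strict-aliasing problems of a pointer cast. */
    memcpy(&wide, pair, sizeof wide);
    return (uint32_t)(wide & 0xFFFFFFFFu) == pair[0];
}
```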
Moreover, it requires some extra handling when the array has an odd number of elements, since the last 32-bit element has no partner to fill the other half of a 64-bit element.

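The whole procedure can be sketched in portable scalar C, assuming a little-endian memory layout. The function `split_even_odd` and its output arrays are hypothetical names used for illustration: each 64-bit lane stands for one element of the 64-bit SEW register, the mask plays the role of `vand`, the 32-bit shift plays the role of `vsrl`, and the leftover element of an odd-length array is zero-extended separately.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of the bithack, assuming little-endian memory.
 * src holds n 32-bit values; the results are split by parity:
 *   even_out[k] = src[2k]     zero-extended to 64 bits (mask only)
 *   odd_out[k]  = src[2k + 1] zero-extended to 64 bits (shift, then mask)
 * even_out needs room for (n + 1) / 2 values, odd_out for n / 2. */
void split_even_odd(const uint32_t *src, size_t n,
                    uint64_t *even_out, uint64_t *odd_out)
{
    const uint64_t low_mask = 0x00000000FFFFFFFFull;
    size_t pairs = n / 2; /* number of full 64-bit lanes */

    for (size_t k = 0; k < pairs; k++) {
        uint64_t lane;
        /* Read two consecutive 32-bit values as one 64-bit lane. */
        memcpy(&lane, &src[2 * k], sizeof lane);
        even_out[k] = lane & low_mask;         /* clear the upper half */
        /* After a 32-bit logical shift the upper half is already zero;
         * the mask is kept only to mirror the shift + vand sequence. */
        odd_out[k] = (lane >> 32) & low_mask;
    }

    /* Odd n: the last element has no partner, and a 64-bit read would go
     * past the end of the array, so zero-extend it on its own. */
    if (n % 2 != 0) {
        even_out[pairs] = (uint64_t)src[n - 1];
    }
}
```

Note that the results come out split into even-indexed and odd-indexed elements, not in the original order of the array.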
The following figure shows a representation of the operations needed for the 32-bit to 64-bit conversion.

To see the code in detail, check the annex (TODO).

![evenodd](uploads/82a21f4b6d4321ea2481d04fa9818f9e/evenodd.png)

It is important to check whether vector loads can raise an unaligned memory access exception.

For instance, when placing a 32-bit array inside a register with 64-bit elements, performing the load with a SEW of 64 produces an unaligned access exception if the starting address is not 8-byte aligned, which an array of 32-bit integers does not guarantee.

For this reason, the memory load has to be performed with a SEW of 32, and the contents then moved to a 64-bit register using `vmv.v.v` (as with TODO union_int_float_t in section X).

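To decide in advance whether a load with 64-bit elements would be aligned, a helper along these lines could be used. This is only a sketch: `is_aligned_for` is a hypothetical name, and it assumes the VPU requires every element to be naturally aligned, with an element size that is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if p is a suitable base address for a unit-stride load whose
 * elements are elem_bytes wide. Assumptions: elem_bytes is a power of two,
 * and elements must be naturally aligned on the target VPU. Since the
 * elements are contiguous, an aligned base keeps every element aligned. */
bool is_aligned_for(const void *p, size_t elem_bytes)
{
    return ((uintptr_t)p & (elem_bytes - 1)) == 0;
}
```

An array of `uint32_t` is always suitable for SEW=32 loads, but it is only guaranteed 4-byte alignment, so `is_aligned_for(&arr[1], 8)` is false even when `arr` itself happens to be 8-byte aligned.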
### Handling union_int_float_t
