... | ... | @@ -2,16 +2,16 @@ In section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types) we menti |
|
|
In this annex we show three possible solutions (**testodd3**, **testodd5** and **testodd6**).
|
|
|
We provide a test program that accepts an argument for the number of elements in the array, and then prints the 32-bit number and the extended 64-bit number.
|
|
|
|
|
|
All solutions rely on the mechanism previously described in section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types), which consists of bitwise AND (`vand`) and right shift (`vsrl`) operations; however, the solutions use different methods for handling an array with an odd number of elements.
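The mask-and-shift mechanism can be modelled in scalar C as a rough sketch. The function name `widen_pairs`, the split into separate `even` and `odd` output arrays, and the little-endian assumption are illustrative choices, not taken from the test programs; the mask plays the role of `vand` and the shift that of `vsrl`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the bithack: read two packed 32-bit
 * elements as one 64-bit word, then separate them.
 * Assumes a little-endian target and an even element count n. */
static void widen_pairs(const uint32_t *src, size_t n,
                        uint64_t *even, uint64_t *odd)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        uint64_t pair;
        /* 64-bit load covering src[2*i] and src[2*i + 1] */
        memcpy(&pair, &src[2 * i], sizeof pair);
        even[i] = pair & UINT64_C(0xFFFFFFFF); /* role of vand: keep low half */
        odd[i]  = pair >> 32;                  /* role of vsrl: keep high half */
    }
}
```

Both outputs are zero-extended here; whether the original code needs sign extension instead is not covered by this sketch.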
|
|
|
|
|
|
Both **testodd3** and **testodd5** perform a 64-bit load on an array of 32-bit elements.
|
|
|
As we mentioned in section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types), this can be the source of an unaligned access exception in the FPGA.
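To illustrate why such a load can be misaligned, here is a minimal sketch assuming that a 64-bit access must start at an address that is a multiple of 8. The helper `is_aligned_for_64bit` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when p is a valid start address for a
 * 64-bit access, assuming 8-byte alignment is required. */
static bool is_aligned_for_64bit(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}
```

An array of 32-bit elements is only guaranteed 4-byte alignment, so a 64-bit load starting at an arbitrary element (or at the start of an arbitrarily placed array) may fail this check.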
|
|
|
This exception has only been triggered with LAMMPS, not with the test program.
|
|
|
A quick fix for this issue was to do a 32-bit load of the 32-bit array, then use a `vmv.v.v` instruction to move the result to a 64-bit register, and then apply the bithack method.
|
|
|
This fix was added as an afterthought.
|
|
|
|
|
|
In contrast, **testodd6** has been designed from the ground up with the risk of unaligned access in mind.
|
|
|
For this reason, it already includes the `vmv.v.v`, which makes it simpler to deal with an array with an odd number of elements.
|
|
|
|
|
|
Next, we discuss how each version handles arrays with odd numbers of elements:
|
|
|
|
... | ... | @@ -22,14 +22,14 @@ Next, we discuss how each version handles arrays with odd numebers of elements: |
|
|
* **testodd5** tries to improve the previous solution by avoiding serial execution:
|
|
|
* If the array has an odd number of elements...
|
|
|
* The last element is left out
|
|
|
* When processing the last vector register, it is enlarged by one element with `vsetvl`.
|
|
|
* The last element is inserted in this new space using `vmerge`
|
|
|
* The register is processed vectorially
|
|
|
* **testodd6** does a 32-bit load instead of a 64-bit load. For this reason, the load cannot generate an out-of-bounds access. This also allows getting rid of the `vmerge`, but some extra handling is required:
|
|
|
* If the array has an odd number of elements...
|
|
|
* When doing `vmv.v.v` to move the last register from a 32-bit to a 64-bit register...
|
|
|
* If the number of elements is odd, the 64-bit destination register should have length `nelem/2 + 1` to account for the last element (and the nonexistent element paired with it)
|
|
|
* The `vand` to extract even elements can be done normally, but the `vsrl` shift to get odd elements requires using `vsetvl` to decrease the vector length by one element in order to discard the nonexistent element.
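The **testodd6** steps above can be sketched in scalar C as follows. This is only a model under stated assumptions: the function name `widen_odd_safe`, the two output arrays, and the little-endian layout are hypothetical, and the copy into `lanes` stands in for the 32-bit load followed by `vmv.v.v`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the testodd6 approach.
 * even must hold nelem/2 + nelem%2 values, odd must hold nelem/2 values.
 * Assumes a little-endian target. */
static void widen_odd_safe(const uint32_t *src, size_t nelem,
                           uint64_t *even, uint64_t *odd)
{
    size_t full  = nelem / 2;          /* complete (even, odd) pairs */
    size_t words = full + nelem % 2;   /* nelem/2 + 1 when nelem is odd */

    assert(nelem == 0 || (src != NULL && even != NULL));
    for (size_t i = 0; i < words; i++) {
        uint32_t lanes[2] = {0, 0};    /* missing partner stays zero */
        size_t avail = (2 * i + 1 < nelem) ? 2 : 1;
        /* 32-bit loads only: never reads past src[nelem - 1] */
        memcpy(lanes, &src[2 * i], avail * sizeof lanes[0]);

        uint64_t pair;
        memcpy(&pair, lanes, sizeof pair);      /* reinterpret as one 64-bit word */
        even[i] = pair & UINT64_C(0xFFFFFFFF);  /* mask applies to every word */
        if (i < full) {
            odd[i] = pair >> 32;       /* shift runs on one word fewer when nelem is odd */
        }
    }
}
```

The `i < full` guard mirrors the reduced vector length for the shift: the last word of an odd-length array has no real odd element, so no output is produced for it.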
|
|
|
|
|
|
TODO: add files? how? repo? attachment?
|
|
|
TODO: more descriptive names |
|
|
\ No newline at end of file |