djurado · 976bbde8
--- a/32-bit-to-64-bit.md
+++ b/32-bit-to-64-bit.md
+In section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types) we mentioned the need to extend the elements of a vector register from 32 bits to 64 bits.
+In this annex we show three possible solutions (**testodd3**, **testodd5** and **testodd6**).
+We provide a test program that accepts an argument for the number of elements in the array, and then prints the 32-bit number and the extended 64-bit number.
+All solutions rely on the mechanism previously described in section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types), which consists of bitwise AND and shift operations; however, the solutions use different methods for handling an array with an odd number of elements.
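
As a rough illustration of the AND/shift mechanism, the following scalar C sketch models what happens to each 64-bit lane when the array has an even number of elements. The function name `widen_pairs` is hypothetical and is not taken from the test programs. The sketch assumes zero-extension (mask `0xFFFFFFFF`, logical right shift by 32) and a little-endian layout in which the even-indexed element occupies the low half of the lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the AND/shift widening for an even number of elements.
 * Hypothetical helper for illustration only.
 * Assumption: little-endian lane layout, so in[2*i] is the low half and
 * in[2*i+1] is the high half of the i-th 64-bit lane. */
void widen_pairs(const uint32_t *in, uint64_t *out, size_t n)
{
    assert(n % 2 == 0);  /* odd lengths need the extra handling discussed in this annex */
    for (size_t i = 0; i < n / 2; i++) {
        /* Build the lane that a 64-bit load of two 32-bit elements would produce. */
        uint64_t w = (uint64_t)in[2 * i] | ((uint64_t)in[2 * i + 1] << 32);
        out[2 * i]     = w & UINT64_C(0xFFFFFFFF); /* even element: bitwise AND */
        out[2 * i + 1] = w >> 32;                  /* odd element: logical right shift */
    }
}
```

In the vector versions the two statements inside the loop would correspond to one `vand` and one `vsrl` over the whole register, rather than a per-lane loop.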
+**testodd3** and **testodd5** perform a 64-bit load on an array of 32-bit elements.
+As we mentioned in section [32-bit and 64-bit data types](32-bit-and-64-bit-data-types), this can be the source of an unaligned access exception on the FPGA.
+This exception has only been triggered with LAMMPS, not with the test program.
+A quick fix for this issue was to do a 32-bit load of the 32-bit array, then use a `vmv.v.v` instruction to move the result to a 64-bit register, and then apply the bithack method.
+This fix was added as an afterthought.
+In contrast, **testodd6** has been designed from the ground up with the risk of unaligned access in mind.
+For this reason, it already includes the `vmv.v.v`, which allows handling an array with an odd number of elements in a simpler way.
+Next, we discuss how each version handles arrays with an odd number of elements:
+* In **testodd3**, an array with an odd number of elements is problematic because we perform a 64-bit load on an array of 32-bit elements. Two 32-bit elements fit inside one 64-bit element of the register. But with an odd number of elements, accessing the last element also involves accessing the next, nonexistent element, generating an out-of-bounds access, which could cause a segmentation fault. **testodd3** solves this in the simplest way possible:
+    * If the array has an odd number of elements...
+    * The last element is processed serially.
+    * The rest of the array (even number of elements) is processed vectorially.
+* **testodd5** tries to improve the previous solution by avoiding serial execution:
+    * If the array has an odd number of elements...
+    * The last element is left out of the 64-bit load.
+    * When processing the last vector register, it is enlarged by one element with `vsetvl`.
+    * The last element is inserted in this new space using `vmerge`
+    * The register is processed vectorially
+* **testodd6** does a 32-bit load instead of a 64-bit load. For this reason, the load cannot generate an out-of-bounds access. But some extra handling is required:
+    * If the array has an odd number of elements...
+    * When doing `vmv.v.v` to move the last register from a 32-bit to a 64-bit register...
+    * With an odd number of elements, the 64-bit destination register should have length `nelem/2 + 1` to account for the last element (and the nonexistent element that shares its 64-bit slot).
+    * The `vand` to extract even elements can be done normally, but the `vsrl` shift to get odd elements requires using `vsetvl` to decrease the length of the register by one element in order to drop the nonexistent element.
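
The length bookkeeping described above for **testodd3** and **testodd6** can be sketched with a scalar C model. This is only an illustration under stated assumptions, not the code of the test programs: the names `lane`, `widen_scalar_tail` and `widen_short_odd_pass` are hypothetical, the lane layout is assumed to be little-endian, and the nonexistent element is represented by an arbitrary junk pattern that stands in for whatever the register would hold. The `vmerge`-based approach of **testodd5** is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOW32 UINT64_C(0xFFFFFFFF)

/* One 64-bit lane holding elements 2*i and 2*i+1 (assumed little-endian layout).
 * If element 2*i+1 does not exist, the high half gets a junk pattern that
 * stands in for the undefined register contents. */
static uint64_t lane(const uint32_t *in, size_t n, size_t i)
{
    assert(2 * i < n);
    uint64_t lo = in[2 * i];
    uint64_t hi = (2 * i + 1 < n) ? in[2 * i + 1] : UINT64_C(0xDEADBEEF);
    return lo | (hi << 32);
}

/* testodd3-style model: widen the even-length prefix lane by lane,
 * then handle a leftover last element serially. */
void widen_scalar_tail(const uint32_t *in, uint64_t *out, size_t n)
{
    size_t even_n = n - (n % 2);
    for (size_t i = 0; i < even_n / 2; i++) {
        uint64_t w = lane(in, n, i);
        out[2 * i]     = w & LOW32;
        out[2 * i + 1] = w >> 32;
    }
    if (n % 2 != 0)
        out[n - 1] = in[n - 1];  /* serial tail: plain zero-extending copy */
}

/* testodd6-style model: two passes with different lengths.
 * The AND pass covers nelem/2 + 1 lanes when n is odd; the shift pass is
 * one lane shorter so the nonexistent element is never produced. */
void widen_short_odd_pass(const uint32_t *in, uint64_t *out, size_t n)
{
    size_t even_len = n / 2 + n % 2;
    size_t odd_len  = n / 2;
    for (size_t i = 0; i < even_len; i++)
        out[2 * i] = lane(in, n, i) & LOW32;      /* even elements */
    for (size_t i = 0; i < odd_len; i++)
        out[2 * i + 1] = lane(in, n, i) >> 32;    /* odd elements */
}
```

Both functions write exactly `n` outputs, so the junk high half of the last lane never reaches the result. In the vector code the two loop bounds of `widen_short_odd_pass` would correspond to the two vector lengths set with `vsetvl`.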
+TODO: add files? how? repo? attachment?
+TODO: more descriptive names
\ No newline at end of file