Output mismatch with golden answer for use-case NVDLA on tests sanity3 and conv_8x8_fc_int16
I'm trying to replicate the NVDLA use-case experiments from the paper. I followed the instructions in README.md, with a few deviations that are noted below, and the sanity0, sanity1 and sanity2 tests provided by NVDLA pass. However, the test cases that require a mem_dump (sanity3 and conv_8x8_fc_int16) fail with an output mismatch error. The dumped file shows that all bytes are 0s (0x00).
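For anyone who wants to confirm the all-zero observation on their own dump, here is a minimal sketch in C. It assumes the dump is read as raw bytes; if the dump is a textual hex listing, the payload would have to be parsed first. The helper names count_nonzero and scan_dump are hypothetical and are not part of gem5 or NVDLA.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: counts the non-zero bytes in buf[0..len). */
size_t count_nonzero(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            n++;
        }
    }
    return n;
}

/*
 * Hypothetical helper: scans the file at `path` as raw bytes (assumption:
 * the dump is binary, not a text listing). On success returns 0 and stores
 * the file size in *total and the number of non-zero bytes in *nonzero.
 * Returns -1 if the file cannot be opened or a read error occurs.
 */
int scan_dump(const char *path, size_t *total, size_t *nonzero)
{
    uint8_t chunk[4096];
    size_t got;
    int err;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        return -1;
    }
    *total = 0;
    *nonzero = 0;
    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0) {
        *total += got;
        *nonzero += count_nonzero(chunk, got);
    }
    err = ferror(f);
    fclose(f);
    return err ? -1 : 0;
}
```

Calling scan_dump on the dump file and getting *nonzero == 0 with *total > 0 would match what I see: the output region is dumped, but nothing was ever written to it.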
Here is the main procedure that I followed (including the steps that differ from the instructions in README):
I used nv_full NVDLA model and the verilated nv_full model passed sanity0, sanity1, sanity2, sanity3, conv_8x8_fc_int16 tests in NVDLA (and perhaps some others that I have not covered);
Since it takes too long for verilator to compile with the '--trace' option and I don't think I need waveforms during simulation, I verilated nv_full without '--trace' and commented out this line: dla->trace(tfp, 99); (at ext/rtl/model_nvdla/wrapper_nvdla.cc#L72);
I did not modify ext/rtl/SConscript since it seems to have been done for NVDLA;
To create a checkpoint for full system simulation and restore from it, I strictly followed the commands provided in README.md (except that I skipped the --enable-waveform option). It confused me a little that there are two other commands in the Makefile (at the root of the gem5-rtl directory), which differ from the README ones mainly in the '--dtb' option. The gem5 documentation says the dtb file is auto-generated when it is not explicitly specified, but I did not see any information about NVDLA accelerators in the auto-generated dtb file after converting it to a human-readable format. So should the dtb associated with the target configuration be modified manually, or is the auto-generated one fine?
To build a binary (validation_nvdla) that runs on the simulated ARM64 platform, I used aarch64-linux-gnu-g++-7 for cross compilation. I used a pure gem5-api implementation for it (using the m5_start_accel((uint64_t)region_nvdla, size, (uint64_t)region_nvdla) and m5_wait_accel((uint64_t)region_nvdla, size) APIs) instead of the PARSEC library used by the provided implementation in bsc-util/validation_nvdla.c, so I linked it with m5ops.o, which I also cross-compiled myself from util/m5/src/abi/arm64/m5op.S;
The versions of tools that I use: verilator 4.220, clang 14.0.0, linux kernel and disk image for FS simulation: http://dist.gem5.org/dist/v21-2/arm/aarch-system-20210904.tar.bz2, http://dist.gem5.org/dist/v21-2/arm/disks/ubuntu-18.04-arm64-docker.img.bz2
The m5out output directory for sanity3 is attached (sanity3_m5out.zip). The system output is logged in the system_running_log file.
Any help will be greatly appreciated!!