# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to run on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier, Nano and several discrete GPUs.
The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not allow training. 


If you use tkDNN in your research, please cite the [following paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9212130&casa_token=sQTJXi7tJNoAAAAA:BguH9xCIY48MxbtDS3LXzIXzO-9sWArm7Hd7y7BwaLmqRuM_Gx8bOYizFPNMNtpo5K0kB-P-). For use in commercial solutions, write to gattifrancesco@hotmail.it and micaela.verucchi@unimore.it, or refer to https://hipert.unimore.it/ .

```
@inproceedings{verucchi2020systematic,
  title={A Systematic Assessment of Embedded Neural Networks for Object Detection},
  author={Verucchi, Micaela and Brilli, Gianluca and Sapienza, Davide and Verasani, Mattia and Arena, Marco and Gatti, Francesco and Capotondi, Alessandro and Cavicchioli, Roberto and Bertogna, Marko and Solieri, Marco},
  booktitle={2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)},
  volume={1},
  pages={937--944},
  year={2020},
  organization={IEEE}
}
```

## FPS Results
Inference FPS of yolov4 with tkDNN, averaged over 1200 images with the same dimensions as the network input size, on:
  * RTX 2080Ti (CUDA 10.2, TensorRT 7.0.0, Cudnn 7.6.5);
  * Xavier AGX, Jetpack 4.3 (CUDA 10.0, CUDNN 7.6.3, tensorrt 6.0.1 );
  * Xavier NX, Jetpack 4.4  (CUDA 10.2, CUDNN 8.0.0, tensorrt 7.1.0 ). 
  * Tx2, Jetpack 4.2 (CUDA 10.0, CUDNN 7.3.1, tensorrt 5.0.6 );
  * Jetson Nano, Jetpack 4.4  (CUDA 10.2, CUDNN 8.0.0, tensorrt 7.1.0 ). 

| Platform   | Network    | FP32, B=1 | FP32, B=4	| FP16, B=1 |	FP16, B=4 |	INT8, B=1 |	INT8, B=4 | 
| :------:   | :-----:    | :-----:   | :-----:   | :-----:   |	:-----:   |	:-----:   |	:-----:   | 
| RTX 2080Ti | yolo4 320  | 118.59	  | 237.31	  | 207.81	  | 443.32	  | 262.37	  | 530.93    | 
| RTX 2080Ti | yolo4 416  | 104.81	  | 162.86	  | 169.06	  | 293.78	  | 206.93	  | 353.26    | 
| RTX 2080Ti | yolo4 512  | 92.98	    | 132.43	  | 140.36	  | 215.17	  | 165.35	  | 254.96    | 
| RTX 2080Ti | yolo4 608  | 63.77	    | 81.53	    | 111.39	  | 152.89	  | 127.79	  | 184.72    | 
| AGX Xavier | yolo4 320  |	26.78	    | 32.05	    | 57.14	    | 79.05	    | 73.15	    | 97.56     |
| AGX Xavier | yolo4 416  |	19.96	    | 21.52	    | 41.01	    | 49.00	    | 50.81	    | 60.61     |
| AGX Xavier | yolo4 512  |	16.58	    | 16.98	    | 31.12	    | 33.84	    | 37.82	    | 41.28     |
| AGX Xavier | yolo4 608  |	9.45 	    | 10.13	    | 21.92	    | 23.36	    | 27.05	    | 28.93     |
| Xavier NX  | yolo4 320  |	14.56	    | 16.25	    | 30.14	    | 41.15	    | 42.13	    | 53.42     |
| Xavier NX  | yolo4 416  |	10.02	    | 10.60	    | 22.43	    | 25.59	    | 29.08	    | 32.94     |
| Xavier NX  | yolo4 512  |	8.10	    | 8.32	    | 15.78	    | 17.13	    | 20.51	    | 22.46     |
| Xavier NX  | yolo4 608  |	5.26	    | 5.18	    | 11.54	    | 12.06	    | 15.09	    | 15.82     |
| Tx2        | yolo4 320	| 11.18	    | 12.07	    | 15.32	    | 16.31     | -         | -         |
| Tx2        | yolo4 416	| 7.30	    | 7.58	    | 9.45	    | 9.90      | -         | -         |
| Tx2        | yolo4 512	| 5.96	    | 5.95	    | 7.22	    | 7.23      | -         | -         |
| Tx2        | yolo4 608	| 3.63	    | 3.65	    | 4.67	    | 4.70      | -         | -         |
| Nano       | yolo4 320	| 4.23	    | 4.55	    | 6.14	    | 6.53      | -         | -         |
| Nano       | yolo4 416	| 2.88	    | 3.00	    | 3.90	    | 4.04      | -         | -         |
| Nano       | yolo4 512	| 2.32	    | 2.34	    | 3.02	    | 3.04      | -         | -         |
| Nano       | yolo4 608	| 1.40	    | 1.41	    | 1.92	    | 1.93      | -         | -         |

## MAP Results
Results for COCO val 2017 (5k images), on RTX 2080Ti, with conf threshold=0.001

| Network              | CodaLab tkDNN MAP(0.5:0.95) | CodaLab tkDNN AP50 | CodaLab darknet MAP(0.5:0.95) | CodaLab darknet AP50 | tkDNN map MAP(0.5:0.95) | tkDNN map AP50 |
| :------------------- | :-------------------------: | :----------------: | :---------------------------: | :------------------: | :---------------------: | :-------------: |
| Yolov3 (416x416)     | 0.381 | 0.675 | 0.380 | 0.675 | 0.372 | 0.663 |
| yolov4 (416x416)     | 0.468 | 0.705 | 0.471 | 0.710 | 0.459 | 0.695 |
| yolov3tiny (416x416) | 0.096 | 0.202 | 0.096 | 0.201 | 0.093 | 0.198 |
| yolov4tiny (416x416) | 0.202 | 0.400 | 0.201 | 0.400 | 0.197 | 0.395 |
| Cnet-dla34 (512x512) | 0.366 | 0.543 | \-    | \-    | 0.361 | 0.535 |
| mv2SSD (512x512)     | 0.226 | 0.381 | \-    | \-    | 0.223 | 0.378 |

## Index
- [tkDNN](#tkdnn)
  - [Index](#index)
  - [Dependencies](#dependencies)
  - [About OpenCV](#about-opencv)
  - [How to compile this repo](#how-to-compile-this-repo)
  - [Workflow](#workflow)
  - [How to export weights](#how-to-export-weights)
    - [1)Export weights from darknet](#1export-weights-from-darknet)
    - [2)Export weights for DLA34 and ResNet101](#2export-weights-for-dla34-and-resnet101)
    - [3)Export weights for CenterNet](#3export-weights-for-centernet)
    - [4)Export weights for MobileNetSSD](#4export-weights-for-mobilenetssd)
  - [Run the demo](#run-the-demo)
    - [FP16 inference](#fp16-inference)
    - [INT8 inference](#int8-inference)
  - [mAP demo](#map-demo)
  - [Existing tests and supported networks](#existing-tests-and-supported-networks)
  - [References](#references)
  - [tkDNN on Windows 10 (experimental)](#tkdnn-on-windows-10-experimental)
    - [Dependencies-Windows](#dependencies-windows)
    - [Compiling tkDNN on Windows](#compiling-tkdnn-on-windows)
    - [Run the demo on Windows](#run-the-demo-on-windows)
      - [FP16 inference windows](#fp16-inference-windows)
      - [INT8 inference windows](#int8-inference-windows)
    - [Known issues with tkDNN on Windows](#known-issues-with-tkdnn-on-windows)
  




## Dependencies
This branch works on every NVIDIA GPU that supports the following dependencies (a quick version-check sketch follows the list):
* CUDA 10.0
* CUDNN 7.6.3
* TENSORRT 6.0.1
* OPENCV 3.4
* yaml-cpp 0.5.2 (sudo apt install libyaml-cpp-dev)
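
A quick way to check which versions are already installed (a hedged sketch; header and package locations can differ between Jetson and discrete-GPU setups):
```
nvcc --version                                   # CUDA toolkit version
grep -A2 "define CUDNN_MAJOR" /usr/include/cudnn_version.h 2>/dev/null \
  || grep -A2 "define CUDNN_MAJOR" /usr/include/cudnn.h   # cuDNN 8.x vs 7.x headers
dpkg -l | grep -i tensorrt                       # TensorRT packages
pkg-config --modversion opencv4 2>/dev/null || pkg-config --modversion opencv
```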

## About OpenCV
To compile and install OpenCV4 with contrib use the script ```install_OpenCV4.sh```. It will download and compile OpenCV in the Download folder.
```
bash scripts/install_OpenCV4.sh
```
When using OpenCV not compiled with contrib, comment out the definition of OPENCV_CUDACONTRIBCONTRIB in include/tkDNN/DetectionNN.h. When it is commented out, the preprocessing of the networks is computed on the CPU; otherwise it runs on the GPU, which saves a few milliseconds in the end-to-end latency.
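
For example, a minimal sketch of commenting the definition out from the command line, assuming the macro appears as a plain `#define` in that header exactly as named above (adjust the pattern if your checkout differs):
```
# disable GPU preprocessing so it falls back to the CPU path
sed -i 's|^#define OPENCV_CUDACONTRIBCONTRIB|// #define OPENCV_CUDACONTRIBCONTRIB|' include/tkDNN/DetectionNN.h
```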

## How to compile this repo
Build with CMake. If using Ubuntu 18.04, a newer version of CMake is needed (3.15 or above).
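
If your distribution ships an older CMake, one option (a hedged suggestion, not the project's official instructions) is to install a recent version from pip:
```
# install a recent CMake in the user's home; make sure ~/.local/bin is on PATH
pip3 install --user --upgrade cmake
cmake --version   # should now report 3.15 or above
```
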
```
git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake .. 
make
```

## Workflow
Steps needed to run inference with tkDNN on a custom neural network:
* Build and train a NN model with your favorite framework.
* Export the weights and biases for each layer and save them in a binary file (one per layer).
* Export the outputs for each layer and save them in a binary file (one per layer).
* Create a new test and define the network layer by layer, using the exported weights; use the exported outputs to check the results.
* Do inference.

## How to export weights

Weights are essential for any network to run inference. For each test, a folder organized as follows is needed (in the build folder):
```
    test_nn
        |---- layers/ (folder containing a binary file for each layer with the corresponding weights and biases)
        |---- debug/  (folder containing a binary file for each layer with the corresponding outputs)
```
Therefore, once the weights have been exported, the layers and debug folders should be placed inside the corresponding test folder.
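
For example, a hedged sketch for a hypothetical test named `yolo4` (the actual folder name depends on the test, and the export paths are placeholders):
```
# from the build folder, after exporting with one of the tools below
mkdir -p yolo4
cp -r /path/to/export/layers /path/to/export/debug yolo4/
```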

### 1)Export weights from darknet
To export weights for NNs that are defined in darknet framework, use [this](https://git.hipert.unimore.it/fgatti/darknet.git) fork of darknet and follow these steps to obtain a correct debug and layers folder, ready for tkDNN.

```
git clone https://git.hipert.unimore.it/fgatti/darknet.git
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers
```
N.B. Compile for CPU (leave GPU=0 in the Makefile) if you also want the debug outputs.

### 2)Export weights for DLA34 and ResNet101 
To get weights and outputs needed to run the tests dla34 and resnet101 use the Python script and the Anaconda environment included in the repository.   

Create the Anaconda environment and activate it:
```
conda env create -f file_name.yml
source activate env_name
python <script name>
```
### 3)Export weights for CenterNet
To get the weights needed to run Centernet tests use [this](https://github.com/sapienzadavide/CenterNet.git) fork of the original Centernet. 
```
git clone https://github.com/sapienzadavide/CenterNet.git
```
* follow the instructions in the README.md and INSTALL.md

```
python demo.py --input_res 512 --arch resdcn_101 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_resdcn101.pth --exp_wo --exp_wo_dim 512
python demo.py --input_res 512 --arch dla_34 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_dla_2x.pth --exp_wo --exp_wo_dim 512
```
### 4)Export weights for MobileNetSSD
To get the weights needed to run the MobileNet tests use [this](https://github.com/mive93/pytorch-ssd) fork of a PyTorch implementation of the SSD network.

```
git clone https://github.com/mive93/pytorch-ssd
cd pytorch-ssd
conda env create -f env_mobv2ssd.yml
python run_ssd_live_demo.py mb2-ssd-lite <pth-model-file> <labels-file>
```
### 5)Export weights for CenterTrack
To get the weights needed to run CenterTrack tests use [this](https://github.com/sapienzadavide/CenterTrack.git) fork of the original CenterTrack. 
```
git clone https://github.com/sapienzadavide/CenterTrack.git
```
* follow the instructions in the README.md and INSTALL.md

```
python demo.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo /path/to/image/or/folder/or/video/or/webcam --test_focal_length 633 --exp_wo --exp_wo_dim 512 --input_h 512 --input_w 512
```

## Darknet Parser
tkDNN implements an easy parser for darknet cfg files; a network can be converted with *tk::dnn::darknetParser*:
```
// example of parsing yolo4
tk::dnn::Network *net = tk::dnn::darknetParser("yolov4.cfg", "yolov4/layers", "coco.names");
net->print();
```
All darknet models are now parsed directly from the cfg file; you still need to export the weights with the tools described in the previous section.
<details>
  <summary>Supported layers</summary>
  convolutional
  maxpool
  avgpool
  shortcut
  upsample
  route
  reorg
  region
  yolo
</details>
<details>
  <summary>Supported activations</summary>
  relu
  leaky
  mish
  logistic
</details>

## Run the demo 
This is an example using yolov4.

To run an object detection demo, first create the .rt file by running:
```
rm yolo4_fp32.rt        # be sure to delete(or move) old tensorRT files
./test_yolo4            # run the yolo test (is slow)
```
If you run into problems during creation, check the error by activating the TensorRT debug output in this way:
```
cmake .. -DDEBUG=True
make
```

Once you have successfully created your rt file, run the demo: 
```
./demo yolo4_fp32.rt ../demo/yolo_test.mp4 y
```
In general the demo program takes 7 parameters:
```
./demo <network-rt-file> <path-to-video> <kind-of-network> <number-of-classes> <n-batches> <show-flag> <conf-thresh>
```
where
*  ```<network-rt-file>``` is the rt file generated by a test
*  ```<path-to-video>``` is the path to a video file or a camera input  
*  ```<kind-of-network>``` is the type of network. Three types are currently supported: ```y``` (YOLO family), ```c``` (CenterNet family) and ```m``` (MobileNet-SSD family)
*  ```<number-of-classes>``` is the number of classes the network is trained on
*  ```<n-batches>``` is the number of batches to use in inference (N.B. you should first export TKDNN_BATCHSIZE to the required number of batches and create the rt file for the network again).
*  ```<show-flag>``` if set to 0 the demo will not show the visualization but will save the video into result.mp4 (only if n-batches == 1)
*  ```<conf-thresh>``` is the confidence threshold for the detector. Only bounding boxes with confidence greater than conf-thresh will be displayed.

N.B. FP32 inference is used by default.
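
For example, a hedged invocation that spells out all seven parameters for the COCO-trained yolo4 network (80 classes, 1 batch, visualization on, confidence threshold 0.3; the values are only illustrative):
```
./demo yolo4_fp32.rt ../demo/yolo_test.mp4 y 80 1 1 0.3
```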


![demo](https://user-images.githubusercontent.com/11562617/72547657-540e7800-388d-11ea-83c6-49dfea2a0607.gif)

### Run the 3D demo

To run the 3D object detection demo follow these steps (example with CenterNet based on DLA34):
```
rm dla34_cnet3d_fp32.rt        # be sure to delete(or move) old tensorRT files
./test_dla34_cnet3d            # run the CenterNet3D test (is slow)
./demo3D dla34_cnet3d_fp32.rt ../demo/yolo_test.mp4 c
```
The demo3D program takes the same parameters as the demo program:
```
./demo3D <network-rt-file> <path-to-video> <kind-of-network> <number-of-classes>
```

#### Run the 3D OD-tracking demo

To run the 3D object detection & tracking demo follow these steps (example with CenterTrack based on DLA34):
```
rm dla34_cnet3d_track_fp32.rt  # be sure to delete(or move) old tensorRT files
./test_dla34_cnet3d_track      # run the CenterTrack test (is slow)
./demo3D dla34_cnet3d_track_fp32.rt ../demo/yolo_test.mp4 t
```

### FP16 inference

To run an object detection demo with FP16 inference follow these steps (example with yolov3):
```
export TKDNN_MODE=FP16  # set the half floating point optimization
rm yolo3_fp16.rt        # be sure to delete(or move) old tensorRT files
./test_yolo3            # run the yolo test (is slow)
./demo yolo3_fp16.rt ../demo/yolo_test.mp4 y
```
N.B. Using FP16 inference will lead to small errors in the results (in the first or second decimal place).

### INT8 inference

To run an object detection demo with INT8 inference, three environment variables need to be set:
  * ```export TKDNN_MODE=INT8```: set the 8-bit integer optimization
  * ```export TKDNN_CALIB_IMG_PATH=/path/to/calibration/image_list.txt``` : image_list.txt has in each line the absolute path to a calibration image
  * ```export TKDNN_CALIB_LABEL_PATH=/path/to/calibration/label_list.txt```: label_list.txt has in each line the absolute path to a calibration label
  
You should provide image_list.txt and label_list.txt, using training images. However, if you want to quickly test the INT8 inference you can run (from this repo root folder)
```
bash scripts/download_validation.sh COCO
```
to automatically download the COCO2017 validation set (inside the demo folder) and create the needed files. Use BDD instead of COCO to download the BDD validation set.
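
Alternatively, a hedged sketch for building the calibration list files from your own training set (paths are placeholders; each image is assumed to have a matching label file):
```
find /path/to/train/images -name '*.jpg' | sort > image_list.txt
find /path/to/train/labels -name '*.txt' | sort > label_list.txt
export TKDNN_CALIB_IMG_PATH=$(pwd)/image_list.txt
export TKDNN_CALIB_LABEL_PATH=$(pwd)/label_list.txt
```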

Then a complete example using yolo3 and the COCO dataset would be:
```
export TKDNN_MODE=INT8
export TKDNN_CALIB_LABEL_PATH=../demo/COCO_val2017/all_labels.txt
export TKDNN_CALIB_IMG_PATH=../demo/COCO_val2017/all_images.txt
rm yolo3_int8.rt        # be sure to delete(or move) old tensorRT files
./test_yolo3            # run the yolo test (is slow)
./demo yolo3_int8.rt ../demo/yolo_test.mp4 y
```
N.B. 
 * Using INT8 inference will lead to some errors in the results. 
 * The test will be slower: this is due to the INT8 calibration, which may take some time to complete. 
 * INT8 calibration requires TensorRT version greater than or equal to 6.0
 * Only 100 images are used to create the calibration table by default (set in the code).

### BatchSize bigger than 1
```
export TKDNN_BATCHSIZE=2
# build tensorRT files
```
This will create a TensorRT file with the desired **max** batch size.
The test will still run with a batch of 1, but the created TensorRT file can manage the desired batch size.

### Test batch Inference
This will test the network with random input and check if the output of each batch is the same.
```
./test_rtinference <network-rt-file> <number-of-batches>
# <number-of-batches> should be less or equal to the max batch size of the <network-rt-file>

# example
export TKDNN_BATCHSIZE=4           # set max batch size
rm yolo3_fp32.rt                   # be sure to delete(or move) old tensorRT files
./test_yolo3                       # build RT file
./test_rtinference yolo3_fp32.rt 4 # test with a batch size of 4
```

## mAP demo

To compute mAP, precision, recall and F1 score, run the map_demo.

A validation set is needed. 
To download COCO_val2017 (80 classes) run (from the root folder): 
```
bash scripts/download_validation.sh COCO
```
To download Berkeley_val (10 classes) run (from the root folder): 
```
bash scripts/download_validation.sh BDD
```

To compute the mAP, the following parameters are needed:
```
./map_demo <network rt> <network type [y|c|m]> <labels file path> <config file path>
```
where 
* ```<network rt>```: rt file of a chosen network on which to compute the mAP.
* ```<network type [y|c|m]>```: type of network. Right now only y (YOLO), c (CenterNet) and m (MobileNet) are allowed
* ```<labels file path>```: path to a text file containing all the paths of the ground-truth labels. It is important that all the ground-truth labels are in a folder called 'labels'. Next to the 'labels' folder there should also be a folder 'images', containing all the ground-truth images with the same names as the labels. For example, if there is a label path/to/labels/000001.txt there should be a corresponding image path/to/images/000001.jpg (see the sketch after this list). 
* ```<config file path>```: path to a yaml file with the parameters needed for the mAP computation, similar to demo/config.yaml
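
A hedged sketch of the expected layout and of generating the labels list file (all paths except the literal `labels` and `images` folder names are placeholders):
```
# expected layout:
#   /path/to/dataset/images/000001.jpg
#   /path/to/dataset/labels/000001.txt
find /path/to/dataset/labels -name '*.txt' | sort > all_labels.txt
```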

Example:

```
cd build
./map_demo dla34_cnet_FP32.rt c ../demo/COCO_val2017/all_labels.txt ../demo/config.yaml
```

This demo also creates a json file named ```net_name_COCO_res.json``` containing all the detections computed. The detections are in COCO format, the correct format to submit the results to [CodaLab COCO detection challenge](https://competitions.codalab.org/competitions/20794#participate).

## Existing tests and supported networks

| Test Name         | Network                                       | Dataset                                                       | N Classes | Input size    | Weights                                                                   |
| :---------------- | :-------------------------------------------- | :-----------------------------------------------------------: | :-------: | :-----------: | :------------------------------------------------------------------------ |
| yolo              | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 608x608       | [weights](https://cloud.hipert.unimore.it/s/nf4PJ3k8bxBETwL/download)                                                                   |
| yolo_224          | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| yolo_berkeley     | YOLO v2<sup>1</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 416x736       | weights                                                                   |
| yolo_relu         | YOLO v2 (with ReLU, not Leaky)<sup>1</sup>    | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | weights                                                                   |
| yolo_tiny         | YOLO v2 tiny<sup>1</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/m3orfJr8pGrN5mQ/download)                                                                   |
| yolo_voc          | YOLO v2<sup>1</sup>                           | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/DJC5Fi2pEjfNDP9/download)                                                                   |
| yolo3             | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/jPXmHyptpLoNdNR/download)     |
| yolo3_512   | YOLO v3<sup>2</sup>                                 | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/RGecMeGLD4cXEWL/download)     |
| yolo3_berkeley    | YOLO v3<sup>2</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 320x544       | [weights](https://cloud.hipert.unimore.it/s/o5cHa4AjTKS64oD/download)                                                                   |
| yolo3_coco4       | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 4         | 416x416       | [weights](https://cloud.hipert.unimore.it/s/o27NDzSAartbyc4/download)                                                                   |
| yolo3_flir        | YOLO v3<sup>2</sup>                           | [FREE FLIR](https://www.flir.com/oem/adas/adas-dataset-form/) | 3         | 320x544       | [weights](https://cloud.hipert.unimore.it/s/62DECncmF6bMMiH/download)                                                                   |
| yolo3_tiny        | YOLO v3 tiny<sup>2</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/LMcSHtWaLeps8yN/download)     |
| yolo3_tiny512     | YOLO v3 tiny<sup>2</sup>                      | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/8Zt6bHwHADqP4JC/download)     |
| dla34             | Deep Layer Aggregation (DLA) 34<sup>3</sup>   | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| dla34_cnet        | Centernet (DLA34 backend)<sup>4</sup>         | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/KRZBbCQsKAtQwpZ/download)     |
| mobilenetv2ssd    | MobileNet v2 SSD Lite<sup>5</sup>             | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 300x300       | [weights](https://cloud.hipert.unimore.it/s/x4ZfxBKN23zAJQp/download)     |
| mobilenetv2ssd512 | MobileNet v2 SSD Lite<sup>5</sup>             | [COCO 2017](http://cocodataset.org/)                          | 81        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/pdCw2dYyHMJrcEM/download)     |
| resnet101         | Resnet 101<sup>6</sup>                        | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| resnet101_cnet    | Centernet (Resnet101 backend)<sup>4</sup>     | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/5BTjHMWBcJk8g3i/download)     |
| csresnext50-panet-spp    | Cross Stage Partial Network <sup>7</sup>     | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/Kcs4xBozwY4wFx8/download)     |
| yolo4             | Yolov4 <sup>8</sup>                           | [COCO 2017](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/d97CFzYqCPCp5Hg/download)     |
| yolo4_berkeley             | Yolov4 <sup>8</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)                          | 10        | 540x320       | [weights](https://cloud.hipert.unimore.it/s/nkWFa5fgb4NTdnB/download)     |
| yolo4tiny             | Yolov4 tiny <sup>9</sup>                           | [COCO 2017](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/iRnc4pSqmx78gJs/download)     |
| yolo4x             | Yolov4x-mish  <sup>9</sup>                          | [COCO 2017](http://cocodataset.org/)                          | 80        | 640x640       | [weights](https://cloud.hipert.unimore.it/s/5MFjtNtgbDGdJEo/download)     |
| yolo4x-cps            | Scaled Yolov4 <sup>10</sup>                          | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/AfzHE4BfTeEm2gH/download)     |

## tkDNN on Windows 10 (experimental)

### Dependencies-Windows 
This branch should work on every NVIDIA GPU supported on Windows with the following dependencies:

* WINDOWS 10 1803 or HIGHER 
* CUDA 10.0 (Recommended CUDA 11.2 )
* CUDNN 7.6 (Recommended CUDNN 8.1.1 )
* TENSORRT 6.0.1 (Recommended TENSORRT 7.2.3.4 )
* OPENCV 3.4 (Recommended OPENCV 4.2.0 )
* MSVC 16.7 
* YAML-CPP 
* EIGEN3
* 7ZIP (ADD TO PATH)
* NINJA 1.10


All the above-mentioned dependencies except 7ZIP can be installed using Microsoft's [VCPKG](https://github.com/microsoft/vcpkg.git).
After bootstrapping VCPKG, the dependencies can be built and installed using the following commands:

```
opencv4(normal) - vcpkg.exe install opencv4[tbb,jpeg,tiff,opengl,openmp,png,ffmpeg,eigen]:x64-windows yaml-cpp:x64-windows eigen3:x64-windows --x-install-root=C:\opt --x-buildtrees-root=C:\temp_vcpkg_build

opencv4(cuda) - vcpkg.exe install opencv4[cuda,nonfree,contrib,eigen,tbb,jpeg,tiff,opengl,openmp,png,ffmpeg]:x64-windows yaml-cpp:x64-windows eigen3:x64-windows --x-install-root=C:\opt --x-buildtrees-root=C:\temp_vcpkg_build
```
To build opencv4 with CUDA and the cuDNN version corresponding to your CUDA version, vcpkg's cudnn portfile needs to be modified by adding ```$ENV{CUDA_PATH}``` at lines 16 and 17 of portfile.cmake.

After VCPKG finishes building and installing all the packages, delete C:\temp_vcpkg_build and add C:\opt\x64-windows\bin and C:\opt\x64-windows\debug\bin to the PATH.

### Compiling tkDNN on Windows 

tkDNN is built with CMake (3.15+) on Windows along with Ninja. MSBuild and NMake Makefiles are drastically slower at compiling the library compared to Ninja.
```
git clone https://github.com/ceccocats/tkDNN.git
cd tkDNN
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -G"Ninja" ..
ninja -j4
```

### Run the demo on Windows 

This example uses yolo4_tiny.\
To run the object detection demo, first create the .rt file by running:
```
.\test_yolo4tiny.exe
```

Once the rt file has been successfully created, run the demo using the following command:
```
.\demo.exe yolo4tiny_fp32.rt ..\demo\yolo_test.mp4 y 
```
For general info on the other demo parameters, check the Run the demo section above.
To run test_all_tests.sh on Windows, use Git Bash or MSYS2.

### FP16 inference windows 

This is an untested feature on Windows. To run the object detection demo with FP16 inference follow the steps below (example with yolo4tiny):
```
set TKDNN_MODE=FP16
del /f yolo4tiny_fp16.rt
.\test_yolo4tiny.exe
.\demo.exe yolo4tiny_fp16.rt ..\demo\yolo_test.mp4 y
```

### INT8 inference windows 
To run the object detection demo with INT8 inference (example with yolo4tiny):
```
set TKDNN_MODE=INT8
set TKDNN_CALIB_LABEL_PATH=..\demo\COCO_val2017\all_labels.txt
set TKDNN_CALIB_IMG_PATH=..\demo\COCO_val2017\all_images.txt
del /f  yolo4tiny_int8.rt        # be sure to delete(or move) old tensorRT files
.\test_yolo4tiny.exe           # run the yolo test (is slow)
.\demo.exe yolo4tiny_int8.rt ..\demo\yolo_test.mp4 y

```

### Known issues with tkDNN on Windows

Mobilenet and Centernet demos work properly only when built with MSVC 16.7 in Release mode; when built in Debug mode one might encounter OpenCV assert errors for these networks.

All darknet models work properly with the demo using MSVC versions 16.7-16.9.

It is recommended to use NVIDIA driver 465+; CUDA unknown errors have been observed when using older drivers on Pascal (SM 61) devices.




## References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).
8. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv preprint arXiv:2004.10934 (2020).
9. Bochkovskiy, Alexey, "Yolo v4, v3 and v2 for Windows and Linux" (https://github.com/AlexeyAB/darknet)
10. Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "Scaled-YOLOv4: Scaling Cross Stage Partial Network." arXiv preprint arXiv:2011.08036 (2020).