# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to work on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier, Nano and several discrete GPUs.
The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not allow training. 

If you use tkDNN in your research, please cite the [following paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9212130&casa_token=sQTJXi7tJNoAAAAA:BguH9xCIY48MxbtDS3LXzIXzO-9sWArm7Hd7y7BwaLmqRuM_Gx8bOYizFPNMNtpo5K0kB-P-). For use in commercial solutions, write to gattifrancesco@hotmail.it and micaela.verucchi@unimore.it or refer to https://hipert.unimore.it/.

```
@inproceedings{verucchi2020systematic,
  title={A Systematic Assessment of Embedded Neural Networks for Object Detection},
  author={Verucchi, Micaela and Brilli, Gianluca and Sapienza, Davide and Verasani, Mattia and Arena, Marco and Gatti, Francesco and Capotondi, Alessandro and Cavicchioli, Roberto and Bertogna, Marko and Solieri, Marco},
  booktitle={2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)},
  volume={1},
  pages={937--944},
  year={2020},
  organization={IEEE}
}
```

### What's new (20 July 2021)
- [x] Support for semantic segmentation [README](docs/README_seg.md)
- [ ] Support for TensorRT 8 (WIP)

## FPS Results
Inference FPS of YOLOv4 with tkDNN, averaged over 1200 images with the same dimensions as the network input size. In the table, B is the batch size. Results were measured on:
  * RTX 2080Ti (CUDA 10.2, TensorRT 7.0.0, cuDNN 7.6.5);
  * AGX Xavier, JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1);
  * Xavier NX, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0);
  * TX2, JetPack 4.2 (CUDA 10.0, cuDNN 7.3.1, TensorRT 5.0.6);
  * Jetson Nano, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0).

| Platform   | Network   | FP32, B=1 | FP32, B=4 | FP16, B=1 | FP16, B=4 | INT8, B=1 | INT8, B=4 |
| :--------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| RTX 2080Ti | yolo4 320 | 118.59    | 237.31    | 207.81    | 443.32    | 262.37    | 530.93    |
| RTX 2080Ti | yolo4 416 | 104.81    | 162.86    | 169.06    | 293.78    | 206.93    | 353.26    |
| RTX 2080Ti | yolo4 512 | 92.98     | 132.43    | 140.36    | 215.17    | 165.35    | 254.96    |
| RTX 2080Ti | yolo4 608 | 63.77     | 81.53     | 111.39    | 152.89    | 127.79    | 184.72    |
| AGX Xavier | yolo4 320 | 26.78     | 32.05     | 57.14     | 79.05     | 73.15     | 97.56     |
| AGX Xavier | yolo4 416 | 19.96     | 21.52     | 41.01     | 49.00     | 50.81     | 60.61     |
| AGX Xavier | yolo4 512 | 16.58     | 16.98     | 31.12     | 33.84     | 37.82     | 41.28     |
| AGX Xavier | yolo4 608 | 9.45      | 10.13     | 21.92     | 23.36     | 27.05     | 28.93     |
| Xavier NX  | yolo4 320 | 14.56     | 16.25     | 30.14     | 41.15     | 42.13     | 53.42     |
| Xavier NX  | yolo4 416 | 10.02     | 10.60     | 22.43     | 25.59     | 29.08     | 32.94     |
| Xavier NX  | yolo4 512 | 8.10      | 8.32      | 15.78     | 17.13     | 20.51     | 22.46     |
| Xavier NX  | yolo4 608 | 5.26      | 5.18      | 11.54     | 12.06     | 15.09     | 15.82     |
| TX2        | yolo4 320 | 11.18     | 12.07     | 15.32     | 16.31     | -         | -         |
| TX2        | yolo4 416 | 7.30      | 7.58      | 9.45      | 9.90      | -         | -         |
| TX2        | yolo4 512 | 5.96      | 5.95      | 7.22      | 7.23      | -         | -         |
| TX2        | yolo4 608 | 3.63      | 3.65      | 4.67      | 4.70      | -         | -         |
| Nano       | yolo4 320 | 4.23      | 4.55      | 6.14      | 6.53      | -         | -         |
| Nano       | yolo4 416 | 2.88      | 3.00      | 3.90      | 4.04      | -         | -         |
| Nano       | yolo4 512 | 2.32      | 2.34      | 3.02      | 3.04      | -         | -         |
| Nano       | yolo4 608 | 1.40      | 1.41      | 1.92      | 1.93      | -         | -         |
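
As a rough illustration of how FPS figures like these relate to measured latency, the Python sketch below converts per-batch inference times into frames per second. It assumes FPS is computed as the batch size divided by the mean per-batch latency; the function name `fps_from_latencies` and the sample latencies are hypothetical and are not part of tkDNN.

```python
from statistics import mean


def fps_from_latencies(latencies_ms, batch_size=1):
    """Return frames per second given per-batch latencies in milliseconds.

    Each latency is assumed to cover one forward pass over `batch_size` images.
    """
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    mean_latency_s = mean(latencies_ms) / 1000.0
    return batch_size / mean_latency_s


# Hypothetical latencies (ms) for four batches of 4 images each.
sample = [40.0, 42.0, 38.0, 40.0]
print(f"{fps_from_latencies(sample, batch_size=4):.2f} FPS")
```

With larger batches the per-batch latency grows, but as long as it grows less than linearly the resulting FPS increases, which is consistent with the B=4 columns being higher than the B=1 columns on most platforms.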

## mAP Results
Results for COCO val 2017 (5k images) on an RTX 2080Ti, with a confidence threshold of 0.001.

|                      | CodaLab       | CodaLab   | CodaLab       | CodaLab     | tkDNN map     | tkDNN map |
| -------------------- | :-----------: | :-------: | :-----------: | :---------: | :-----------: | :-------: |
|                      | **tkDNN**     | **tkDNN** | **darknet**   | **darknet** | **tkDNN**     | **tkDNN** |
|                      | MAP(0.5:0.95) | AP50      | MAP(0.5:0.95) | AP50        | MAP(0.5:0.95) | AP50      |
| Yolov3 (416x416)     | 0.381         | 0.675     | 0.380         | 0.675       | 0.372         | 0.663     |
| yolov4 (416x416)     | 0.468         | 0.705     | 0.471         | 0.710       | 0.459         | 0.695     |
| yolov3tiny (416x416) | 0.096         | 0.202     | 0.096         | 0.201       | 0.093         | 0.198     |
| yolov4tiny (416x416) | 0.202         | 0.400     | 0.201         | 0.400       | 0.197         | 0.395     |
| Cnet-dla34 (512x512) | 0.366         | 0.543     | \-            | \-          | 0.361         | 0.535     |
| mv2SSD (512x512)     | 0.226         | 0.381     | \-            | \-          | 0.223         | 0.378     |

## Index
- [tkDNN](#tkdnn)
  - [Index](#index)
  - [Dependencies](#dependencies)
  - [About OpenCV](#about-opencv)
  - [How to compile this repo](#how-to-compile-this-repo)
  - [Workflow](#workflow)
  - [Exporting weights](#exporting-weights)
  - [Run the demo](#run-the-demo)
  - [mAP demo](#map-demo)
  - [Existing tests and supported networks](#existing-tests-and-supported-networks)
  - [References](#references)
  - [tkDNN on Windows 10 (experimental)](#tkdnn-on-windows-10-experimental)

## Dependencies
This branch works on every NVIDIA GPU that supports the following (latest tested) dependencies:
* CUDA 11.0 (or >= 10)
* cuDNN 8.0.4 (or >= 7.3)
* TensorRT 7.2.0 (or >= 5)
* OpenCV 4.5.2 (or >= 4)
* cmake 3.21 (or >= 3.15)
* yaml-cpp 0.5.2
* eigen3 3.3.4
* curl 7.58

```
sudo apt install libyaml-cpp-dev curl libeigen3-dev
```
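
To check a system against the minimum versions in the dependency list of this section, a small comparison helper can be handy. The Python sketch below is only an illustration: the helper names and the "installed" versions are hypothetical, and the script does not query CUDA, cuDNN or TensorRT itself, so the installed versions have to be filled in by hand.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.6.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '10' compares equal to '10.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the dependency list in this section.
MINIMUMS = {"CUDA": "10", "cuDNN": "7.3", "TensorRT": "5", "OpenCV": "4", "cmake": "3.15"}

# Hypothetical installed versions; replace them with the ones on your system.
installed = {"CUDA": "10.2", "cuDNN": "8.0.0", "TensorRT": "7.1.0", "OpenCV": "4.5.2", "cmake": "3.10.2"}

for name, minimum in MINIMUMS.items():
    ok = meets_minimum(installed[name], minimum)
    status = "ok" if ok else f"too old (need >= {minimum})"
    print(f"{name} {installed[name]}: {status}")
```

Versions are compared as integer tuples rather than strings, so that `3.10.2` is correctly treated as older than `3.15`.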

## About OpenCV
To compile and install OpenCV 4 with contrib, use the script `install_OpenCV4.sh`. It downloads and compiles OpenCV in the Download folder.
```
bash scripts/install_OpenCV4.sh
```
When using OpenCV not compiled with contrib, comment out the definition of `OPENCV_CUDACONTRIBCONTRIB` in `include/tkDNN/DetectionNN.h`. With the definition commented out, the preprocessing of the networks runs on the CPU; otherwise it runs on the GPU, which saves a few milliseconds of end-to-end latency.

## How to compile this repo
Build with CMake. On Ubuntu 18.04 a newer version of CMake is needed (3.15 or above).
```
git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake ..
make
```

## Workflow
Steps needed to run inference with tkDNN on a custom neural network:
* Build and train a NN model with your favorite framework.
* Export the weights and biases of each layer and save them in binary files (one per layer).
* Export the outputs of each layer and save them in binary files (one per layer).
* Create a new test and define the network layer by layer, using the exported weights to build it and the exported outputs to check the results.
* Do inference.
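
The export steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not tkDNN's exporter: the file layout (all weights followed by all biases, stored as raw float32 in native byte order), the helper names `export_layer` and `load_floats`, and the toy model are all hypothetical. The layout tkDNN actually expects is described in docs/exporting_weights.md.

```python
import array
import tempfile
from pathlib import Path


def export_layer(out_dir, layer_name, weights, biases):
    """Write one layer's parameters to `<out_dir>/<layer_name>.bin`.

    Assumed (hypothetical) layout: weights, then biases, as raw float32.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{layer_name}.bin"
    with open(path, "wb") as f:
        array.array("f", weights).tofile(f)
        array.array("f", biases).tofile(f)
    return path


def load_floats(path):
    """Read a raw float32 file back into a list, e.g. to compare layer outputs."""
    values = array.array("f")
    values.frombytes(Path(path).read_bytes())
    return values.tolist()


# Toy model: two layers with flattened weights and biases (made-up numbers).
model = {
    "conv0": ([0.5, -1.0, 0.25, 2.0], [0.0]),
    "dense1": ([1.5, -0.5], [0.125, 0.75]),
}

with tempfile.TemporaryDirectory() as tmp:
    for name, (weights, biases) in model.items():
        path = export_layer(tmp, name, weights, biases)
        print(path.name, load_floats(path))
```

Writing one file per layer keeps the exported parameters easy to pair with the per-layer outputs, so a mismatch during the checking step can be traced to a single layer.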

## Exporting weights

For details on how to export weights, see [here](./docs/exporting_weights.md).

## Run the demo 

For details on how to run the demos, see [here](./docs/demo.md).

## mAP demo

For details on how to run the mAP demo, see [here](./docs/mAP_demo.md).

## Existing tests and supported networks

| Test Name | Network | Dataset | N Classes | Input size | Weights |
| :-------- | :------ | :-----: | :-------: | :--------: | :------ |
| yolo | YOLO v2<sup>1</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 608x608 | [weights](https://cloud.hipert.unimore.it/s/nf4PJ3k8bxBETwL/download) |
| yolo_224 | YOLO v2<sup>1</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 224x224 | weights |
| yolo_berkeley | YOLO v2<sup>1</sup> | [BDD100K](https://bair.berkeley.edu/blog/2018/05/30/bdd/) | 10 | 416x736 | weights |
| yolo_relu | YOLO v2 (with ReLU, not Leaky)<sup>1</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 416x416 | weights |
| yolo_tiny | YOLO v2 tiny<sup>1</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/m3orfJr8pGrN5mQ/download) |
| yolo_voc | YOLO v2<sup>1</sup> | [VOC](http://host.robots.ox.ac.uk/pascal/VOC/) | 21 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/DJC5Fi2pEjfNDP9/download) |
| yolo3 | YOLO v3<sup>2</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/jPXmHyptpLoNdNR/download) |
| yolo3_512 | YOLO v3<sup>2</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/RGecMeGLD4cXEWL/download) |
| yolo3_berkeley | YOLO v3<sup>2</sup> | [BDD100K](https://bair.berkeley.edu/blog/2018/05/30/bdd/) | 10 | 320x544 | [weights](https://cloud.hipert.unimore.it/s/o5cHa4AjTKS64oD/download) |
| yolo3_coco4 | YOLO v3<sup>2</sup> | [COCO 2014](http://cocodataset.org/) | 4 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/o27NDzSAartbyc4/download) |
| yolo3_flir | YOLO v3<sup>2</sup> | [FREE FLIR](https://www.flir.com/oem/adas/adas-dataset-form/) | 3 | 320x544 | [weights](https://cloud.hipert.unimore.it/s/62DECncmF6bMMiH/download) |
| yolo3_tiny | YOLO v3 tiny<sup>2</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/LMcSHtWaLeps8yN/download) |
| yolo3_tiny512 | YOLO v3 tiny<sup>2</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/8Zt6bHwHADqP4JC/download) |
| dla34 | Deep Layer Aggregation (DLA) 34<sup>3</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 224x224 | weights |
| dla34_cnet | Centernet (DLA34 backend)<sup>4</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/KRZBbCQsKAtQwpZ/download) |
| mobilenetv2ssd | MobileNet v2 SSD Lite<sup>5</sup> | [VOC](http://host.robots.ox.ac.uk/pascal/VOC/) | 21 | 300x300 | [weights](https://cloud.hipert.unimore.it/s/x4ZfxBKN23zAJQp/download) |
| mobilenetv2ssd512 | MobileNet v2 SSD Lite<sup>5</sup> | [COCO 2017](http://cocodataset.org/) | 81 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/pdCw2dYyHMJrcEM/download) |
| resnet101 | Resnet 101<sup>6</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 224x224 | weights |
| resnet101_cnet | Centernet (Resnet101 backend)<sup>4</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/5BTjHMWBcJk8g3i/download) |
| csresnext50-panet-spp | Cross Stage Partial Network<sup>7</sup> | [COCO 2014](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/Kcs4xBozwY4wFx8/download) |
| yolo4 | Yolov4<sup>8</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/d97CFzYqCPCp5Hg/download) |
| yolo4_berkeley | Yolov4<sup>8</sup> | [BDD100K](https://bair.berkeley.edu/blog/2018/05/30/bdd/) | 10 | 540x320 | [weights](https://cloud.hipert.unimore.it/s/nkWFa5fgb4NTdnB/download) |
| yolo4tiny | Yolov4 tiny<sup>9</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 416x416 | [weights](https://cloud.hipert.unimore.it/s/iRnc4pSqmx78gJs/download) |
| yolo4x | Yolov4x-mish<sup>9</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 640x640 | [weights](https://cloud.hipert.unimore.it/s/5MFjtNtgbDGdJEo/download) |
| yolo4x-cps | Scaled Yolov4<sup>10</sup> | [COCO 2017](http://cocodataset.org/) | 80 | 512x512 | [weights](https://cloud.hipert.unimore.it/s/AfzHE4BfTeEm2gH/download) |

## tkDNN on Windows 10 (experimental)

For details on how to run the demos on Windows 10, see [here](./docs/windows.md).

## References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).
8. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv preprint arXiv:2004.10934 (2020).
9. Bochkovskiy, Alexey. "Yolo v4, v3 and v2 for Windows and Linux." (https://github.com/AlexeyAB/darknet)
10. Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "Scaled-YOLOv4: Scaling Cross Stage Partial Network." arXiv preprint arXiv:2011.08036 (2020).