# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to work on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier and several discrete GPUs.
The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not support training. 

## Index
- [tkDNN](#tkdnn)
  - [Index](#index)
  - [Dependencies](#dependencies)
  - [About OpenCV](#about-opencv)
  - [How to compile this repo](#how-to-compile-this-repo)
  - [Workflow](#workflow)
  - [How to export weights](#how-to-export-weights)
    - [1)Export weights from darknet](#1export-weights-from-darknet)
    - [2)Export weights for DLA34 and ResNet101](#2export-weights-for-dla34-and-resnet101)
    - [3)Export weights for CenterNet](#3export-weights-for-centernet)
    - [4)Export weights for MobileNetSSD](#4export-weights-for-mobilenetssd)
  - [Run the demo](#run-the-demo)
    - [FP16 inference](#fp16-inference)
    - [INT8 inference](#int8-inference)
    - [BatchSize bigger than 1](#batchsize-bigger-than-1)
  - [mAP demo](#map-demo)
  - [Existing tests and supported networks](#existing-tests-and-supported-networks)
  - [References](#references)




## Dependencies
This branch works on every NVIDIA GPU supported by the following dependencies:
* CUDA 10.0
* cuDNN 7.6.3
* TensorRT 6.0.1
* OpenCV 3.4
* yaml-cpp 0.5.2 (`sudo apt install libyaml-cpp-dev`)
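
To check the versions already installed, these standard commands (not tkDNN-specific; exact paths depend on how the packages were installed) can help:
```
nvcc --version                                     # CUDA toolkit
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2   # cuDNN (header location may differ)
dpkg -l | grep nvinfer                             # TensorRT (Debian/Ubuntu packages)
opencv_version                                     # OpenCV
```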

## About OpenCV
To compile and install OpenCV4 with contrib, use the script ```install_OpenCV4.sh```. It will download and compile OpenCV in the Download folder.
```
bash scripts/install_OpenCV4.sh
```
When using an OpenCV build without contrib, comment out the definition of OPENCV_CUDACONTRIBCONTRIB in include/tkDNN/DetectionNN.h. With the definition commented out, network preprocessing is computed on the CPU; otherwise it runs on the GPU, saving a few milliseconds of end-to-end latency. 

## How to compile this repo
Build with CMake. On Ubuntu 18.04 a newer version of CMake is needed (3.15 or above). 
```
git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake ..
make
```
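
If the system CMake is older than 3.15 (e.g. stock Ubuntu 18.04), one possible workaround, not part of the project's instructions, is to install a newer release from PyPI:
```
sudo -H pip3 install --upgrade cmake
cmake --version
```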

## Workflow
Steps needed to run inference with tkDNN on a custom neural network:
* Build and train a NN model with your favorite framework.
* Export the weights and biases of each layer and save them in a binary file (one per layer).
* Export the outputs of each layer and save them in a binary file (one per layer).
* Create a new test and define the network layer by layer, using the exported weights; use the exported outputs to check the results. 
* Do inference.

## How to export weights

Weights are essential for any network to run inference. For each test, a folder organized as follows is needed (in the build folder):
```
    test_nn
        |---- layers/ (folder containing a binary file for each layer with the corresponding weights and biases)
        |---- debug/  (folder containing a binary file for each layer with the corresponding outputs)
```
Therefore, once the weights have been exported, the layers and debug folders should be placed inside the corresponding test folder.
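
For example, assuming the exported folders are in /path/to/export and the test folder is the hypothetical test_nn shown above:
```
cd build
mkdir -p test_nn
cp -r /path/to/export/layers test_nn/
cp -r /path/to/export/debug  test_nn/
```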

### 1)Export weights from darknet
To export weights for NNs defined in the darknet framework, use [this](https://github.com/ceccocats/darknet) fork of darknet and follow these steps to obtain correct debug and layers folders, ready for tkDNN.

```
git clone https://github.com/ceccocats/darknet
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers
```
N.b. Compile for CPU (leave GPU=0 in the Makefile) if you also want the debug outputs. 
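
As a minimal sketch (assuming the standard darknet Makefile layout, which already ships with GPU=0), you can force a CPU build before exporting:
```
sed -i 's/^GPU=1/GPU=0/' Makefile
make clean && make
```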

### 2)Export weights for DLA34 and ResNet101 
To get the weights and outputs needed to run the dla34 and resnet101 tests, use the Python script and the Anaconda environment included in the repository.   

Create the Anaconda environment and activate it:
```
conda env create -f file_name.yml
source activate env_name
python <script name>
```
### 3)Export weights for CenterNet
To get the weights needed to run the CenterNet tests, use [this](https://github.com/sapienzadavide/CenterNet.git) fork of the original CenterNet. 
```
git clone https://github.com/sapienzadavide/CenterNet.git
```
* Follow the instructions in the README.md and INSTALL.md of that repository.

```
python demo.py --input_res 512 --arch resdcn_101 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_resdcn101.pth --exp_wo --exp_wo_dim 512
python demo.py --input_res 512 --arch dla_34 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_dla_2x.pth --exp_wo --exp_wo_dim 512
```
### 4)Export weights for MobileNetSSD

To get the weights needed to run the MobileNet tests, use [this](https://github.com/mive93/pytorch-ssd) fork of a PyTorch implementation of the SSD network. 

```
git clone https://github.com/mive93/pytorch-ssd
cd pytorch-ssd
conda env create -f env_mobv2ssd.yml
python run_ssd_live_demo.py mb2-ssd-lite <pth-model-file> <labels-file>
```
## Run the demo

To run an object detection demo, follow these steps (example with YOLOv3):
```
rm yolo3_FP32.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_FP32.rt ../demo/yolo_test.mp4 y
```
In general, the demo program takes 4 parameters:
```
./demo <network-rt-file> <path-to-video> <kind-of-network> <number-of-classes>
```
where
*  ```<network-rt-file>``` is the rt file generated by a test
*  ```<path-to-video>``` is the path to a video file or a camera input  
*  ```<kind-of-network>``` is the type of network. Three types are currently supported: ```y``` (YOLO family), ```c``` (CenterNet family) and ```m``` (MobileNet-SSD family)
*  ```<number-of-classes>``` is the number of classes the network is trained on

N.b. By default FP32 inference is used.
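
For example, to run the YOLOv3 demo on the sample video and pass the number of classes explicitly (80 here, assuming a COCO-trained model):
```
./demo yolo3_FP32.rt ../demo/yolo_test.mp4 y 80
```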

![demo](https://user-images.githubusercontent.com/11562617/72547657-540e7800-388d-11ea-83c6-49dfea2a0607.gif)

### FP16 inference

To run an object detection demo with FP16 inference, follow these steps (example with YOLOv3):
```
export TKDNN_MODE=FP16  # set the half floating point optimization
rm yolo3_FP16.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_FP16.rt ../demo/yolo_test.mp4 y
```
N.b. FP16 inference introduces small errors in the results (in the first or second decimal place). 

### INT8 inference

To run an object detection demo with INT8 inference, follow these steps (example with YOLOv3):
```
export TKDNN_MODE=INT8  # set the 8-bit integer optimization

# image_list.txt contains the list of the absolute paths to the calibration images
export TKDNN_CALIB_IMG_PATH=/path/to/calibration/image_list.txt

# label_list.txt contains the list of the absolute paths to the calibration labels
export TKDNN_CALIB_LABEL_PATH=/path/to/calibration/label_list.txt
rm yolo3_INT8.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_INT8.rt ../demo/yolo_test.mp4 y
```
N.b. INT8 inference introduces some errors in the results. 

N.b. The test will be slower: this is due to the INT8 calibration, which may take some time to complete. 

N.b. INT8 calibration requires TensorRT version greater than or equal to 6.0
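
The calibration file lists can be generated with standard shell commands, for instance (the folders below are placeholders; passing absolute paths to find yields absolute paths in the lists):
```
find /path/to/calibration/images -name '*.jpg' > image_list.txt
find /path/to/calibration/labels -name '*.txt' > label_list.txt
```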

### BatchSize bigger than 1
```
export TKDNN_BATCHSIZE=2
```
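
The variable only takes effect when the TensorRT engine is built, so, following the same pattern as the FP16/INT8 examples above (YOLOv3 assumed here), regenerate the rt file after setting it:
```
export TKDNN_BATCHSIZE=2
rm yolo3_FP32.rt        # remove the old engine so it is rebuilt with the new batch size
./test_yolo3
```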

## mAP demo

To compute mAP, precision, recall and F1-score, run the map_demo.

A validation set is needed. 
To download COCO_val2017 (80 classes) run (from the root folder): 
```
bash scripts/download_validation.sh COCO
```
To download Berkeley_val (10 classes) run (from the root folder): 
```
bash scripts/download_validation.sh BDD
```

To compute the mAP, the following parameters are needed:
```
./map_demo <network rt> <network type [y|c|m]> <labels file path> <config file path>
```
where 
* ```<network rt>```: rt file of the chosen network on which to compute the mAP.
* ```<network type [y|c|m]>```: type of network. Currently only y (YOLO), c (CenterNet) and m (MobileNet) are allowed.
* ```<labels file path>```: path to a text file containing the paths of all the ground-truth labels. All the ground-truth labels must be in a folder called 'labels'. Next to the 'labels' folder there should also be an 'images' folder containing all the ground-truth images, each with the same name as the corresponding label. For example, for a label path/to/labels/000001.txt there should be a corresponding image path/to/images/000001.jpg (see the example after this list). 
* ```<config file path>```: path to a yaml file with the parameters needed for the mAP computation, similar to demo/config.yaml
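
For a custom validation set, the labels file can be generated, for instance, with a find over the labels folder described above (the path is a placeholder):
```
find /path/to/labels -name '*.txt' > all_labels.txt
```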

Example:

```
cd build
./map_demo dla34_cnet_FP32.rt c ../demo/COCO_val2017/all_labels.txt ../demo/config.yaml
```

## Existing tests and supported networks

| Test Name         | Network                                       | Dataset                                                       | N Classes | Input size    | Weights                                                                   |
| :---------------- | :-------------------------------------------- | :-----------------------------------------------------------: | :-------: | :-----------: | :------------------------------------------------------------------------ |
| yolo              | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 608x608       | [weights](https://cloud.hipert.unimore.it/s/nf4PJ3k8bxBETwL/download)                                                                   |
| yolo_224          | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| yolo_berkeley     | YOLO v2<sup>1</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 416x736       | weights                                                                   |
| yolo_relu         | YOLO v2 (with ReLU, not Leaky)<sup>1</sup>    | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | weights                                                                   |
| yolo_tiny         | YOLO v2 tiny<sup>1</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/m3orfJr8pGrN5mQ/download)                                                                   |
| yolo_voc          | YOLO v2<sup>1</sup>                           | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/DJC5Fi2pEjfNDP9/download)                                                                   |
| yolo3             | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/jPXmHyptpLoNdNR/download)     |
| yolo3_512   | YOLO v3<sup>2</sup>                                 | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/RGecMeGLD4cXEWL/download)     |
| yolo3_berkeley    | YOLO v3<sup>2</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 320x544       | [weights](https://cloud.hipert.unimore.it/s/o5cHa4AjTKS64oD/download)                                                                   |
| yolo3_coco4       | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 4         | 416x416       | [weights](https://cloud.hipert.unimore.it/s/o27NDzSAartbyc4/download)                                                                   |
| yolo3_flir        | YOLO v3<sup>2</sup>                           | [FREE FLIR](https://www.flir.com/oem/adas/adas-dataset-form/) | 3         | 320x544       | [weights](https://cloud.hipert.unimore.it/s/62DECncmF6bMMiH/download)                                                                   |
| yolo3_tiny        | YOLO v3 tiny<sup>2</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/LMcSHtWaLeps8yN/download)     |
| yolo3_tiny512     | YOLO v3 tiny<sup>2</sup>                      | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/8Zt6bHwHADqP4JC/download)     |
| dla34             | Deep Layer Aggregation (DLA) 34<sup>3</sup>   | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| dla34_cnet        | Centernet (DLA34 backend)<sup>4</sup>         | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/KRZBbCQsKAtQwpZ/download)     |
| mobilenetv2ssd    | MobileNet v2 SSD Lite<sup>5</sup>             | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 300x300       | [weights](https://cloud.hipert.unimore.it/s/x4ZfxBKN23zAJQp/download)     |
| mobilenetv2ssd512 | MobileNet v2 SSD Lite<sup>5</sup>             | [COCO 2017](http://cocodataset.org/)                          | 81        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/pdCw2dYyHMJrcEM/download)     |
| resnet101         | Resnet 101<sup>6</sup>                        | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| resnet101_cnet    | Centernet (Resnet101 backend)<sup>4</sup>     | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/5BTjHMWBcJk8g3i/download)     |
| csresnext50-panet-spp    | Cross Stage Partial Network <sup>7</sup>     | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/Kcs4xBozwY4wFx8/download)     |


## References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).