# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to run on NVIDIA Jetson boards. It has been tested on the TK1 (branch cudnn2), TX1, TX2, AGX Xavier and several discrete GPUs.
The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not allow training. 

## Index
* [Dependencies](#dependencies)
* [About OpenCV](#about-opencv)
* [How to compile this repo](#how-to-compile-this-repo)
* [Workflow](#workflow)
* [How to export weights](#how-to-export-weights)
* [Run the demo](#run-the-demo)
* [mAP demo](#map-demo)
* [Existing tests and supported networks](#existing-tests-and-supported-networks)
* [References](#references)




## Dependencies
This branch works on every NVIDIA GPU that supports the following dependencies (a quick way to check the installed versions is shown right after the list):
* CUDA 10.0
* cuDNN 7.603
* TensorRT 6.01
* OpenCV 4.1
* yaml-cpp 0.5.2 (sudo apt install libyaml-cpp-dev)
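
A quick way to check the installed versions (header and package locations vary between x86 and Jetson setups, so treat these paths as examples):
```
nvcc --version                                    # CUDA toolkit version
grep CUDNN_MAJOR -A 2 /usr/include/cudnn.h        # cuDNN version (on some installs the macros live in cudnn_version.h)
dpkg -l | grep nvinfer                            # TensorRT packages on Debian-based systems
pkg-config --modversion opencv4                   # OpenCV version (or run: opencv_version)
```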

## About OpenCV
To compile and install OpenCV4 with the contrib modules, use the script ```install_OpenCV4.sh```. It will download and compile OpenCV in the Download folder.
```
bash scripts/install_OpenCV4.sh
```
When using an OpenCV build without the contrib modules, comment out the definition of OPENCV_CUDACONTRIB in include/tkDNN/DetectionNN.h. When it is commented out, the network preprocessing is computed on the CPU, otherwise on the GPU. In the latter case some milliseconds are saved in the end-to-end latency.
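
For example, a one-line way to comment out that define (this assumes the macro is currently active in the header; editing the file by hand works just as well):
```
sed -i 's|^#define OPENCV_CUDACONTRIB|//#define OPENCV_CUDACONTRIB|' include/tkDNN/DetectionNN.h
```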

## How to compile this repo
Build with cmake. If using Ubuntu 18.04, a newer version of cmake is needed (3.15 or above).
```
git clone https://github.com/ceccocats/tkDNN
cd tkDNN
git checkout cnet
mkdir build
cd build
cmake .. # use -DTEST_DATA=False to skip dataset download
make
```
If TEST_DATA is not set to False, weights needed to run some tests will be automatically downloaded.

## Workflow
These are the steps needed to run inference with tkDNN on a custom neural network:
* Build and train a NN model with your favorite framework.
* Export weights and biases for each layer and save them in a binary file (one per layer).
* Export the output of each layer and save it in a binary file (one per layer).
* Create a new test and define the network layer by layer, using the exported weights; use the exported outputs to check the results.
* Do inference.

## How to export weights

Weights are essential for any network to run inference. For each test, a folder organized as follows is needed:
```
    test_nn
        |---- test_nn.cpp (nn definition in tkDNN)
        |---- layers/ (folder containing a binary file for each layer with the corresponding weights and biases)
        |---- debug/  (folder containing a binary file for each layer with the corresponding outputs)
```
Therefore, once the weights have been exported, the ```layers``` and ```debug``` folders should be placed inside the corresponding test folder.
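
For example, assuming the exported folders are in the current directory and the destination is the ```test_nn``` test shown above (the tkDNN path is only illustrative):
```
cp -r layers debug /path/to/tkDNN/tests/test_nn/
```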

### 1) Export weights from darknet
To export weights for networks defined with the darknet framework, use [this](https://github.com/ceccocats/darknet) fork of darknet and follow these steps to obtain the layers and debug folders, ready for tkDNN.

```
git clone https://github.com/ceccocats/darknet
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers
```
N.b. Compile darknet for CPU (leave GPU=0 in the Makefile) if you also want the debug outputs.
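
As a concrete example, exporting a YOLOv3 COCO model would look like this (the cfg and weights file names are assumptions; use your own pair):
```
./darknet export cfg/yolov3.cfg yolov3.weights layers
```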

### 2) Export weights for DLA34 and ResNet101
To get the weights and outputs needed to run the dla34 and resnet101 tests, use the Python script and the Anaconda environment included in the repository.

Create the Anaconda environment and activate it:
```
conda env create -f file_name.yml
source activate env_name
python <script name>
```
### 3) Export weights for CenterNet
To get the weights needed to run the CenterNet tests, use [this](https://github.com/sapienzadavide/CenterNet.git) fork of the original CenterNet.
```
git clone https://github.com/sapienzadavide/CenterNet.git
```
* Follow the instructions in the README.md and INSTALL.md, then export the weights with one of the following:

```
python demo.py --input_res 512 --arch resdcn_101 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_resdcn101.pth --exp_wo --exp_wo_dim 512
python demo.py --input_res 512 --arch dla_34 ctdet --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/ctdet_coco_dla_2x.pth --exp_wo --exp_wo_dim 512
```
### 4) Export weights for MobileNetSSD

To get the weights needed to run the MobileNet tests, use [this](https://github.com/mive93/pytorch-ssd) fork of a PyTorch implementation of the SSD network.

```
git clone https://github.com/mive93/pytorch-ssd
cd pytorch-ssd
conda env create -f env_mobv2ssd.yml
python run_ssd_live_demo.py mb2-ssd-lite <pth-model-file> <labels-file>
```
## Run the demo

To run an object detection demo, follow these steps (example with yolov3):
```
rm yolo3_FP32.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_FP32.rt ../demo/yolo_test.mp4 y
```
In general the demo program takes 3 parameters:
```
./demo <network-rt-file> <path-to-video> <kind-of-network>
```
where
*  ```<network-rt-file>``` is the rt file generated by a test
*  ```<path-to-video>``` is the path to a video file or a camera input
*  ```<kind-of-network>``` is the type of network. Three types are currently supported: ```y``` (YOLO family), ```c``` (CenterNet family) and ```m``` (MobileNet-SSD family)
N.b. By default, FP32 inference is used.
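
As another example, the same demo can run a CenterNet engine; the test binary and rt file names below are assumed from the CenterNet DLA34 test listed in the table further down:
```
./test_dla34_cnet
./demo dla34_cnet_FP32.rt ../demo/yolo_test.mp4 c
```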

![demo](https://user-images.githubusercontent.com/11562617/72547657-540e7800-388d-11ea-83c6-49dfea2a0607.gif)

### FP16 inference

To run an object detection demo with FP16 inference, follow these steps (example with yolov3):
```
export TKDNN_MODE=FP16  # set the half floating point optimization
rm yolo3_FP16.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_FP16.rt ../demo/yolo_test.mp4 y
```
N.b. FP16 inference introduces small errors in the results (in the first or second decimal place).

### INT8 inference

To run an object detection demo with INT8 inference, follow these steps (example with yolov3):
```
export TKDNN_MODE=INT8  # set the 8-bit integer optimization

# image_list.txt contains the list of the absolute paths to the calibration images
export TKDNN_CALIB_IMG_PATH=/path/to/calibration/image_list.txt

# label_list.txt contains the list of the absolute paths to the calibration labels
export TKDNN_CALIB_LABEL_PATH=/path/to/calibration/label_list.txt
rm yolo3_INT8.rt        # be sure to delete (or move) old tensorRT files
./test_yolo3            # run the yolo test (it is slow)
./demo yolo3_INT8.rt ../demo/yolo_test.mp4 y
```
N.b. INT8 inference introduces some errors in the results.

N.b. The test will be slower: this is due to the INT8 calibration, which may take some time to complete. 

N.b. INT8 calibration requires a TensorRT version greater than or equal to 6.0.
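
A sketch of how the two calibration lists can be generated from a local COCO-style folder (the source paths are placeholders):
```
find /path/to/calibration/images -name '*.jpg' | sort > image_list.txt
find /path/to/calibration/labels -name '*.txt' | sort > label_list.txt
export TKDNN_CALIB_IMG_PATH=$(pwd)/image_list.txt
export TKDNN_CALIB_LABEL_PATH=$(pwd)/label_list.txt
```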

## mAP demo

To compute mAP, precision, recall and F1-score, run the map_demo.

A validation set is needed. To download COCO_val2017 run (from the root folder):
```
bash scripts/download_validation.sh 
```
To compute the mAP, the following parameters are needed:
```
./map_demo <network rt> <network type [y|c|m]> <labels file path> <config file path>
```
where 
* ```<network rt>```: the rt file of the chosen network on which to compute the mAP.
* ```<network type [y|c|m]>```: the type of network. Right now only y (YOLO), c (CenterNet) and m (MobileNet) are allowed.
* ```<labels file path>```: path to a text file containing the paths of all the ground-truth labels. All the ground-truth labels must be in a folder called 'labels', and the folder containing 'labels' must also contain a folder 'images' with the ground-truth images, each named like its label. For example, a label path/to/labels/000001.txt must have a corresponding image path/to/images/000001.jpg (see the layout sketch after this list).
* ```<config file path>```: path to a yaml file with the parameters needed for the mAP computation, similar to demo/config.yaml
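
An illustrative layout of the validation folder (file names are only examples):
```
COCO_val2017/
    all_labels.txt      # text file listing the paths of all the ground-truth labels
    images/
        000001.jpg
        ...
    labels/
        000001.txt
        ...
```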

Example:

```
cd build
./map_demo dla34_cnet_FP32.rt c ../demo/COCO_val2017/all_labels.txt ../demo/config.yaml
```

## Existing tests and supported networks

| Test Name         | Network                                       | Dataset                                                       | N Classes | Input size    | Weights                                                                   |
| :---------------- | :-------------------------------------------- | :-----------------------------------------------------------: | :-------: | :-----------: | :------------------------------------------------------------------------ |
| yolo              | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 608x608       | [weights](https://cloud.hipert.unimore.it/s/nf4PJ3k8bxBETwL/download)                                                                   |
| yolo_224          | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| yolo_berkeley     | YOLO v2<sup>1</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 416x736       | weights                                                                   |
| yolo_relu         | YOLO v2 (with ReLU, not Leaky)<sup>1</sup>    | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | weights                                                                   |
| yolo_tiny         | YOLO v2 tiny<sup>1</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/m3orfJr8pGrN5mQ/download)                                                                   |
| yolo_voc          | YOLO v2<sup>1</sup>                           | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/DJC5Fi2pEjfNDP9/download)                                                                   |
| yolo3             | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/jPXmHyptpLoNdNR/download)     |
| yolo3_512   | YOLO v3<sup>2</sup>                                 | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/e7HfScx77JEHeYb/download)     |
| yolo3_berkeley    | YOLO v3<sup>2</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 320x544       | [weights](https://cloud.hipert.unimore.it/s/o5cHa4AjTKS64oD/download)                                                                   |
| yolo3_coco4       | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 4         | 416x416       | [weights](https://cloud.hipert.unimore.it/s/o27NDzSAartbyc4/download)                                                                   |
| yolo3_flir        | YOLO v3<sup>2</sup>                           | [FREE FLIR](https://www.flir.com/oem/adas/adas-dataset-form/) | 3         | 320x544       | [weights](https://cloud.hipert.unimore.it/s/62DECncmF6bMMiH/download)                                                                   |
| yolo3_tiny        | YOLO v3 tiny<sup>2</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/LMcSHtWaLeps8yN/download)     |
| yolo3_tiny512     | YOLO v3 tiny<sup>2</sup>                      | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/8Zt6bHwHADqP4JC/download)     |
| dla34             | Deep Layer Aggregation (DLA) 34<sup>3</sup>   | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| dla34_cnet        | CenterNet (DLA34 backend)<sup>4</sup>         | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/KRZBbCQsKAtQwpZ/download)     |
| mobilenetv2ssd    | MobileNet v2 SSD Lite<sup>5</sup>             | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 300x300       | [weights](https://cloud.hipert.unimore.it/s/x4ZfxBKN23zAJQp/download)     |
| mobilenetv2ssd512 | MobileNet v2 SSD Lite<sup>5</sup>             | [COCO 2017](http://cocodataset.org/)                          | 81        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/pdCw2dYyHMJrcEM/download)     |
| resnet101         | ResNet 101<sup>6</sup>                        | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| resnet101_cnet    | CenterNet (ResNet101 backend)<sup>4</sup>     | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/5BTjHMWBcJk8g3i/download)     |
| csresnext50-panet-spp    | Cross Stage Partial Network <sup>7</sup>     | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/Kcs4xBozwY4wFx8/download)     |


## References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).