# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN primitives, designed to run on the NVIDIA TK1 board (and all later NVIDIA boards).<br>
Its main purpose is high-performance inference on already-trained models.

This branch works on every NVIDIA GPU that supports the following dependencies:
* CUDA 10.0
* CUDNN 7.603
* TENSORRT 6.01
* OPENCV 4.1
* yaml-cpp 0.5.2 (sudo apt install libyaml-cpp-dev)

## Workflow
The recommended workflow follows these steps (a rough sketch is given after the list):
* Build and train a model in Keras (on any PC)
* Export the weights and biases
* Define the model in tkDNN
* Do inference (on the TK1)
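
A shell sketch of those steps is shown below; every path, script name, and test name in it is a hypothetical placeholder, since the exact export script and layer layout depend on the network you are porting:
```
# on the training PC: train in Keras and export each layer's weights/biases
# to binary files (export_weights.py is a placeholder name)
python export_weights.py --model mymodel.h5 --out exported_layers/

# copy the exported files next to the matching tkDNN test on the board
scp -r exported_layers/ user@tk1:/path/to/tkDNN/test/mynet/layers/

# on the board: build the library (see below) and run the matching test,
# which defines the model with tkDNN and runs inference on it
cd /path/to/tkDNN/build && ./test_mynet
```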

## Compile the library
Build with cmake
```
mkdir build
cd build
cmake ..
# use -DTEST_DATA=False to skip dataset download
make
```
During the cmake configuration, the weights needed to run the tests are downloaded automatically.
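
For example, to configure without downloading the test weights, pass the flag mentioned in the comment above:
```
cmake .. -DTEST_DATA=False   # skip dataset/weights download
make
```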

## DLA34 and ResNet101 weights
To get the weights and outputs needed to run the tests, you can use the Python script and the Anaconda environment included in the repository.

Create Anaconda environment and activate it:
```
conda env create -f file_name.yml
source activate env_name 
```
Run the Python script inside the environment.
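
For example (the environment file, environment name, and script name below are placeholders for the ones actually included in the repository):
```
conda env create -f file_name.yml
source activate env_name
python get_weights.py        # placeholder name for the included script
```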

## CenterNet weights
To get the weights needed for running the tests:

* clone the fork of the original CenterNet repository:
```
git clone https://github.com/sapienzadavide/CenterNet.git
```
* follow the instructions in its README.md and INSTALL.md
* copy the weights and outputs from /path/to/CenterNet/src/ into ./test/centernet-path/. For example:
```
cp /path/to/CenterNet/src/layers_dla/* ./test/dla34_cnet/layers/
cp /path/to/CenterNet/src/debug_dla/* ./test/dla34_cnet/debug/
```
or
```
cp /path/to/CenterNet/src/layers_resdcn/* ./test/resnet101_cnet/layers/
cp /path/to/CenterNet/src/debug_resdcn/* ./test/resnet101_cnet/debug/
```

## Test
Assuming you have correctly built the library, these are the tests ready to run (a usage sketch follows the list):
* test_simple: a simple convolutional and dense network (CUDNN only)
* test_mnist: the famous MNIST network (CUDNN and TENSORRT)
* test_mnistRT: the MNIST network hardcoded using the TensorRT API (TENSORRT only)
* test_yolo: YOLO detection network (CUDNN and TENSORRT)
* test_yolo_tiny: smaller version of YOLO (CUDNN and TENSORRT)
* test_yolo3_berkeley: our YOLOv3 version trained on the BDD100K dataset
* test_resnet101: ResNet101 network (CUDNN and TENSORRT)
* test_resnet101_cnet: CenterNet detection based on ResNet101 (CUDNN and TENSORRT)
* test_dla34: DLA34 network (CUDNN and TENSORRT)
* test_dla34_cnet: CenterNet detection based on DLA34 (CUDNN and TENSORRT)
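
Each test is built as a standalone executable in the build directory; a minimal sketch of running a couple of them:
```
cd build
./test_simple     # simple convolutional and dense network (CUDNN only)
./test_mnist      # MNIST network (CUDNN and TENSORRT)
```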


## yolo3 berkeley demo detection
For live detection you first need to build the TensorRT engine file by launching the desired network test. This is the recommended process:
```
export TKDNN_MODE=FP16   # set the half floating point optimization
rm yolo3_berkeley.rt               # be sure to delete (or move) old TensorRT files
./test_yolo3_berkeley              # run the yolo test (this is slow)
# with FP16 inference the results will be slightly less accurate
```
This will generate a yolo3_berkeley.rt file that can be used for live detection:
```
./demo                                 # launch detection on a demo video
./demo yolo3_berkeley.rt /dev/video0 y # launch detection on device 0
```
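
From the examples above, the demo appears to take the engine file, the input source, and the network type as arguments; a sketch of running it on the sample video shipped in demo/ (the pattern is inferred, not an exhaustive description of the options):
```
# inferred pattern: ./demo <network .rt file> <video file or camera device> <network type [y|c]>
./demo yolo3_berkeley.rt ../demo/yolo_test.mp4 y   # YOLO engine on the sample video
```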
![demo](https://user-images.githubusercontent.com/11562617/72547657-540e7800-388d-11ea-83c6-49dfea2a0607.gif)


## CenterNet (DLA34, ResNet101) demo detection
For live detection you first need to build the TensorRT engine file by launching the desired network test. This is the recommended process:
```
export TKDNN_MODE=FP16   # set the half floating point optimization
```

For CenterNet based on ResNet101:
```
rm resnet101_cnet.rt               # be sure to delete (or move) old TensorRT files
./test_resnet101_cnet              # run the CenterNet ResNet101 test (this is slow)
# with FP16 inference the results will be slightly less accurate
```

For CenterNet based on DLA34:
```
rm dla34_cnet.rt                   # be sure to delete (or move) old TensorRT files
./test_dla34_cnet                  # run the CenterNet DLA34 test (this is slow)
# with FP16 inference the results will be slightly less accurate
```

This will generate resnet101_cnet.rt and dla34_cnet.rt files that can be used for live detection:
```
./demo dla34_cnet.rt ../demo/yolo_test.mp4 c    # launch detection on a demo video
./demo resnet101_cnet.rt /dev/video0 c          # launch detection on device 0
./demo dla34_cnet.rt /dev/video0 c              # launch detection on device 0
```

## mAP demo
To compute mAP, precision, recall, and F1 score, run the map_demo.

A validation set is needed. To download COCO_val2017, run (from the root folder):
```
bash download_validation.sh 
```

To compute the mAP, the following parameters are needed:
```
./map_demo <network rt> <network type [y|c]> <labels file path> <config file path>
```
where 
* ```<network rt>```: the rt file of the chosen network on which to compute the mAP.
* ```<network type [y|c]>```: type of network. Currently only y (YOLO) and c (CenterNet) are supported.
* ```<labels file path>```: path to a text file containing all the paths of the ground-truth labels. It is important that all the ground-truth labels are in a folder called 'labels'. The folder containing the 'labels' folder should also contain an 'images' folder with all the ground-truth images, each having the same name as its label. For example, if there is a label path/to/labels/000001.txt there should be a corresponding image path/to/images/000001.jpg (a layout sketch follows this list).
* ```<config file path>```: path to a yaml file with the parameters needed for the mAP computation, similar to demo/config.yaml
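
A minimal sketch of the ground-truth layout and of the labels file described above (the actual root path is up to you):
```
# expected layout:
#   path/to/images/000001.jpg
#   path/to/labels/000001.txt
#
# the labels file simply lists one ground-truth label path per line:
cat > all_labels.txt << 'EOF'
path/to/labels/000001.txt
path/to/labels/000002.txt
EOF
```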

Example:

```
cd build
./map_demo dla34_cnet.rt c ../demo/COCO_val2017/all_labels.txt ../demo/config.yaml
```

## Supported networks

| Test Name         | Network                                       | Dataset                                                       | N Classes | Input size    | Weights                                                                   |
| :---------------- | :-------------------------------------------- | :-----------------------------------------------------------: | :-------: | :-----------: | :------------------------------------------------------------------------ |
| yolo              | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 608x608       | weights                                                                   |
| yolo_224          | YOLO v2<sup>1</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| yolo_berkeley     | YOLO v2<sup>1</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 416x736       | weights                                                                   |
| yolo_relu         | YOLO v2 (with ReLU, not Leaky)<sup>1</sup>    | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | weights                                                                   |
| yolo_tiny         | YOLO v2 tiny<sup>1</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | weights                                                                   |
| yolo_voc          | YOLO v2<sup>1</sup>                           | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 416x416       | weights                                                                   |
| yolo3             | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/jPXmHyptpLoNdNR/download)     |
| yolo3_berkeley    | YOLO v3<sup>2</sup>                           | [BDD100K  ](https://bair.berkeley.edu/blog/2018/05/30/bdd/)   | 10        | 320x544       | weights                                                                   |
| yolo3_coco4       | YOLO v3<sup>2</sup>                           | [COCO 2014](http://cocodataset.org/)                          | 4         | 416x416       | weights                                                                   |
| yolo3_flir        | YOLO v3<sup>2</sup>                           | [FREE FLIR](https://www.flir.com/oem/adas/adas-dataset-form/) | 3         | 320x544       | weights                                                                   |
| yolo3_tiny        | YOLO v3 tiny<sup>2</sup>                      | [COCO 2014](http://cocodataset.org/)                          | 80        | 416x416       | [weights](https://cloud.hipert.unimore.it/s/LMcSHtWaLeps8yN/download)     |
| yolo3_tiny512     | YOLO v3 tiny<sup>2</sup>                      | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/njnYACnQfWQFKrn/download)     |
| dla34             | Deep Layer Aggregation (DLA) 34<sup>3</sup>   | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| dla34_cnet        | CenterNet (DLA34 backend)<sup>4</sup>         | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/8AjXdgCeRzCa5AF/download)     |
| mobilenetv2ssd    | MobileNet v2 SSD Lite<sup>5</sup>             | [VOC      ](http://host.robots.ox.ac.uk/pascal/VOC/)          | 21        | 300x300       | [weights](https://cloud.hipert.unimore.it/s/x4ZfxBKN23zAJQp/download)     |
| resnet101         | ResNet 101<sup>6</sup>                        | [COCO 2014](http://cocodataset.org/)                          | 80        | 224x224       | weights                                                                   |
| resnet101_cnet    | CenterNet (ResNet101 backend)<sup>4</sup>     | [COCO 2017](http://cocodataset.org/)                          | 80        | 512x512       | [weights](https://cloud.hipert.unimore.it/s/B6mj33k7beECXsY/download)     |



## References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.