# tkDNN
tkDNN is a Deep Neural Network library built with cuDNN primitives, designed specifically to run on the NVIDIA TK1 board (and all later boards).<br>
The main goal is high-performance inference on already trained models.

This branch works on every NVIDIA GPU that supports the following dependencies:
* CUDA 10.0
* CUDNN 7.603
* TENSORRT 6.01
* OPENCV 4.1
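
Before building, it can help to check which of these tools are installed. The following is a minimal sketch, not part of tkDNN: the function name `report_version` is hypothetical, and it assumes `nvcc` and `pkg-config` are on the `PATH` and that OpenCV registers itself with pkg-config as `opencv4`. Only the last line of each command's output is printed, and its format can differ between versions.

```shell
# Hypothetical helper: print the last output line of a version command,
# or a notice if the tool is not installed.
report_version() {
    label="$1"; shift
    if command -v "$1" > /dev/null 2>&1; then
        echo "$label: $("$@" 2>&1 | tail -n 1)"
    else
        echo "$label: $1 not found in PATH"
    fi
}

report_version "CUDA" nvcc --version
report_version "OpenCV" pkg-config --modversion opencv4
```

The cuDNN and TensorRT versions are not covered by this sketch, because where their version information lives depends on how they were installed.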

## Workflow
The recommended workflow follows these steps:
* Build and train a model in Keras (on any PC)
* Export weights and biases
* Define the model on tkDNN
* Do inference (on TK1)

## Compile the library
Build with CMake:
```
mkdir build
cd build
cmake ..
# use -DTEST_DATA=False to skip dataset download
make
```
During the CMake configuration, the weights needed to run the tests are downloaded.
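
The build steps above can be wrapped in a small shell function. This is a minimal sketch: the function name `build_tkdnn` and the `SKIP_DATA` variable are hypothetical, while `-DTEST_DATA=False` is the option mentioned in the build commands.

```shell
# Hypothetical helper: configure and build tkDNN in a separate build directory.
# Usage: build_tkdnn [source_dir]
# Set SKIP_DATA=1 beforehand to pass -DTEST_DATA=False (skips the dataset download).
build_tkdnn() {
    src_dir="${1:-.}"
    cmake_args=""
    if [ "${SKIP_DATA:-0}" = "1" ]; then
        cmake_args="-DTEST_DATA=False"
    fi
    mkdir -p "$src_dir/build" || return 1
    (
        cd "$src_dir/build" || exit 1
        # $cmake_args is left unquoted so that an empty value adds no argument
        cmake .. $cmake_args && make
    )
}
```

Calling `build_tkdnn` from the repository root reproduces the commands above. The subshell keeps the caller's working directory unchanged.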

## Test
Assuming you have built the library correctly, these tests are ready to run:
* test_simple: a simple convolutional and dense network (CUDNN only)
* test_mnist: the famous MNIST network (CUDNN and TENSORRT)
* test_mnistRT: the MNIST network hardcoded using the TensorRT APIs (TENSORRT only)
* test_yolo: YOLO detection network (CUDNN and TENSORRT)
* test_yolo_tiny: smaller version of YOLO (CUDNN and TENSORRT)
* test_yolo3_berkeley: our yolo3 version trained on the BDD100K dataset
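
To run several of these tests in one go, a loop such as the following can be used. This is only a sketch: the function name `run_tkdnn_tests` is hypothetical, and it assumes that the test executables sit in the directory passed as the first argument (for example `build`) and that each test exits with status 0 on success.

```shell
# Hypothetical helper: run every listed test binary found in a directory.
# Prints PASS/FAIL/SKIP per test and returns non-zero if any test failed.
run_tkdnn_tests() {
    test_dir="${1:-.}"
    failed=0
    for name in test_simple test_mnist test_mnistRT test_yolo test_yolo_tiny test_yolo3_berkeley; do
        if [ ! -x "$test_dir/$name" ]; then
            echo "SKIP $name (not found in $test_dir)"
            continue
        fi
        # Run from inside the directory so relative paths used by the test resolve
        if (cd "$test_dir" && "./$name"); then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `run_tkdnn_tests build` runs whatever subset of the tests was built and skips the rest.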

## yolo3 berkeley demo detection
For live detection you need to precompile the TensorRT file by launching the desired network test. This is the recommended process:
```
export TKDNN_MODE=FP16   # set the half floating point optimization
rm yolo3_berkeley.rt     # be sure to delete (or move) old TensorRT files
./test_yolo3_berkeley    # run the yolo test (this is slow)
# with FP16 inference the results will be slightly less accurate
```
This will generate a yolo3_berkeley.rt file that can be used for live detection:
```
./yolo3_demo                               # launch detection on a demo video
./yolo3_demo yolo3_berkeley.rt /dev/video0 # launch detection on device 0
```
![demo](https://user-images.githubusercontent.com/11562617/72547657-540e7800-388d-11ea-83c6-49dfea2a0607.gif)
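
The precompile steps delete any old TensorRT file before regenerating it. If you would rather keep the old file, a variant that moves it aside first could look like this. It is a sketch only: the function name `backup_rt_file` and the `.bak.<timestamp>` suffix are hypothetical choices, not part of tkDNN.

```shell
# Hypothetical helper: move an existing TensorRT file out of the way
# instead of deleting it, so the next test run regenerates it.
backup_rt_file() {
    rt_file="${1:-yolo3_berkeley.rt}"
    if [ -e "$rt_file" ]; then
        backup="$rt_file.bak.$(date +%Y%m%d%H%M%S)"
        mv -- "$rt_file" "$backup" || return 1
        echo "moved $rt_file to $backup"
    else
        echo "no existing $rt_file, nothing to move"
    fi
}
```

It would replace the `rm` step: call `backup_rt_file yolo3_berkeley.rt` after setting `TKDNN_MODE`, then run `./test_yolo3_berkeley` as before.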