# PyTorch_ONNX_TensorRT

**Repository Path**: zhangming8/PyTorch_ONNX_TensorRT

## Basic Information

- **Project Name**: PyTorch_ONNX_TensorRT
- **Description**: https://github.com/RizhaoCai/PyTorch_ONNX_TensorRT
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-11-24
- **Last Updated**: 2021-11-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# PyTorch_ONNX_TensorRT

A tutorial that shows how you can build a TensorRT engine from a PyTorch model with the help of ONNX. Please kindly star this project if you find it helpful.

# News

A dynamic-shape example (for the batch-size dimension) has been added; see also the sketch after the Usage section below. Just run

```
python3 dynamic_shape_example.py
```

This example should be run on TensorRT 7.x. This repo is a bit out of date, since there are some API changes from TensorRT 5.0 to TensorRT 7.x; I will put in some time in the near future to make it compatible.

# Environment

0. Ubuntu 16.04 x86_64, CUDA 10.0
1. Python 3.5
2. [PyTorch](https://pytorch.org/get-started/locally/) 1.0
3. TensorRT 5.0 (if you are using a Jetson TX2, TensorRT is already there if you have installed JetPack)

3.1 Download [TensorRT](https://developer.nvidia.com/tensorrt) (pick the package that matches your environment).

3.2 Debian installation:

```
$ sudo dpkg -i nv-tensorrt-repo-ubuntu1x04-cudax.x-trt5.x.x.x-ga-yyyymmdd_1-1_amd64.deb  # the downloaded file
$ sudo apt-key add /var/nv-tensorrt-repo-cudax.x-trt5.x.x.x-ga-yyyymmdd/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install tensorrt
$ sudo apt-get install python3-libnvinfer
```

To verify the TensorRT installation:

```
$ dpkg -l | grep TensorRT
```

You should see something like:

```
ii  graphsurgeon-tf         5.1.5-1+cuda10.1   amd64  GraphSurgeon for TensorRT package
ii  libnvinfer-dev          5.1.5-1+cuda10.1   amd64  TensorRT development libraries and headers
ii  libnvinfer-samples      5.1.5-1+cuda10.1   amd64  TensorRT samples and documentation
ii  libnvinfer5             5.1.5-1+cuda10.1   amd64  TensorRT runtime libraries
ii  python-libnvinfer       5.1.5-1+cuda10.1   amd64  Python bindings for TensorRT
ii  python-libnvinfer-dev   5.1.5-1+cuda10.1   amd64  Python development package for TensorRT
ii  python3-libnvinfer      5.1.5-1+cuda10.1   amd64  Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev  5.1.5-1+cuda10.1   amd64  Python 3 development package for TensorRT
ii  tensorrt                5.1.5.x-1+cuda10.1 amd64  Meta package of TensorRT
ii  uff-converter-tf        5.1.5-1+cuda10.1   amd64  UFF converter for TensorRT package
```

3.3 Install PyCUDA (needed to run TensorRT from Python):

```
$ pip3 install pycuda
```

If you run into problems with pip, try:

```
$ sudo apt-get install python3-pycuda  # installs for /usr/bin/python3
```

For full details, please check the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html).

# Usage

Please check the file `pytorch_onnx_trt.ipynb`. A minimal sketch of the overall export-and-build flow is shown below.
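The pipeline the notebook walks through has two steps: export the PyTorch model to ONNX, then parse the ONNX file with TensorRT to build an engine. Below is a minimal sketch of that flow, assuming the TensorRT 7.x Python API; `SimpleNet`, `model.onnx`, and `model.trt` are illustrative names, not files from this repo.

```python
import torch
import torch.nn as nn
import tensorrt as trt

# A tiny stand-in network (hypothetical; substitute your own model).
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(1))

# 1. Export the PyTorch model to ONNX.
model = SimpleNet().eval()
dummy = torch.randn(1, 3, 128, 128)
torch.onnx.export(model, dummy, 'model.onnx',
                  input_names=['input'], output_names=['output'])

# 2. Parse the ONNX file and build a TensorRT engine.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 28  # 256 MiB of workspace for tactic selection
    with open('model.onnx', 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError('Failed to parse the ONNX file')
    engine = builder.build_cuda_engine(network)

# 3. Serialize the engine so it can be reloaded later for inference.
with open('model.trt', 'wb') as f:
    f.write(engine.serialize())
```

The serialized engine can later be reloaded with `trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(...)` for inference.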
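The dynamic-shape example mentioned in the News section relies on TensorRT optimization profiles, which describe the range of shapes an engine must handle. Here is a sketch of the idea, not necessarily how `dynamic_shape_example.py` implements it; it assumes the TensorRT 7.x API and that the hypothetical `model.onnx` from the sketch above was exported with a dynamic batch axis (e.g. `dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}}` in `torch.onnx.export`).

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser, \
     builder.create_builder_config() as config:
    with open('model.onnx', 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError('Failed to parse the ONNX file')

    # One optimization profile gives the (min, optimal, max) shapes
    # allowed for the 'input' binding; here only the batch dim varies.
    profile = builder.create_optimization_profile()
    profile.set_shape('input',
                      (1, 3, 128, 128),    # min
                      (8, 3, 128, 128),    # opt
                      (32, 3, 128, 128))   # max
    config.add_optimization_profile(profile)
    config.max_workspace_size = 1 << 28

    engine = builder.build_engine(network, config)
```

At inference time the actual batch size is chosen per execution context, e.g. `context.set_binding_shape(0, (n, 3, 128, 128))`, before running the engine.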
## Int8

To run the int8 optimization:

```
python3 trt_int8_demo.py
```

You will see output like:

```
Function forward_onnx called!
graph(%input : Float(32, 3, 128, 128),
      %1 : Float(16, 3, 3, 3),
      %2 : Float(16),
      %3 : Float(64, 16, 5, 5),
      %4 : Float(64),
      %5 : Float(10, 64),
      %6 : Float(10)):
  %7 : Float(32, 16, 126, 126) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[1, 1]](%input, %1, %2), scope: Conv2d
  %8 : Float(32, 16, 126, 126) = onnx::Relu(%7), scope: ReLU
  %9 : Float(32, 16, 124, 124) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[1, 1]](%8), scope: MaxPool2d
  %10 : Float(32, 64, 120, 120) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[0, 0, 0, 0], strides=[1, 1]](%9, %3, %4), scope: Conv2d
  %11 : Float(32, 64, 120, 120) = onnx::Relu(%10), scope: ReLU
  %12 : Float(32, 64, 1, 1) = onnx::GlobalAveragePool(%11), scope: AdaptiveAvgPool2d
  %13 : Float(32, 64) = onnx::Flatten[axis=1](%12)
  %output : Float(32, 10) = onnx::Gemm[alpha=1, beta=1, transB=1](%13, %5, %6), scope: Linear
  return (%output)

Int8 mode enabled
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Total time used by engine_int8: 0.0009500550794171857
Total time used by engine_fp16: 0.001466430104649938
Total time used by engine: 0.002231682623709525
```

This output was produced on a Jetson Xavier. Please note that int8 mode is only supported on specific GPU modules, e.g. the Jetson Xavier, Tesla P4, etc. TensorRT picks the int8 quantization scales by running a calibrator over sample batches; a sketch of such a calibrator is given in the appendix at the end of this README.

TensorRT 7 has been released. According to some feedback, the code works well with TensorRT 5.0 but might have some problems with TensorRT 7.0. I will update this repo by testing it with TensorRT 7 and making it compatible soon.

# Contact

Cai, Rizhao

Email: rizhao.cai@gmail.com
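# Appendix: Int8 calibration sketch

For reference, here is a minimal sketch of how an int8 calibrator can be wired up with the TensorRT 7.x Python API and PyCUDA. This is not the implementation in `trt_int8_demo.py`; the `RandomCalibrator` class is hypothetical and feeds random data purely for illustration, whereas a real calibrator should stream a few hundred representative input batches.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
import pycuda.driver as cuda
import tensorrt as trt

class RandomCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds batches to TensorRT so it can choose int8 scales.
    Hypothetical sketch: real code should use representative data."""

    def __init__(self, batch_shape=(32, 3, 128, 128), num_batches=8):
        super().__init__()
        self.batch_shape = batch_shape
        self.num_batches = num_batches
        self.count = 0
        nbytes = int(np.prod(batch_shape)) * np.float32().nbytes
        self.device_mem = cuda.mem_alloc(nbytes)  # reused for every batch

    def get_batch_size(self):
        return self.batch_shape[0]

    def get_batch(self, names):
        if self.count >= self.num_batches:
            return None  # returning None signals the end of calibration
        batch = np.random.rand(*self.batch_shape).astype(np.float32)
        cuda.memcpy_htod(self.device_mem, batch)  # copy batch to the GPU
        self.count += 1
        return [int(self.device_mem)]  # one device pointer per input binding

    def read_calibration_cache(self):
        return None  # no cache on disk; calibrate from scratch

    def write_calibration_cache(self, cache):
        pass  # could persist `cache` to skip recalibration next time
```

Before building the engine, int8 mode would then be enabled on the builder (after checking `builder.platform_has_fast_int8`) with `builder.int8_mode = True` and `builder.int8_calibrator = RandomCalibrator()`, using the attribute-style API of TensorRT 7 and earlier.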