We started with the performance of the Python object detector as our baseline (~19 fps).
Next we ditch Python and all our pre-installed libraries and custom-build everything. C++ becomes the development environment, not just because it’s more “bare bones” than Python and thus more performant, but also because it gives access to functionality not available in Python. Here is what we need:
- CUDA-capable NVIDIA GPU
- Ubuntu 16.04 LTS
- NVIDIA Driver v410
- CUDA 10 (or 9 for those feeling less adventurous)
- TensorRT 5.0.2
- Anaconda Python latest release
- CMake 3.8+ (for CUDA kernel compilation)
- Tensorflow r1.12+ (with Bazel 0.19.2 to build it)
- OpenCV 3.3+
- Inception SSD V2 Object Detector frozen Tensorflow graph
I assume that all the necessary libraries and build tools (gcc v5, etc.) are already installed or will be installed along with the toolkits above.
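Several items in the list above are minimum versions, so it can help to check what is installed before building. Here is a minimal sketch of such a check. The helper names and the `MINIMUM_VERSIONS` labels are my own, purely for illustration, and the version strings in the example are sample inputs rather than output captured from a real machine.

```python
import re

# Minimum versions taken from the list above. The keys are illustrative
# labels only, not names of real packages or commands.
MINIMUM_VERSIONS = {
    "cmake": "3.8",
    "opencv": "3.3",
    "tensorflow": "1.12",
}


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Sample inputs: a 3.5.x CMake is too old for this setup, a 3.13.x one is fine.
    for sample in ("cmake version 3.5.1", "cmake version 3.13.2"):
        ok = meets_minimum(sample, MINIMUM_VERSIONS["cmake"])
        print(sample, "->", "ok" if ok else "too old")
```

Comparing tuples of integers avoids the classic mistake of comparing version strings, where "3.13" would sort before "3.8".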
Install NVIDIA driver v396 for CUDA 9 or v410 for CUDA 10.
IMPORTANT: After installation create symlinks to the codec libraries (adjust the nvidia-396 directory to match the driver version you installed):
sudo ln -s /usr/lib/nvidia-396/libnvcuvid.so /usr/lib/libnvcuvid.so
sudo ln -s /usr/lib/nvidia-396/libnvcuvid.so.1 /usr/lib/libnvcuvid.so.1
NOTE: If installing CUDA 10, you need to copy all the files called
/usr/local/cuda/include in the CUDA 9 toolkit into the same directory of the CUDA 10 toolkit. NVIDIA has removed these from the toolkit because the codec is being separated into its own SDK.
Install the latest Python 3 Anaconda distribution
Once installed, create a new Python 3.6 environment.
conda create -n py36 python=3.6 anaconda
source activate py36
This will activate the newly created Python 3.6 environment.
Follow the instructions to build tensorflow from source, skipping to the “Install Bazel” section. Install Bazel 0.19.2. Do it all from the Anaconda prompt above with the py36 environment active. Check out r1.12 (or anything later than r1.10).
You may need to download dependencies (Eigen, Protobuf) into the build tree by running
tensorflow/contrib/makefile/download_dependencies.sh from the tensorflow git root. This is not always a good idea: the latest version of Eigen (3.3.7), downloaded this way, broke compilation for me. No big deal: the components download into the
tensorflow/contrib/makefile/downloads directory and can be deleted from there if you already have a suitable version of Eigen (3.3.6) or protobuf installed. (Eigen has an “xcopy” installation: you will just need to copy
unsupported directories from the distribution to
When the configure script asks for the Python locations, point it at the directories of the py36 environment created above.
Make sure you answer “yes” to CUDA support. Answer either “yes” or “no” to TensorRT support; it’s not going to matter for this exercise. Select the appropriate compute capability for your GPU (7.0 for Volta); this makes inference code run much faster.
./configure
Fixing your bazel configuration may be required if the build does not start. Locate .bazelrc in your <tensorflow source git path>/tensorflow directory and add the following line at the top of it:
import <tensorflow source git path>/tensorflow/tools/bazel.rc
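If you would rather script this edit than do it by hand, a small helper along these lines could work. It is a sketch only: `ensure_first_line` is a name I made up, and the function simply prepends a line to a text file if that exact line is not already in it. It knows nothing about Bazel's configuration format.

```python
import os


def ensure_first_line(path, line):
    """Put `line` at the top of the file at `path` unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    A missing file is treated as empty and gets created.
    """
    existing = ""
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()
    if line in existing.splitlines():
        return False  # nothing to do, which makes the helper safe to re-run
    with open(path, "w") as handle:
        handle.write(line + "\n" + existing)
    return True
```

To use it, call `ensure_first_line` with the path to your .bazelrc and the import line shown above, after substituting your own tensorflow source path. Running it a second time leaves the file unchanged.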
When building the pip package or any tensorflow-related target you do not need to specify --config=cuda. So skip to “Build pip package” and follow the instructions under CPU-only. It will do the right thing and build with CUDA support.
Validate that everything works. In a new terminal, from a location other than where tensorflow code resides:
source activate py36
python
import tensorflow as tf
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
c = a + b
sess = tf.Session()
sess.run(c)
sess.close()
exit()
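The session above only shows that tensorflow imports and runs; it does not say whether the build actually picked up the GPU. A quick extra check, sketched below, lists the GPU devices tensorflow reports. `list_gpu_devices` is a helper name of my own. It relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, which is not part of the documented public API, so treat it as a convenience rather than a guarantee.

```python
def list_gpu_devices():
    """Return the names of the GPU devices tensorflow reports.

    Returns None if tensorflow cannot be imported in the current environment.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = list_gpu_devices()
    if gpus is None:
        print("tensorflow is not importable; is the py36 environment active?")
    elif not gpus:
        print("tensorflow works, but no GPU devices were found")
    else:
        print("GPU devices:", gpus)
```

An empty list here would suggest the build went through without CUDA support or that the driver is not visible to the process.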
If this works, build the C++ library:
bazel build //tensorflow:libtensorflow_cc.so
The CMake package installed by default on Ubuntu 16.04 is v3.5;
sudo apt remove it if installed. We need v3.8+, so download the latest release from the CMake site and install it. After installation the easiest thing is to create symlinks to all the new CMake executables in
Follow the OpenCV installation instructions. You can probably skip the “[compiler]” sub-section of the Required Packages section. If you want to install the OpenGL GTK hooks, which may be useful later, install the OpenGL libraries first and then:
sudo apt-get install libgtkglext1 libgtkglext1-dev
Checkout 3.3.1 (or later) in both
Build OpenCV using
<opencv_contrib code root>/modules
CMAKE_BUILD_TYPE set to Debug if you want to step into OpenCV code later, or left blank otherwise.
- Hit Configure
- Hit Generate
make -j7 # runs 7 jobs in parallel
sudo make install
sudo ldconfig
Stage for Build
Finally, copy tensorflow include files and libraries we built above to the location where our future builds will pick them up.
cd <tensorflow code git root>
sudo mkdir /usr/local/tensorflow
sudo mkdir /usr/local/tensorflow/include
sudo cp -r tensorflow/contrib/makefile/downloads/eigen/Eigen /usr/local/tensorflow/include/
sudo cp -r tensorflow/contrib/makefile/downloads/eigen/unsupported /usr/local/tensorflow/include/
sudo cp tensorflow/contrib/makefile/downloads/nsync/public/* /usr/local/tensorflow/include/
sudo cp -r bazel-genfiles/tensorflow /usr/local/tensorflow/include/
sudo cp -r tensorflow/cc /usr/local/tensorflow/include/tensorflow
sudo cp -r tensorflow/core /usr/local/tensorflow/include/tensorflow
sudo mkdir /usr/local/tensorflow/include/third_party
sudo cp -r third_party/eigen3 /usr/local/tensorflow/include/third_party/
sudo mkdir /usr/local/tensorflow/lib
sudo cp bazel-bin/tensorflow/libtensorflow_*.so /usr/local/tensorflow/lib
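With this many copy commands it is easy to miss one, so a quick sanity check of the staged tree can save a confusing compiler error later. The sketch below is my own: the `EXPECTED` list is derived from the commands above, it only checks that the entries exist, and it cannot tell whether their contents are complete.

```python
import os

# Entries the copy commands above should have created, relative to the
# staging root. Derived from those commands; adjust if you changed them.
EXPECTED = [
    "include/Eigen",
    "include/unsupported",
    "include/tensorflow/cc",
    "include/tensorflow/core",
    "include/third_party/eigen3",
    "lib",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(root, rel))]


if __name__ == "__main__":
    missing = missing_entries("/usr/local/tensorflow")
    if missing:
        print("Missing from the staging area:", ", ".join(missing))
    else:
        print("All expected directories are in place.")
```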
All done! We will validate the installation in the next post.