No training implementation is complete until it allows training on a cluster where each machine has multiple GPUs. Multi-node/Multi-GPU Training with PyTorch Lightning: SageMaker does a great job enabling this in Script Mode, and all we have to do is write code that supports SageMaker's SMDDP implementation of the distributed training DDP protocol. PyTorch Lightning … Continue reading Amazon SageMaker: Distributed Training
Author: fierval
Amazon SageMaker: What Tutorials Don’t Teach
At Fetch we reward you for taking pictures of store and restaurant receipts. Our app needs to read and understand crumpled, dark, smudged, warped, skewed, creased (you get the "picture") images, taken in cars, in your lap, on the way out, while walking the dog, taking out the trash, doing your nails, etc. Not … Continue reading Amazon SageMaker: What Tutorials Don’t Teach
A Deep Reinforcement Learning Journey Home.
https://youtu.be/OuE2bfxuSUo On a warm afternoon of October 23, 1975, one of the last days of Indian summer that year, Yozhik (aka Ejik, The Little Hedgehog) set out to visit his friend Medvezhonok (The Bear Cub). He was joining him, as he did every evening, for a night of stargazing. Only he never made a return … Continue reading A Deep Reinforcement Learning Journey Home.
Supercharging Object Detection in Video: from Glacial to Lightning Speed
http://www.youtube.com/watch?v=rR6EIakYSZ0 In the following series I will explore different tools and techniques for doing object detection in streaming video in real time or faster, starting with the baseline Python detector running slowly and gradually picking up speed. In this series: In the course of these posts we will explore optimizing object detection in videos. We … Continue reading Supercharging Object Detection in Video: from Glacial to Lightning Speed
Supercharging Object Detection in Videos: Setup
We started with the performance of the Python object detector as our baseline (~19 fps). Next we ditch Python and all our pre-installed libraries and custom-build everything. C++ will become the development environment, not just because it's more "bare bones" than Python and thus more performant, but also to access functionality not available in Python. Environment … Continue reading Supercharging Object Detection in Videos: Setup
Supercharging Object Detection in Video: First App
Tensorflow C++ Video Detector: It is time to validate all this arduous setup work, run our first C++ detector and reap the first benefits. You may clone this repository, which is a fork of this repository, modified and adapted to the modern times. Ensuring the Right Build Paths: Note the following excerpt from CMakeLists.txt: The … Continue reading Supercharging Object Detection in Video: First App
Supercharging Object Detection in Video: Optimizing Decoding and Graph Feeding
In the previous post we validated our install and ran a simple detector in C++. It is now time to start optimizing it. Source code for the finished project is here. Optimizing Video Decoding: If we build and run the video_reader.cpp OpenCV sample, we will observe a staggering performance improvement available in OpenCV for decoding … Continue reading Supercharging Object Detection in Video: Optimizing Decoding and Graph Feeding
Supercharging Object Detection in Video: TensorRT 5
Source code for the finished project is here. NVIDIA TensorRT is a framework used to optimize deep networks for inference by performing surgery on graphs trained with popular deep learning frameworks: Tensorflow, Caffe, etc. Preparing the Tensorflow Graph: Our code is based on the Uff SSD sample installed with TensorRT 5.0. The guide together with … Continue reading Supercharging Object Detection in Video: TensorRT 5
HoloLens Object Detection
We will explore running object detection on-device with HoloLens, using the Unity game engine as our development platform. AR Academy is a great introduction to all aspects of HoloLens development. It has 300-level tutorials that demonstrate how to connect the device to Azure Cognitive Services to perform machine learning tasks. There are no samples of performing … Continue reading HoloLens Object Detection
On the Margins: Non-maximum Suppression with Tensorflow
I'm writing a series of posts on supercharging object detection inference performance in video streams using Tensorflow and cool tech from NVIDIA: step-by-step, starting from 6 fps all the way up to 230. But before I start, this small post is about a cool little gem, which I think is often overlooked. Anyone in the … Continue reading On the Margins: Non-maximum Suppression with Tensorflow