I’m writing a series of posts on supercharging object detection inference performance in video streams using TensorFlow and some cool tech from NVIDIA: step by step, starting from 6 fps and going all the way up to 230. But before I start, this small post is about a little gem that I think is often overlooked.
Anyone in the object detection business knows that non-maximum suppression is indispensable to it. But try finding an implementation in a machine learning library of your choice! I couldn’t. Naturally there are lots of implementations on GitHub, so not a big deal, but still I would think that if I am using a machine learning framework something so crucial should just fall out of the box.
Turns out TensorFlow ships one: tf.image.non_max_suppression. Thank you, Google!
So here is my five-liner to wrap it:
import tensorflow as tf

def non_max_suppression_with_tf(sess, boxes, scores, max_output_size, iou_threshold=0.5):
    '''
    Provide a tensorflow session and get non-maximum suppression

    max_output_size, iou_threshold are passed to tf.image.non_max_suppression
    '''
    non_max_idxs = tf.image.non_max_suppression(boxes, scores, max_output_size, iou_threshold=iou_threshold)
    new_boxes = tf.cast(tf.gather(boxes, non_max_idxs), tf.int32)
    new_scores = tf.gather(scores, non_max_idxs)
    return sess.run([new_boxes, new_scores])
Note that boxes and scores can be numpy arrays. This does require an active TensorFlow session for the results to materialize.
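For anyone curious what the op is roughly doing, here is a small pure-Python sketch of greedy non-maximum suppression. It is not TensorFlow's implementation, only an illustration under my own assumptions: the helper names iou and greedy_nms are made up for this post, boxes are taken to be (y1, x1, y2, x2) with y1 <= y2 and x1 <= x2, and a box is dropped when its IoU with an already kept box exceeds the threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    # Height and width of the overlap, clamped at zero when boxes are disjoint
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first (illustrative only)."""
    # Visit candidates from the most to the least confident
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one far away
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, max_output_size=10)
```

In this toy input the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed and only the first and third boxes survive. The wrapper above returns the gathered boxes and scores rather than indices, but the selection idea is the same.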