Posts Tagged ‘computer vision’

Scripting Video Editing with F# and FFmpeg

July 20, 2017

Computer vision should not be confused with image processing (as we all know). I love building computer vision pipelines, but sometimes menial tasks of pure image processing, such as automated editing, come up.

Suppose you had the same astronauts from one of the previous posts participating in a study, where they are actually filmed watching something, say an episode of Star Wars. You ran your favorite face detection (Dlib-based, of course) on a sample of frames from that video, and found that your viewers don’t move around much. You then applied a clustering algorithm to determine the region for each of the viewers where their faces are most likely going to be during the entire video.
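The post doesn’t say which clustering algorithm was used. As a rough illustration only (not necessarily the author’s approach, and deliberately simpler than k-means), here is a greedy grouping of per-frame face boxes into one region per seat. The Box record, the distance threshold, and the function names are hypothetical.

```fsharp
// Hypothetical face detection box, in pixels: left, top, width, height.
type Box = { X : int; Y : int; W : int; H : int }

// Center of a box as floating-point coordinates.
let center (b : Box) = float b.X + float b.W / 2.0, float b.Y + float b.H / 2.0

// Euclidean distance between two points.
let distance (x1 : float, y1 : float) (x2 : float, y2 : float) =
    sqrt ((x1 - x2) ** 2.0 + (y1 - y2) ** 2.0)

// Smallest box containing every box in a non-empty list.
let union (boxes : Box list) =
    let left = boxes |> List.map (fun b -> b.X) |> List.min
    let top = boxes |> List.map (fun b -> b.Y) |> List.min
    let right = boxes |> List.map (fun b -> b.X + b.W) |> List.max
    let bottom = boxes |> List.map (fun b -> b.Y + b.H) |> List.max
    { X = left; Y = top; W = right - left; H = bottom - top }

// Greedy grouping: each detection joins the first cluster whose seed (first
// member) has a center within maxDist pixels; otherwise it seeds a new cluster.
// Returns one region per cluster, in the order the clusters were created.
let clusterSeats (maxDist : float) (detections : Box list) =
    let addDetection (clusters : (Box * Box list) list) (d : Box) =
        let near (seed, _) = distance (center seed) (center d) <= maxDist
        match List.tryFindIndex near clusters with
        | Some i ->
            clusters |> List.mapi (fun j (seed, members) ->
                if j = i then (seed, d :: members) else (seed, members))
        | None -> clusters @ [ (d, [ d ]) ]
    detections
    |> List.fold addDetection []
    |> List.map (snd >> union)
```

Taking the union of each cluster’s boxes gives a region that covers every observed position of that face, which is what a constant crop needs.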

Now, for the next step of this study, you don’t want to keep the entire video, only the viewers’ faces. So the idea is to split the original video into individual small videos of just the faces (14 of them, in this case). Also, this doesn’t need to be done on every video frame, only on a fraction of them: every 3rd, every 5th, etc. The graph of what you want to accomplish looks like this:

(It seems like skip and crop should be refactored into one operation; see below for why they are not.)

It’s simple enough to code something that does what you need (remember, the cut-out regions remain constant throughout the video), but wouldn’t it be neat if there already were a powerful component that could take this graph as a parameter and do what’s required very fast? FFmpeg does just this! FFmpeg is a command-line tool, and since we need to specify lots of things on the command line, wouldn’t it be even better if there were a great scripting language that made creating these command lines a breeze? There is one, of course: PowerShell. However, F# is a great scripting language as well, and I look for any excuse to use it.

Coding it

The actual ffmpeg command line we want should be:

ffmpeg -i input.mp4 -filter_complex \
   "[0:v]framestep=2,setpts=0.5*PTS,crop=110:110:5:5[f0]; \
    ..." \
   -map [f0] -map [f1] ... output.mp4

FFmpeg has a nice command line sublanguage that allows you to build video editing graphs. They are described nicely here as well as in a few other places.

Our graph is split into as many branches as there are faces in the video (see above). For each such branch (branches are separated by “;” and named in “[]” as f0 through f<n-1>), we instruct ffmpeg to take video stream 0 ([0:v]), keep every 2nd frame of the stream (framestep=2), halve the presentation timestamps (setpts=0.5*PTS), and crop out a region described as (width, height, left, top). We ignore the audio since we are only interested in the faces.
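A minimal sketch of how one such branch string can be assembled, assuming a hypothetical Crop record in place of System.Drawing.Rectangle (the script below uses Rectangle). The field order follows FFmpeg’s crop=w:h:x:y.

```fsharp
// Hypothetical crop region in pixels, in the order crop=w:h:x:y expects.
type Crop = { Width : int; Height : int; Left : int; Top : int }

// One branch: keep every step-th frame of video stream 0, scale timestamps
// by 1/step, crop the region, and label the branch output [f<idx>].
let branch (step : int) (idx : int) (c : Crop) =
    sprintf "[0:v]framestep=%d,setpts=%.2f*PTS,crop=%d:%d:%d:%d[f%d]"
        step (1.0 / float step) c.Width c.Height c.Left c.Top idx

// All branches joined with ';' form the -filter_complex argument.
let filterGraph (step : int) (crops : Crop list) =
    crops |> List.mapi (branch step) |> String.concat ";"
```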

One thing that took me a while to figure out was that I needed to repeat, at every branch, what would normally be factored out: I couldn’t just say “framestep, setpts” once and append that to the crop operation, which is different for every branch. However, it appears that these common operations execute only once in ffmpeg, so the entire process is very fast: it takes about 90 seconds per 45 minutes of H.264-encoded video.
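For comparison, FFmpeg’s split filter duplicates a stream inside a filter graph, so in principle the shared framestep/setpts chain can be written once and fanned out to the per-face crops. This sketch only builds such a graph string; it is an alternative, not the command used in this post, and its speed has not been compared with the timing quoted above. Crop is a hypothetical record, and setpts is given as the expression PTS/step.

```fsharp
// Hypothetical crop region in pixels, in the order crop=w:h:x:y expects.
type Crop = { Width : int; Height : int; Left : int; Top : int }

// framestep/setpts run once, then split fans the stream out into
// labelled copies [s0]..[s<n-1>], each cropped into [f0]..[f<n-1>].
let splitGraph (step : int) (crops : Crop list) =
    let n = List.length crops
    let labels = List.init n (sprintf "[s%d]") |> String.concat ""
    let head = sprintf "[0:v]framestep=%d,setpts=PTS/%d,split=%d%s" step step n labels
    let tails =
        crops
        |> List.mapi (fun i c ->
            sprintf "[s%d]crop=%d:%d:%d:%d[f%d]" i c.Width c.Height c.Left c.Top i)
    String.concat ";" (head :: tails)
```

The -map [f0] … part of the command line would stay the same, since the output labels are unchanged.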

Here is the script:

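module FfmpegScriptor =
    open System.Drawing
    open System.IO

    // requires ffmpeg to be on the PATH
    let ffmpeg = "ffmpeg.exe"

    // clip: input video; seatmap: (index, crop rectangle) per face;
    // outpath: output directory; each: keep every each-th frame;
    // debug: show ffmpeg in its own window
    let scriptFfmpeg (clip : string) (seatmap : (int * Rectangle) []) (outpath : string) each debug =
        let pts = 1. / float each
        let ext = Path.GetExtension(clip)

        // one filter-graph branch per face: skip frames, rescale timestamps, crop
        let subCommandCrop =
            seatmap
            |> Array.map (fun (v, r) ->
                sprintf "[0:v]framestep=%d,setpts=%.2f*PTS,crop=%d:%d:%d:%d[f%d]" each pts r.Width r.Height r.Left r.Top v)
            |> String.concat ";"

        // route each labelled branch to its own file: <outpath>/<index><ext>
        let subCommandOut =
            seatmap
            |> Array.map (fun (v, _) ->
                sprintf " -map [f%d] \"%s\"" v (Path.Combine(outpath, string v + ext)))
            |> String.concat ""

        let command = sprintf "-i \"%s\" -y -filter_complex \"%s\" %s" clip subCommandCrop subCommandOut

        // FfmpegWrapper has to be defined earlier in the script file:
        // F# only resolves names defined above their use.
        let exitCode = FfmpegWrapper.runFfmpeg ffmpeg command debug

        if exitCode <> 0 then failwith "Failed to run ffmpeg"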

No rocket science here, just quickly building the command line. The debug parameter is used if we want to observe the workings of ffmpeg in a separate command window.
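One detail in the script: the setpts factor is formatted with %.2f, so for each = 3 it becomes 0.33 rather than 1/3, and output timestamps come out about 1% shorter than an exact rescale. The sketch below quantifies that. Since setpts accepts an arithmetic expression, writing setpts=PTS/3 is one way to let ffmpeg do the division exactly (a suggested variant, not what the script does). The function names are hypothetical.

```fsharp
open System
open System.Globalization

// The setpts factor as the script formats it: 1/each with two decimals.
let formatted (each : int) = sprintf "%.2f" (1.0 / float each)

// Relative error of the rounded factor versus the exact 1/each;
// a negative value means output timestamps are slightly compressed.
let relativeError (each : int) =
    let exact = 1.0 / float each
    let rounded = Double.Parse(formatted each, CultureInfo.InvariantCulture)
    (rounded - exact) / exact

// Expression form that ffmpeg evaluates itself, avoiding the rounding.
let exactSetpts (each : int) = sprintf "setpts=PTS/%d" each
```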

And, unlike in PowerShell, we still need to write a few lines of code to launch ffmpeg:

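module FfmpegWrapper =
    open System.IO
    open System.Diagnostics

    // Runs ffmpeg with the given arguments, waits for it to exit and returns its exit code.
    // With debug = true, UseShellExecute is on and CreateNoWindow is ignored,
    // so ffmpeg runs in its own visible console window.
    let runFfmpeg (ffmpeg : string) (command : string) (debug : bool) =
        use proc = new Process()
        let pi = ProcessStartInfo ffmpeg
        let dir = Path.GetDirectoryName ffmpeg

        pi.CreateNoWindow <- true
        pi.ErrorDialog <- false
        pi.UseShellExecute <- debug
        pi.Arguments <- command
        pi.WorkingDirectory <- dir
        proc.StartInfo <- pi

        if not (proc.Start()) then 1
        else
            proc.WaitForExit()
            proc.ExitCode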
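ffmpeg writes its progress and error messages to stderr, which is easy to lose when it runs without a window. A hedged sketch of a variant that captures that output; runCaptured is a hypothetical name, and it drops the debug switch because redirection requires UseShellExecute = false. A missing executable is reported as exit code -1 instead of an exception.

```fsharp
open System.ComponentModel
open System.Diagnostics

// Runs exe with args, waits for it, and returns (exit code, captured stderr).
let runCaptured (exe : string) (args : string) : int * string =
    let pi = ProcessStartInfo(exe, args)
    pi.UseShellExecute <- false
    pi.CreateNoWindow <- true
    pi.RedirectStandardError <- true
    try
        use proc = Process.Start(pi)
        // Read stderr to the end before WaitForExit, so a full pipe
        // buffer cannot block ffmpeg while we wait for it.
        let log = proc.StandardError.ReadToEnd()
        proc.WaitForExit()
        proc.ExitCode, log
    with :? Win32Exception as ex ->
        // Thrown by Process.Start when the executable cannot be found.
        -1, ex.Message
```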

Detecting Faces with Dlib from F#. IFSharp Notebook

March 11, 2017

The Choices

I have experimented with OpenCV and Dlib face detection in my computer vision pipeline. Both work well, but Dlib’s worked better: it is more sensitive, with (in my case) almost no false positives right out of the box!

Dlib uses several HOG filters that account for profile as well as frontal views. HOG detectors train quickly and are very effective. OpenCV uses Haar Cascades and doesn’t have the same versatility out of the box: you need separate data files for profiles and frontal views. In the case of OpenCV, you also need to experiment with the parameters quite a bit to get it where you want it to be.
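To make the HOG idea concrete: a HOG descriptor divides the image into small cells and, in each, accumulates a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes one cell’s histogram in a simplified form (unsigned orientations, nearest-bin voting, no block normalization; production detectors refine all of these). It is an illustration, not Dlib’s implementation.

```fsharp
// Orientation histogram of one cell. img.[y, x] holds grayscale intensities;
// gradients use the centered [-1, 0, 1] kernel, so border pixels are skipped.
let cellHistogram (img : float[,]) (bins : int) =
    let h = Array2D.length1 img
    let w = Array2D.length2 img
    let hist = Array.zeroCreate<float> bins
    for y in 1 .. h - 2 do
        for x in 1 .. w - 2 do
            let gx = img.[y, x + 1] - img.[y, x - 1]
            let gy = img.[y + 1, x] - img.[y - 1, x]
            let mag = sqrt (gx * gx + gy * gy)
            if mag > 0.0 then
                // Unsigned orientation folded into [0, 180) degrees
                let deg = (atan2 gy gx * 180.0 / System.Math.PI + 180.0) % 180.0
                let bin = min (bins - 1) (int (deg / (180.0 / float bins)))
                // Each pixel votes for its bin with its gradient magnitude
                hist.[bin] <- hist.[bin] + mag
    hist
```

For a vertical edge, all gradients point horizontally, so the whole magnitude lands in the 0° bin.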

Both libraries allow for custom-trained detectors, but in my case it did not come to that: the Dlib detector was sufficient.

Using Dlib in F# Code

Dlib is a C++ library, also available for Python. No love for .NET.

A step-by-step walkthrough of calling the Dlib face detector from F# code is in the IFSharp Notebook hosted on my GitHub. There, .NET, EmguCV (OpenCV), and Dlib all work happily together.

(I took Azure Notebooks for a spin; they work pretty well.)

Getting Emotional with Affectiva, F#, and Emgu

January 5, 2017

I’ve been playing with the Affectiva emotion, demographics, and face detection SDK and found it excellent; however, their sample gallery lacks an F# sample! So here we are to correct that.

I just wanted a simple F# script that would let me take all kinds of SDK options for a ride. The script itself is 130 lines, of which about 30 are just boilerplate to load all the relevant libraries, set up the environment, etc.

Finally, here I am goofing off in front of my webcam.


Not much in terms of setup: just the usual downloading/installing of EmguCV and OpenCV, plus installing the Affectiva SDK.

Then all this needs to be reflected in the script:

open System

Environment.CurrentDirectory <- @"C:\Program Files\Affectiva\Affdex SDK\bin\release"
#r "../packages/EmguCV."
#r "../packages/EmguCV."
#r "../packages/EmguCV."
#r "../packages/OpenTK.1.1.2225.0/lib/net20/OpenTK.dll"
#r "System.Drawing.dll"
#r "System.Windows.Forms.dll"
#r @"C:\Program Files\Affectiva\Affdex SDK\bin\release\Affdex.dll"

open Affdex
open Emgu.CV
open Emgu.CV.CvEnum
open System.IO
open System.Collections.Generic
open Emgu.CV.UI
open Emgu.CV.Structure
open System.Drawing
open System.Linq
open System.Threading
open System.Diagnostics

let classifierPath = @"C:\Program Files\Affectiva\Affdex SDK\data"
let resources = Path.Combine(__SOURCE_DIRECTORY__, "Resources")

Just loading libraries, no big deal. Except we need to make sure Affdex.dll finds its dependencies, hence setting the current directory at the beginning.
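On Windows, the directories listed in PATH are also part of the DLL search order, so prepending the SDK’s bin directory to the process PATH is a common alternative to changing the current directory. A sketch of such a helper follows; prependToPath is a hypothetical name, and whether this is enough for Affdex.dll’s dependencies is an untested assumption.

```fsharp
open System
open System.IO

// Prepends dir to this process's PATH (process scope only; the
// machine and user settings are untouched) and returns the new value.
let prependToPath (dir : string) =
    let current = Environment.GetEnvironmentVariable("PATH")
    let updated =
        if String.IsNullOrEmpty current then dir
        else dir + string Path.PathSeparator + current
    Environment.SetEnvironmentVariable("PATH", updated)
    updated
```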

Initializing the Detector

let detector = new CameraDetector()




    while not finished do

Here setDetectGlasses is my favorite. Check it out in the video.

I’m using CameraDetector to capture video from the webcam; to process a video file I’d use VideoDetector instead. Setting properties is easy, albeit slightly confusing at first: all these subtle differences between valence and attention… It makes sense once you get used to it. Another favorite of mine is setDetectAllEmojis: the SDK comes with quite a few emojis that can be used to reflect what’s going on in the video.

The VideoDetector is set up in a similar way, except that you also need to call detector.``process``() to start it running; the camera detector does this automatically.

I would also like to bind the disposable detector with use instead of let, but I cannot do that in the script. So, true to an instinct for plugging memory leaks before they spring, I wrapped it in try..finally. This is not at all necessary in a script, and I don’t do it for EmguCV objects anyway; it is not production code practice.
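The same pattern, stripped of the SDK: a disposable object bound with let, cleaned up in try..finally, while the script polls a flag that a completion callback sets. FakeDetector is a hypothetical stand-in and not the Affdex API; the flag is a ref cell here so the callback can update it from any scope.

```fsharp
open System
open System.Threading

// Hypothetical stand-in for a detector: Start() reports completion from
// a worker thread after a short delay; Dispose() records that it ran.
type FakeDetector(onFinished : unit -> unit) =
    let mutable disposed = false
    member this.Start() =
        let worker () =
            Thread.Sleep(50)
            onFinished ()
        Thread(ThreadStart(worker)).Start()
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

// Completion flag set by the callback.
let finished = ref false

let detector = new FakeDetector(fun () -> finished.Value <- true)
try
    detector.Start()
    // Poll until the callback reports completion.
    while not finished.Value do
        Thread.Sleep(10)
finally
    // Explicit cleanup, since the detector is bound with let.
    (detector :> IDisposable).Dispose()
```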

Fun Part: Processing Results

To handle processed frames as they come in, we hook up the detector’s image listener (detector.setImageListener()), which feeds us images and all kinds of fun stats. Also, setProcessStatusListener tells us when processing is done or errors occur.

// featurePointToPoint, makeFileName, loadIntoImage, iconSize, viewer and finished
// must be defined earlier in the script: F# only resolves names defined above their use.
let imageListener =
    { new ImageListener with
        member this.onImageCapture (frame : Affdex.Frame) = ()

        member this.onImageResults (faces : Dictionary<int, Face>, frame : Affdex.Frame) =
            // wrap the frame's raw bytes in an EmguCV image
            let img = new Image<Rgb, byte>(frame.getWidth(), frame.getHeight())
            img.Bytes <- frame.getBGRByteArray()

            let faces = faces |> Seq.map (fun kvp -> kvp.Key, kvp.Value) |> Seq.toArray

            faces |> Array.iter (fun (idx, face) ->
                // face rectangle: bounding box of the tracking points
                let points = face.FeaturePoints |> Seq.map featurePointToPoint |> Seq.toArray
                let tl, br = Point(points.Min(fun p -> p.X), points.Min(fun p -> p.Y)), Point(points.Max(fun p -> p.X), points.Max(fun p -> p.Y))

                let rect = Rectangle(tl, Size(br.X - tl.X, br.Y - tl.Y))
                CvInvoke.Rectangle(img, rect, Bgr(Color.Green).MCvScalar, 2)

                // tracking points
                points.AsParallel().ForAll(fun p ->
                    CvInvoke.Circle(img, p, 2, Bgr(Color.Red).MCvScalar, -1))

                // age
                let age = string face.Appearance.Age
                CvInvoke.PutText(img, age, Point(rect.Right + 5, rect.Top), FontFace.HersheyComplex, 0.5, Bgr(Color.BlueViolet).MCvScalar, 1)

                // gender & glasses select the appearance icon
                let gender = int face.Appearance.Gender
                let glasses = int face.Appearance.Glasses
                let appearanceFile = makeFileName gender glasses
                loadIntoImage img appearanceFile (Point(rect.Right + 5, rect.Top + 15)) iconSize

                // emoji
                if face.Emojis.dominantEmoji <> Affdex.Emoji.Unknown then
                    let emofile = Path.ChangeExtension(Path.Combine(resources, (int >> string) face.Emojis.dominantEmoji), ".png")
                    loadIntoImage img emofile (Point(rect.Left, rect.Top - 50)) iconSize)

            // hand the annotated frame to the viewer
            viewer.Image <- img.Mat }

let processStatusListener =
    { new ProcessStatusListener with
        member this.onProcessingException ex = ()
        member this.onProcessingFinished () = finished <- true }

Nothing all that tricky about this code. F# object expressions come in handy for quickly creating an object that implements an interface. onImageResults is the key function here: it processes everything and sends the result to the handy EmguCV viewer, which is launched at the start of script execution and runs asynchronously. (I like how it doesn’t force me to update its UI elements on the thread that created it. This is totally cheating and smells buggy, but it’s so convenient for scripting!)
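A self-contained sketch of the object-expression technique, using a hypothetical IStatusListener interface shaped loosely like the SDK’s ProcessStatusListener (the real interface lives in Affdex.dll, so it is not referenced here).

```fsharp
open System

// Hypothetical listener interface, not the Affdex one.
type IStatusListener =
    abstract OnException : exn -> unit
    abstract OnFinished : unit -> unit

// Ref cells, so the object expression below can update them.
let finished = ref false
let lastError : exn option ref = ref None

// The object expression implements the interface inline, with no named class.
let listener =
    { new IStatusListener with
        member this.OnException ex = lastError.Value <- Some ex
        member this.OnFinished () = finished.Value <- true }

// Simulate the SDK reporting an error and then completion.
listener.OnException(InvalidOperationException("camera busy"))
listener.OnFinished()
```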

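// Create our simplistic UI: show the viewer dialog on a thread-pool thread
// so the rest of the script keeps running
let viewer = new ImageViewer()
let sd =
    async {
        return (viewer.ShowDialog()) |> ignore
    }
    |> Async.Start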

In the first couple of lines we convert the captured frame into a format EmguCV understands. I am using Image rather than the recommended Mat class because I want to splat emojis over the existing frames, and, as amazing as EmguCV is, the only way I know to do that in it is this counter-intuitive use of ROI. If anyone knows a better way of copying one image on top of another (should be easy, right?), please let me know.

The next few lines draw the statistics on the image: tracking points, emojis, and demographic data. Emojis are stored in files located in the resources path (see above, in my case I just copied them locally) with file names matching the SDK emoji codes. A simple function transforms these codes into file names. Finally, the modified frame is sent to the EmguCV viewer. That’s it!
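The geometry behind those drawing calls, as a library-free sketch: the face rectangle is the bounding box of the tracking points, and the age text, appearance icon, and emoji are offset from it by 5, 15, and 50 pixels, as in onImageResults. Plain integer tuples replace the System.Drawing types here, and both function names are hypothetical.

```fsharp
// Bounding box (left, top, width, height) of a non-empty array of (x, y) points.
let boundingBox (points : (int * int)[]) =
    let xs = points |> Array.map fst
    let ys = points |> Array.map snd
    let left, top = Array.min xs, Array.min ys
    left, top, Array.max xs - left, Array.max ys - top

// Top-left anchors for the age text, the appearance icon and the emoji,
// using the offsets from onImageResults.
let labelPositions (left : int, top : int, width : int, _height : int) =
    let right = left + width
    (right + 5, top), (right + 5, top + 15), (left, top - 50)
```

Note that the emoji anchor gets a negative y whenever a face is within 50 pixels of the top of the frame.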

let featurePointToPoint (fp : FeaturePoint) = Point(int fp.X, int fp.Y)
let mutable finished = false
let makeFileName i j = Path.ChangeExtension(Path.Combine(resources, String.Format("{0}{1}", i, j)), ".png")

Image Copy

The following two functions do the magic of copying emojis on top of the image:

let copyImage (src : Image<Bgr, byte>) (dest : Image<Rgb, byte>) (topLeft : Point) =
    let prevRoi = dest.ROI
    dest.ROI <- Rectangle(topLeft, src.Size)
    CvInvoke.cvCopy(src.Ptr, dest.Ptr, IntPtr.Zero)
    dest.ROI <- prevRoi

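let loadIntoImage (img : Image<Rgb, byte>) (file : string) (topLeft : Point) (size : Size) =
    // load the icon, resize it, and copy it onto img; both temporaries are disposed afterwards
    use src = new Image<Bgr, byte>(size)
    use loaded = new Image<Bgr, byte>(file)
    CvInvoke.Resize(loaded, src, size)
    copyImage src img topLeft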

copyImage first sets the ROI of the destination, then issues a legacy cvCopy call, and finally restores the previous ROI. It operates on raw pointers, which is so ugly! There really should be a better way.
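What copyImage does, minus the pointers: copy a small image into a window of a larger one. One case the ROI approach may not handle is an emoji that lands partly outside the frame (for example at rect.Top - 50 near the top edge), since the ROI would then be clipped to a smaller size than the source. The sketch below does the same copy on plain 2D arrays and clips to the destination bounds; it illustrates the idea and is not an EmguCV API.

```fsharp
// Copy src into dest with src's top-left corner at (left, top), skipping
// any part that falls outside dest. Arrays are indexed [row, column].
let overlay (src : 'T[,]) (dest : 'T[,]) (left : int) (top : int) =
    let sh, sw = Array2D.length1 src, Array2D.length2 src
    let dh, dw = Array2D.length1 dest, Array2D.length2 dest
    // Intersect the source rectangle with the destination bounds
    let y0, y1 = max 0 top, min dh (top + sh)
    let x0, x1 = max 0 left, min dw (left + sw)
    for y in y0 .. y1 - 1 do
        for x in x0 .. x1 - 1 do
            dest.[y, x] <- src.[y - top, x - left]
```

When the window lies entirely outside dest, the ranges are empty and nothing is copied.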