This blog chronicles my experiences learning F# via implementing Push: a programming language designed for use in evolving populations of programs geared for solving symbolic regression problems (genetic programming). It has since evolved to contain bits and pieces reflecting my thorny twisted software development path.

F# Humor (kinda)

October 15, 2017 1 comment

So I was bored, didn’t know what to do, started solving problems on codewars (shut up!).

There was this little problem there, where they are asking you to compute \frac{\sum_{i=1}^n i!}{n!}, for potentially large n’s with 10^{-6} precision.

It’s easy to see that this needs to be refactored (no need to actually compute factorials), and the solution I found most “fsharpy” and beautiful (and not mine) was:

let going n =
    { n .. -1 .. 2 }
    |> double
    |> Seq.scan (/) 1.
    |> Seq.sum

This works. Here is my less elegant solution:

let going n =
    let rec sumBack (acc : double) n (cur : double) =
        if n = 0 || cur < 1e-6 then Math.Round(acc, 6)
            sumBack (acc + cur) (n - 1) (cur / double n)
    sumBack  0. n 1.

Kinda ugh, but what I found humorous and instructive (things humorous usually are) was that falling for the language beauty and power is not always conducive to performance. Since we have the 10^{-6} constraint, it’s not necessary to scan the entire sequence {n..2}, we can start backwards and terminate as soon as the constraint is met. Thus, for very large n, we’ll only need to perform 1 division. So as the sum converges to 1, performance of the second solution converges to O(1), and, since it’s tail recursion, we aren’t actually using any stack, our memory usage is always O(1), which is more than we can say for the first solution. But I still love it more than my own for purely aesthetical reasons.

Categories: F#

Scripting Video Editing with F# and FFmpeg

July 20, 2017 Leave a comment

Computer vision should not be confused with image processing (as we all know). I love building computer vision pipelines, but sometimes menial tasks of pure image processing, automated editing come up.

Suppose you had the same astronauts from one of the previous posts participating in a study, where they are actually filmed watching something, say an episode of Star Wars. You ran your favorite face detection (Dlib-based, of course) on a sample of frames from that video, and found that your viewers don’t move around much. You then applied a clustering algorithm to determine the region for each of the viewers where their faces are most likely going to be during the entire video.

Now, for the next step of this study, you don’t want to keep the entire video, you only want viewers’ faces. So the idea is to split the original video into, in this case 14, individual small videos of just the faces. Also, this doesn’t need to be done on every video frame, but on a fraction of them. Every 3rd, 5th, etc. The graph of want you want to accomplish looks like this:

(Seems like skip & crop should be refactored into one operation, see below why they are not)

It’s simple enough to code something that does what you need (remember, the cut out regions remain constant throughout the video), but wouldn’t it be neat if there already were a powerful component that could take this graph as a parameter and do what’s required very fast?! FFmpeg does just this! FFmpeg is a command line tool, so wouldn’t it be even better if in our case where we need to specify lots of things on the command line, there would be a great scripting language/tool that would make creating these command lines a breeze? There is one, of course, it’s PowerShell. However, F# is a great scripting language as well and I look for any excuse to use it.

Coding it

The actual ffmpeg command line we want should be:

ffmpeg -i input.mp4 -filter_complex \
   "[0:v]framestep=2,setpts=0.5*PTS,crop=110:110:5:5[f0]; \
    ..." \
   -map [f0] -map [f1] ... output.mp4

FFmpeg has a nice command line sublanguage that allows you to build video editing graphs. They are described nicely here as well as in a few other places.

Our graph is split into as many branches as there are faces in the video (see above). For each such branch (they are separated by a “;” and named in “[]” as f0 – f<n-1>, we instruct ffmpeg to take video stream 0 ([0:v]), take every 2nd frame of the stream, decrease the framerate by 1/2 and crop our a region described as (width, height, left, top). We are ignoring the audio since we are only interested in the faces.

One thing that took me a while to figure out was that I needed to repeat what would normally be factored out at every branch: couldn’t just say “framestep, reducerate” once and append that to the custom crop operation, different for every branch. However, it appears that these common operations do execute once in ffmpeg, so the entire process is very fast. Takes about 90 sec per 45 min of H.264 encoded video.

Here is the script:

module FfmpegScriptor =
    open System.Drawing
    open System.IO

    // require ffmpeg be in the path
    let ffmpeg = "ffmpeg.exe"

    let scriptFfmpeg clip (seatmap : (int * Rectangle) []) outpath each debug =
        let pts = 1. / float each
        let ext = Path.GetExtension(clip)
        let subCommandCrop =
            |> (fun (v, r) -> 
                sprintf "[0:v]framestep=%d,setpts=%.2f*PTS,crop=%d:%d:%d:%d[f%d]" each pts r.Width r.Height r.Left r.Top v
            |> Array.reduce(fun a s -> a + ";" + s)

        let subCommandOut = 
            |> (fun (v, _) -> 
                sprintf " -map [f%d] \"%s\"" v (Path.Combine(outpath, string v + ext))
            |> Array.reduce (fun a s -> a + s)

        let command = sprintf "-i \"%s\" -y -filter_complex \"%s\" %s" clip subCommandCrop subCommandOut

        let exitCode = FfmpegWrapper.runFfmpeg ffmpeg command debug

        if exitCode <> 0 then failwith "Failed to run ffmpeg"

No rocket science here, just quickly building the command line. The debug parameter is used if we want to observe the workings of ffmpeg in a separate command window.

And, unlike PowerShell, still need to write a few lines to launch ffmpeg:

module FfmpegWrapper =
    open System
    open System.IO
    open System.Diagnostics

    let runFfmpeg ffmpeg command debug =
        use proc = new Process()
        let pi = ProcessStartInfo ffmpeg
        let dir = Path.GetDirectoryName ffmpeg

        pi.CreateNoWindow <- true
        pi.ErrorDialog <- false
        pi.UseShellExecute <- debug
        pi.Arguments <- command
        pi.WorkingDirectory <- dir
        proc.StartInfo <- pi

        if not (proc.Start()) then 1

D3 Off-Label

May 22, 2017 Leave a comment

Auto-writing Software

Good software writes itself. We all know the patterns and following them makes things easy. Not because we have memorized any monumental volume on design patterns, but because years of experience taught us what should be done on an almost intuitive level.

Also in a negative sense. We avoid duplication, globals, encapsulation violations. We know that planting a global variable somewhere to “facilitate” object interaction as innocent as it appears at the moment, throws a huge wrench into extensibility and new feature development. (This is known by a majestic name: legacy). Even if in a moment of weakness we allowed ourselves to relax unnecessarily the software itself would soon make us realize the error of our ways by resisting any attempt to extend it. And then with a sigh we go back regretting a momentary weakness and do things right.

In this case, however, software design dictated the user experience.

The Context

Last year, I did a brief stint on the DNA Storage Project at Microsoft Research. My task was to create an app that would streamline the process of encoding, synthesizing then decoding and analyzing the data on DNA molecules.

The UX Failure

Our intrepid UX designers sketched a pretty simple UX. The workflow appeared uncomplicated enough with just a few interacting entities. It was natural to split it into the “synthesis” and “analysis” parts, and then a few hierarchical grids or grids with links should have done the job. The MEAN stack fit the need and Azure was to host all the storage and backend processing.

Everything worked great until it came to the UX. No matter how much I huffed and puffed it would not yield. Whatever I did, I could not build the MVVM workflow. The model on the “synthesis” and the “analysis” side consisted of pretty much the same entities (files, pools, runs), however, their semantics while similar in some aspects were drastically different in others. While it seemed natural to have a model with a few abstract base classes wrapping the commonalities, the View-Model became an insurmountable obstacle: different View interactions demanded a lot of “similar but different” code in the VM; duplication was everywhere. The code was coming out entwined, and unmaintainable.

The inspiration came from the House episode Failure to Communicate. House is in Baltimore justifying some of his cases to a Medicare review officer. One of the cases involved prescribing Viagra to a woman with heart problems because she did not tolerate Nitroglycerin. Medicare was refusing to cover the off-label use.

Using D3 for UX development is an off-label use of D3. Perhaps I could have picked a different framework, but this was the one I was familiar with. The idea was to throw away MVVM and turn the entire experience in a data-driven “video-game” as one of the UX designers on the project called it. Then all the conceptual difficulties simply dissolved. With D3, I could easily streamline the peculiarities of the workflow, and, while new challenges emerged (entities interactions, drag & drop, etc), the software resumed its course “writing itself”.

Here is what this proof-of-concept prototype looked like.

Categories: d3, d3js Tags: , , ,

Detecting Faces with Dlib from F#. IFSharp Notebook

March 11, 2017 Leave a comment

The Choices

I have experimented with OpenCV and Dlib face detection in my computer vision pipeline. Both work well, but the Dlib one worked better: it is more sensitive with (in my case) almost no false positives right out of the box!

Dlib uses several HOG filters that account for profile as well as frontal views. HOG detectors train quickly and are very effective. OpenCV uses Haar Cascades and doesn’t have the same versatility out of the box: you need separate data files for profiles and frontal views. In the case of OpenCV, you also need to experiment with the parameters quite a bit to get it where you want it to be.

Both libraries allow for custom trained detectors, but in my case it did not come to that: Dlib detector was sufficient.

Using Dlib in F# Code

Dlib is a C++ library, also available for Python. No love for .NET.

The step-by-step of calling dlib face detector from F# code is in the IFSharp Notebook, hosted on my GitHub. Here .NET, EmguCV (OpenCV), and Dlib all work happily together.

(Took Azure Notebooks for a spin, works pretty well).

HashSet, Graph, Cognac

February 13, 2017 Leave a comment

The post on applying GPU to finding Eulerian path mentioned a stumbling block: partitioning a very specific kind of graph.

In general, partitioning is a hard problem, NP-hard actually. The graphs we are dealing with are very specific, though, and may be partitioned in O(|E|) time (E – the set of graph edges). Our graphs are of the following form: partitions If we are “lucky” it consists of just one or very few (directed) loops, if not – we have lots of such disconnected loops. And it is this unlucky case that kept the bottle of good cognac sealed all these months (only a descent solution to the partitioning problem would break it out).

To partition such a graph, i.e., color every loop in its distinct color, all we need to do is walk the graph from any vertex and once the loop is detected, we pick a vertex at random from the set of those we haven’t visited yet and start walking again. We repeat until everything is visited:

let partitionLinear (end' : int [])=
    let allVertices = HashSet<int>(end')
    let colors = Array.create end'.Length -1
    let mutable color = 0
    while allVertices.Count > 0 do
        let mutable v = allVertices.First()
        while colors.[v] < 0 do
            allVertices.Remove v |> ignore
            colors.[v] <- color
            v <- end'.[v]
        color <- color + 1
    colors, color

This is great, except it doesn’t work. The problem is revealed when a graph is very large and very fragmented. This is when the code at line 7 fails us. The problem is, we expect it to work in O(1): how hard can it be to retrieve the “first” element?! Well, since we are dealing with a data structure that does not have an intrinsic notion of order, it may be quite hard. In fact, the complexity here is O(|E|) (actually O(|V|, but for our graphs |V| = |E|), thus the complexity of the code above is O(|E|^2).

The following code actually works:

let partitionLinear (end' : int [])=
    let colors = Array.create end'.Length -1
    let mutable color = 0
    let mutable num = end'.Length
    let mutable curIdx = 0
    let mutable nextIdx = num - 1

    while num > 0 && curIdx >= 0 do
        let mutable v = curIdx
        while colors.[v] < 0 do
            colors.[v] <- color
            v <- end'.[v]
        color <- color + 1
        num <- num - 1
        while nextIdx >= 0 && colors.[nextIdx] >= 0 do
            nextIdx <- nextIdx - 1
        curIdx <- nextIdx
    colors, color

In the worst case it is still O(|E|^2), however, this worst case is unlikely and we expect the performance close to O(|E|) in general.
This won’t cure the performance problems of the GPU algorithm, which still relies on the number of graph partitions to be somewhat reasonable, but it will enable it to run in some respectable time. Perhaps time to break out the cognac after all!

Getting Emotional with Affectiva, F#, and Emgu

January 5, 2017 1 comment

I’ve been playing with Affectiva emotion, demographics, and face detection SDK, found it excellent, however, their sample gallery lacks a sample in F#! So here we are to correct that.

I just wanted a simple F# script that would let me take all kinds of the SDK options for a ride. The script itself is 130 lines. Out of that about 30 lines is just a boilerplate to load all the relevant libraries, setup the environment, etc.

Finally, here I am goofing off in front of my webcam.


Not much in terms of setup. So, yes, regular things for downloading/installing EmguCV, OpenCV, and installing Affectiva SDK.

Then all this needs to be reflected in the script:

open System

Environment.CurrentDirectory <- @"C:\Program Files\Affectiva\Affdex SDK\bin\release"
#r "../packages/EmguCV."
#r "../packages/EmguCV."
#r "../packages/EmguCV."
#r "../packages/OpenTK.1.1.2225.0/lib/net20/OpenTK.dll"
#r "System.Drawing.dll"
#r "System.Windows.Forms.dll"
#r @"C:\Program Files\Affectiva\Affdex SDK\bin\release\Affdex.dll"

open Affdex
open Emgu.CV
open Emgu.CV.CvEnum
open System.IO
open System.Collections.Generic
open Emgu.CV.UI
open Emgu.CV.Structure
open System.Drawing
open System.Linq
open System.Threading
open System.Diagnostics

let classifierPath = @"C:\Program Files\Affectiva\Affdex SDK\data"
let resources = Path.Combine(__SOURCE_DIRECTORY__, "Resources")

Just loading libraries, no big deal. Except we need to make sure Affdex.dll finds its dependencies, hence setting the current path at the beginning.

Initializing the Detector

let detector = new CameraDetector()




    while not finished do

Here setDetectGlasses is my favorite. Check it out in the video.

I’m using CameraDetector to capture video from the webcam, if I needed to capture a file video I’d use VideoDetector. Setting properties is easy, albeit slightly confusing at first – all these subtle differences between valence and attention… It makes sense when you get used to it. My favorite is setDetectAllEmojis. The SDK comes with quite a few emojis that can be used to reflect what’s going on in the video.

The VideoDetector is set up in a similar way, except you also need to issue detector.``process``() to start running, camera detector does it automatically.

I would also like to use use instead of let to instantiate the disposable detector, but cannot do it in the script, so true to an instinct for plugging memory leaks before they spring, I wrapped it in the try..finally – not at all necessary in a script, and I don’t do it for EmguCV elements anyway. This is not a production code practice.

Fun Part: Processing Results

As processed frames start coming in, we hook up to the detector image listener (detector.setImageListener()) which will feed us images and all kinds of fun stats as they come in. Also, setProcessStatusListener will tell us when things are done or errors occur.

let imageListener = {
new ImageListener with
    member this.onImageCapture (frame : Affdex.Frame) = ()

    member this.onImageResults(faces : Dictionary<int, Face>, frame : Affdex.Frame) =
        let img = new Image<Rgb, byte>(frame.getWidth(), frame.getHeight());
        img.Bytes <- frame.getBGRByteArray()

        let faces = faces |> (fun kvp -> kvp.Key, kvp.Value) |> Seq.toArray

        // draw tracking points
        faces.ToList().ForEach(fun (idx, face) ->
            let points = face.FeaturePoints |> featurePointToPoint
            let tl, br = Point(points.Min(fun p -> p.X), points.Min(fun p -> p.Y)), Point(points.Max(fun p -> p.X), points.Max(fun p -> p.Y))

            let rect = Rectangle(tl, Size(Point(br.X - tl.X, br.Y - tl.Y)))
            CvInvoke.Rectangle(img, rect, Bgr(Color.Green).MCvScalar, 2)

            // tracking points
            points.AsParallel().ForAll(fun p ->
                CvInvoke.Circle(img, p, 2, Bgr(Color.Red).MCvScalar, -1)

            // age
            let age = string face.Appearance.Age
            CvInvoke.PutText(img, age, Point(rect.Right + 5, rect.Top), FontFace.HersheyComplex, 0.5, Bgr(Color.BlueViolet).MCvScalar, 1)

            // gender & appearance
            let gender = int face.Appearance.Gender

            // glasses
            let glasses = int face.Appearance.Glasses

            let appearanceFile = makeFileName gender glasses
            loadIntoImage img appearanceFile (Point(rect.Right + 5, rect.Top + 15)) iconSize

            // emoji
            if face.Emojis.dominantEmoji <> Affdex.Emoji.Unknown then
                let emofile = Path.ChangeExtension(Path.Combine(resources, (int >> string) face.Emojis.dominantEmoji), ".png")
                loadIntoImage img emofile (Point(rect.Left, rect.Top - 50)) iconSize

        viewer.Image <- img.Mat

let processStatusListener = {
new ProcessStatusListener with
    member this.onProcessingException ex = ()
    member this.onProcessingFinished () = finished <- true

Nothing all that tricky about this code. F# object expression comes in handy for quickly creating an object that implements an interface. onImageResults is the key function here. It processes everything and sends it to the EmguCV handy viewer, which is launched at the start of script execution and runs asynchronously (I like how it doesn’t force me to modify its UI elements on the same thread that created it. This is totally cheating and smells buggy, but it’s so convenient for scripting!)

// Create our simplistic UI
let viewer = new ImageViewer()
let sd =
    async {
        return (viewer.ShowDialog()) |> ignore

In the first couple of lines we transform the captured frame to EmguCV-understandable format. I am using Image rather than the recommended Mat class, because I want to splat emojis over the existing frames and as amazing as it is, the only way to do it that I know of in EmguCV is this counter-intuitive use of ROI. If anyone knows a better way of copying one image on top of another (should be easy, right?) please let me know.

The next few lines draw the statistics on the image: tracking points, emojis, and demographic data. Emojis are stored in files located in the resources path (see above, in my case I just copied them locally) with file names matching the SDK emoji codes. A simple function transforms these codes into file names. Finally, the modified frame is sent to the EmguCV viewer. That’s it!

let featurePointToPoint (fp : FeaturePoint) = Point(int fp.X, int fp.Y)
let mutable finished = false
let makeFileName i j = Path.ChangeExtension(Path.Combine(resources, String.Format("{0}{1}", i, j)), ".png")

Image Copy

The following two functions do the magic of copying emojis on top of the image:

let copyImage (src : Image<Bgr, byte>) (dest : Image<Rgb, byte>) (topLeft : Point) =
    let prevRoi = dest.ROI
    dest.ROI <- Rectangle(topLeft, src.Size)
    CvInvoke.cvCopy(src.Ptr, dest.Ptr, IntPtr.Zero)
    dest.ROI <- prevRoi

let loadIntoImage (img : Image<Rgb, byte>) (file : string) (topLeft : Point) (size : Size) =
        let src = new Image<Bgr, byte>(size)
        CvInvoke.Resize(new Image<Bgr, byte>(file), src, size)
        copyImage src img topLeft

copyImage first sets the ROI of the destination, then issues a legacy cvCopy call. It operates on pointer structures which is so ugly! There really should be a better way.

Zooming Through Euler Path: Supercharging with GPU

December 26, 2016 Leave a comment

So, continuing where we left off:

  1. Walking the Euler Path: Intro
  2. Visualizing Graphs
  3. Walking the Euler Path: GPU for the Road
  4. Walking the Euler Path: PIN Cracking and DNA Sequencing

For the Win

And finally I ran the GPU-enabled algorithm for finding the Euler path.

let sw = Stopwatch()
let N = 1024 * 1024
let k = 7
let avgedges k = [1..k] |> float |> List.average
let gr = StrGraph.GenerateEulerGraph(N * 10, k)
printfn "Generated euler graph in %A, edges: %s" sw.Elapsed (String.Format("{0:N0}", gr.NumEdges))

let eulerCycle = findEulerTimed gr // GPU-based algorithm
let eulerVert = gr.FindEulerCycle() // Hierholzer algorithm
let cpu = float sw.ElapsedMilliseconds
printfn "CPU: Euler cycle generated in %A" sw.Elapsed

And the results:

Generating euler graph: vertices = 10,485,760; avg out/vertex: 4
Generated euler graph in 00:00:19.7520656, edges: 41,944,529
Euler graph: vertices - 10,485,760.00, edges - 41,944,529.00
1. Predecessors computed in 00:00:03.2146705
2. Partitioned linear graph in 00:00:06.4475982
Partitions of LG: 6
3. Circuit graph generated in 00:00:31.4655218
4. Swips implemented in 00:00:00.2189634

GPU: Euler cycle generated in 00:00:41.3474044
CPU: Euler cycle generated in 00:01:02.9022833

And I was like: WOW! Finally! Victory is mine! This is awesome! I’m awesome, etc. Victory dance, expensive cognac.
Then, after the euphoria subsided a little, I decided to make the mandatory chart:

Well, this was sobering!
While the CPU series line displays expected behavior, something is definitely not right with the GPU series: there is obviously some variable at work that I am not taking into account. So, from the beginning.

The Algorithm

I owe the algorithm to this master thesis, which actually implements the algorithm proposed by B. Awerbuch, A. Israeli and Y. Shiloach, “Finding euler circuits in logarithmic parallel time,” in Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, 1984, pp. 249-257.

The algorithm as I see it may be split into 4 stages (even 3, but 4 is slightly more convenient implementation-wise). Let’s illustrate.
Start with an Euler graph like the one below. It has 15 vertices with an average of 3 edges/vertex in one direction (maxOutOrInEdges = k, we have 44 edges in this case):

let N = 15
let k = 5
let gr = StrGraph.GenerateEulerGraph(N, k)


1. We walk it as we like, computing edge predecessors. For two edges e_1 = (u_1, v_1), e_2 = (u_2, v_2),\ e_1 is a predecessor of e_2 iff v_1 \equiv u_2, i.e. One edge begins where its predecessor ends. In our representation it’s easy to construct the array of predecessors:

let predecessors (gr : DirectedGraph<'a>) =

    let rowIndex = arrayCopy gr.RowIndex
    let ends = gr.ColIndex

    let predecessors = Array.create gr.NumEdges -1

    [|0..ends.Length - 1|]
        |> Array.iter
        (fun i ->
            predecessors.[rowIndex.[ends.[i]]] <- i
            rowIndex.[ends.[i]] <- rowIndex.[ends.[i]] + 1


2. At this point, if we are lucky, we have the representation of an Euler cycle as edges of the graph. We just need to walk the array we have “backwards”, seeding the final list of edges with edge 0, constructing the list recursively like so: predecessors.[List.head result] :: result. Alternatively, we may generate a graph out of the result and reverse it. (directions of the arrows need to be reversed since this is a predecessor graph. Euler cycles of the graph, where all directions are reversed are the same as those of the original one, reversed.)

In case we aren’t lucky, we consider our predecessor array to be a graph, where each edge of the original graph becomes a vertex and identify partitions of the graph:


This is the weak point of the algorithm. Partitioning a graph is, in general, a hard problem (NP-complete, to be precise), however, in this case, due to a very simple structure of the predecessor graph, the complexity is linear in the number of edges of the original graph: O(|E|).

let partitionLinear (end' : int [])=
    let allVertices = HashSet<int>(end')
    let colors = Array.create end'.Length -1
    let mutable color = 0

    while allVertices.Count > 0 do
        let mutable v = allVertices.First()
        while colors.[v] < 0 do
            allVertices.Remove v |> ignore
            colors.[v] <- color
            v <- end'.[v]
        color <- color + 1
    colors, color

So, now the goal is to join all the circles above into one circle, this is done in the crucial step 3

3. We further collapse the graph based on partitioning. Now, each partition becomes a vertex of the new graph. Edges of this new “circuit graph” are vertices of the original graph, such that each edge represents a vertex two partitions have in common.

This is the only part of the algorithm where the GPU is used and is very effective. Incidentally, I took the code almost verbatim from the original thesis, however, the author for some reason preferred not to implement this step on the GPU.

The idea is simple: we loop over the original graph vertex-by-vertex and try to figure out whether edges entering this vertex belong to different partitions (have different colors in the terminology of the code above). Each vertex is processed in a CUDA kernel:

let gcGraph, links, validity = generateCircuitGraph gr.RowIndex partition maxPartition


4. This graph is greatly over-determined: we don’t need ALL vertices that partitions have in common (represented by edges here). Also, it’s important to note that this graph is not directed: if partition 0 has a vertex in common with partition 1, then this is the same vertex partition 1 has in common with partition 0. In our implementation this un-directionality is reflected by over-directionality: every edge (u_1, v_1) is repeated as (v_1, u_1) All we actually need is a spanning tree of this graph:



Alright, this is much better – ignore directions. The output of step 3 gives us vertices of the original graph where our partitions intersect. We now need to swap edges of our original predecessor array around these vertices, so that each partition is not closed off on itself, but merges with its neighbor (it’s but a small correction to our original predecessor walk). We do this one-by-one, so partition 0 merges first with 1, then with 2. And 2 – with 3. And 1 with 4.

let fixedPredecessors = fixPredecessors gcGraph links edgePredecessors validity
let finalGraph = StrGraph.FromVectorOfInts fixedPredecessors


And it’s a beautiful circle, we are done!

Why not Break out That Cognac?

let N = 1024 * 1024
let i = 1

let gr = StrGraph.GenerateEulerGraphAlt(N * i, 3 * i * N)
let eulerCycle = findEulerTimed gr
Euler graph: vertices - 1,048,575.00, edges - 3,145,727.00
1. Predecessors computed in 00:00:00.3479258
2. Partitioned linear graph in 00:02:48.3658898
Partitions of LG: 45514
# of partitions: 45514 (CPU generation of CG)
3. Circuit graph generated in 00:00:34.1632645
4. Swips implemented in 00:00:00.1707746
GPU: Euler cycle generated in 00:03:23.0505569

This is not very impressive. What’s happening? Unfortunately graph structure holds the key together with the HashSet implementation.

The deeper the graph the better it will fare in the new algorithm. The bottleneck is the partitioning stage. Even though its complexity is theoretically O(|E|), I am using a HashSet to restart partitioning when needed and that presents a problem, as accessing it is not always O(1)!

The methods for Euler graph generation are implemented as GenerateEulerGraph and GenerateEulerGraphAlt. The first one “pleases the code”, and generates graphs that are very deep even when the number of edges is large. Usually I get less than 10 partitions, which means that every time I generate predecessors, I’m pretty much guaranteed to be “almost there” as far as finding a cycle. The second method tends to generate very shallow graphs, as the example above shows: I got a fairly large number of partitions while the number of edges is only around 3 million. So while the rest of the algorithm performance is pretty descent, computing partitions just kills the whole thing.

Store the cognac for another time.

Categories: bioinformatics, CUDA, F# Tags: , , , ,