This blog chronicles my experiences learning F# via implementing Push: a programming language designed for use in evolving populations of programs geared for solving symbolic regression problems (genetic programming). It has since evolved to contain bits and pieces reflecting my thorny twisted software development path.

## Decomposition Problem with F#, Dynamic Programming

As my former boss and mentor used to say, in order to understand recursion one must first understand recursion. This is funny, ha-ha, but if we tweak it slightly we get a really useful statement: in order to understand dynamic programming, one must first understand recursion. Here is an example.

### Decomposition Problem

Given a number, $m$, and an (ordered) set $\mathcal{N} = \{n_i\}, \ n_i > 0, \ n_{i+1} \geq n_{i}$ output how many sets $\mathcal{A}=\{a_i\}, \ a_i \geq 0$ there are, such that $m=\sum\limits_i a_i\dot n_i$

### Sequencing Antibiotics

I was solving a “particular” case (the solution is nonetheless general, but just to explain function names appearing in the code), in Chapter 2 on Antibiotics sequencing in my bioinformatics book.

Antibiotics sequencing is done through mass spectrum analysis, so if we have a spectrum of weight $m$, we may need to build the list of linear peptides of the same mass. In a toy version of this problem, we have an integer value of $m$ and a list of amino acid weights ($\mathcal{N}$) out of which antibiotics can be built:

let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]


### To Understand Recursion

Given a number $m$, we assume that the problem is solved for $m-n_0, m-n_1$, etc until $m-n_i$, and we have values $k_0, k_1, ..., k_i$ for these solutions Our solution, then is simply $\sum\limits_i k_i$. We are done.

If we look a bit closer, however, we’ll see that our recursion creates a tree with one root node (a solution for the case of $m$), branching exponentially: at the first level we get $l = |\mathcal{N}|$ new nodes, each of which also produces $l$ nodes at the next level. So this will pretty quickly become unfeasible.

Even if we did have unlimited computing power, this is still not the most efficient use of it, since lots of quantities will be recursively computed multiple times as we descend down the recursion tree.

### Dynamic Programming

We therefore turn to dynamic programming, which is sort of recursion backwards. We still use the idea above, but instead of going back we go forward. We pre-compute all the necessary values by starting from $n_0$, the minimal member of $\mathcal(N)$, marching until $m$. At each step we will perform exactly the recursive step described above, so once we reach higher values of $m$, all previous values of it will be ready!

### The Solution

We allocate an array of 0 elements and length $m$ (this can actually be optimized to be of some constant length of $O(l)$, perhaps just of length $l$, I didn’t check, since as we move along old values become useless). We set all entries corresponding to $n_i$ to 1.

The rest is pretty straight-forward.

open System

// array of amino acid weights
let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]

/// Count the number of peptides of mass m
let countNumPeptides m =
let allCounts : int64 [] = Array.zeroCreate (max m weights.[weights.Length - 1] + 1)

// initialize dynamic programming store
Array.ForEach(weights, fun w -> allCounts.[w] <- 1L)

// equivalent to a simple loop, but we are using tail recursion
let rec fillDynArray n =
if n > m then allCounts.[m]
else
// dynamic_count(N) = Sum(dynamic_count((N - weights[0]), dynamic_count(N - weights[1]),.., dynamic_count(N - weights.Last())
let step =
weights
|> Array.map (fun w -> if n - w < 0 then 0L else allCounts.[n - w])
|> Array.sum

allCounts.[n] <- allCounts.[n] + step
fillDynArray (n + 1)

fillDynArray weights.[0], allCounts


### …and a mandatory chart

I really enjoy working with FSharp.Charting (available through NuGet), and it would be cool to visualize how fast the number of solutions grows with $m$ increasing linearly.

Notice above I am returning the entire array of counts we have generated, so since we are saving all of the solutions from $n_0$ to $m$, let’s see what we get:

#load @"..\packages\FSharp.Charting.0.90.12\FSharp.Charting.fsx"
open FSharp.Charting

let chartNumPeptides m =
let _, counts = countNumPeptides m
let series =
Array.init m (fun i -> i, if counts.[i] > 0L then counts.[i] else 1L)

Chart.Line series.[weights.[0]..]
|> Chart.WithXAxis(Min=100., Max=1000.)
|> Chart.WithYAxis(Log=true)


Cheating a little here: if values are 0 I’m setting them to 1 since this is a semi-log chart (no log of 0). I think it’s easier than matplotlib!

Categories: bioinformatics, F#

## Fun with Alea.CUDA, F# Interactive, Charts

July 19, 2015 1 comment

Source code for this post can be found on my GitHub.

It’s great to see technologies evolving over the years. Alea.CUDA has done so in leaps and bounds since the first time I laid eyes on it a couple of years ago. At the time the name seemed unfortunate and hinted at the “aleatic” programming paradigm, meaning that you could not reliably predict the results your code would produce in consequent runs (alea is a Latin dice). It has all changed since.

I listened to an NVIDA Webinar on the subject and immediately wanted to take Alea.CUDA development for a test drive. I wanted to do it the same way I would in Python: just like we use Numba and IPython console, use F# scripting with F# Interactive.

For a test subject, I picked the same 1D LBP described in my post here. The actual algorithm for extracting the local binary pattern and its use in face recognition is described in this paper.

The algorithm consists of two parts:

1. Sliding a window of size 2*n across the array and computing the pattern at each location
2. Computing the histogram of these values.

Here is my F# way of doing it:

let lbp (arr : int[]) n =
if arr.Length < n * 2 + 1
then failwith ("Length should be at least " + (n * 2 + 1).ToString())

// computes a pattern around an array element i
let calcOnePattern i (around : int [])=
[0..around.Length - 1]
|> Seq.fold (fun st j -> if around.[j] > i then st ||| (1 <<< j) else st) 0

// moves sliding windoe of size '2*n' through the array and computes the pattern for each
// element at the center of a window
let prep =
[|n .. arr.Length - n - 1|]
|> Array.map (fun i -> calcOnePattern arr.[i] (Array.concat [arr.[i - n..i - 1]; arr.[i + 1..i + n]]))

// return histogram
prep
|> Array.fold (fun (st : int[]) v -> st.[v] <- st.[v] + 1; st)
(Array.zeroCreate (1 <<< n * 2))


For comparison, here is the Python code (here p is 2 ** [0..(2n-1)]), in F# code I’m simply shifting left, so no such array is necessary.

def extract_1dlbp_cpu(input, neighborhood, p):
hist = np.zeros(1 << (2 * neighborhood))
for i in range(neighborhood, len(input) - neighborhood):
left = input[i - neighborhood : i]
right = input[i + 1 : i + neighborhood + 1]
both = np.r_[left, right]
hist[np.sum(p [both >= input[i]])] += 1
return hist


## Putting it on GPU

I took one of Alea.CUDA provided sample scripts and modified it slightly. This is why my code is wrapped in a class. Here it is:

type LBP(arr : int [], ?n : int) =
inherit GPUModule(GPUModuleTarget.DefaultWorker)

let mutable arr = arr
let n = defaultArg n 4
do
if arr.Length < n * 2 + 1 then failwith ("Length should be at least " + (n * 2 + 1).ToString())

member this.Arr
with get() = arr
and set (value) = arr <- value

[<Kernel;ReflectedDefinition>]
member this.Kernel (arr:deviceptr<int>) (n:int)  (hist:deviceptr<int>) (len : int)=
let mutable i = blockIdx.x * blockDim.x + threadIdx.x
let mutable res = 0
if i < len - 2 * n then
i <- i + n
for j = i - n to i - 1 do
if arr.[j] >= arr.[i] then res <- res ||| (1 <<< (j - (i - n)))

for j = i + 1 to i + n do
if arr.[j] >= arr.[i] then res <- res ||| (1 <<< (j - (i - n + 1)))

__atomic_add (hist + res) 1 |> ignore

member this.Compute () =
let blockSize = 512
let gridSize = divup (arr.Length - 2 * n) blockSize
let lp = LaunchParam(gridSize, blockSize)

let hist = Array.zeroCreate (1 <<< n * 2)
use d_arr = this.GPUWorker.Malloc(arr)
use d_hist = this.GPUWorker.Malloc(hist)
this.GPULaunch <@ this.Kernel @> lp d_arr.Ptr n d_hist.Ptr (arr.Length)
d_hist.Gather()


There is not much here, at least in spirit, that any CUDA developer is not used to. We define a kernel and then launch it in a very familiar way, albeit wrapped in a slightly unfamiliar form. Nothing that is not easy to understand or adopt, though. F# quotations are used to interface with nvcc, so makes sense. Pointer arithmetic is wrapped nicely, so code like the one on line 25 above (__atomic_add) is possible. Works great, easy to use.

Here the kernel, as one would expect, computes a pattern around a single array element, at the same time contributing its result to the final histogram. Definitely room for optimization as this atomic_add is an obvious bottleneck, but I didn’t go any further since the results are quite spectacular already.

Also, I really like the “divup” function above, that figures out the number of blocks. Normally it is done as:
 gridSize = (total + blockSize) / blockSize 

… a small thing, but a nice convenience.

### Boilerplate

Since I was doing it in F# Interactive, some boilerplate was necessary. I installed my binaries through NuGet and here are the references. The “send to F# interactive” functionality that has been around for a while now is a great help to figure things out.

#r "System.Configuration.dll"
#I @"packages\Alea.Cuda.2.1.2.3274\lib\net40"
#I @"packages\NUnit.2.6.3\lib\"
#I @"packages\FsUnit.1.3.0.1\Lib\Net40"
#r "Alea.CUDA.dll"
#r "nunit.framework.dll"
#r "FsUnit.NUnit.dll"

open System
open System.IO
open Alea.CUDA
open Alea.CUDA.Utilities
open FsUnit
open FSharp.Charting
open System.Diagnostics

Alea.CUDA.Settings.Instance.Resource.AssemblyPath <- Path.Combine(__SOURCE_DIRECTORY__, @"packages\Alea.Cuda.2.1.2.3274\private")
Alea.CUDA.Settings.Instance.Resource.Path <- Path.Combine(__SOURCE_DIRECTORY__, @"release")



The last two lines come straight from Alea tutorial sample and are necessary to hook up the jit compiler. That’s it!

### Experiments & results

I used the following functions to drive the experiments:

// random array of length n
let generate n =
let rng = Random(int DateTime.Now.Ticks)
Array.init n (fun _ -> rng.Next())

// experiment: run from 10 ** low to 10 ** high array length
let experiment low high =
if low >= high || low < 0 then failwith "must be: low < high, both non-negative"

// salt it
let arr = [|0..1000|]
use lb = new LBP(arr)
lb.Compute() |> ignore

let sw = Stopwatch()
let cpuTimes = Array.zeroCreate (high - low + 1)
let gpuTimes = Array.zeroCreate (high - low + 1)

for i = low to high do
let range = int (10.0 ** float i)
let arr = generate range
lb.Arr <- arr

// Run on CPU
sw.Restart()
printfn "Legnth: %d" range
printfn "-----------"
let h1 = lbp arr 4
sw.Stop()

let idx = i - low

cpuTimes.[idx] <- range, sw.Elapsed.TotalSeconds
printfn "Computed on CPU: %0.5f sec" (snd cpuTimes.[idx])

//Run on GPU
sw.Restart()
let h2 = lb.Compute()
sw.Stop()
gpuTimes.[idx] <- range, float sw.Elapsed.TotalSeconds
printfn "Computed on GPU: %0.5f sec" (snd gpuTimes.[idx])
printfn ""

// make sure we are ok
should equal h1 h2

Chart.Combine(
[Chart.Line(cpuTimes, Name="CPU")
Chart.Line(gpuTimes, Name="GPU")
]
)
.WithYAxis(Log=true, Title = "sec")
.WithXAxis(Min=10.**float low, Log=true, Title = "length")
.WithLegend(InsideArea=true)



The helper “generate” generates a random integer array of arbitrary length, and the “experiment” function runs the experiments on arrays of sizes 10 ** [low..high]
with

experiment 3 7


I got:

 Legnth: 1000 ----------- Computed on CPU: 0.00937 sec Computed on GPU: 0.00285 sec

 Legnth: 10000 ----------- Computed on CPU: 0.02445 sec Computed on GPU: 0.00308 sec Legnth: 100000 ----------- Computed on CPU: 0.17697 sec Computed on GPU: 0.00388 sec Legnth: 1000000 ----------- Computed on CPU: 1.70085 sec Computed on GPU: 0.00412 sec 

Legnth: 10000000 ----------- Computed on CPU: 16.53772 sec Computed on GPU: 0.03045 sec 

Here the straight F# mode runs about twice as fast as pure Python:
 Length: 1000 -------------- Finished on CPU: time: 0.02263s

 Length: 10000 -------------- Finished on CPU: time: 0.28701s Length: 100000 -------------- Finished on CPU: time: 2.88549s 

Length: 1000000 -------------- Finished on CPU: time: 29.97346s 

The GPU performances are about comparable between Python and Numba and F# with Alea.CUDA. Same card: GTX Titan with 6Gb RAM, 14 SMPs

Of course a mandatory chart produced with FSharpChart and the following small scriptlet:

    Chart.Combine(
[Chart.Line(cpuTimes, Name="CPU")
Chart.Line(gpuTimes, Name="GPU")
]
)
.WithYAxis(Log=true, Title = "sec")
.WithXAxis(Min=10.**float low, Log=true, Title = "length")
.WithLegend(InsideArea=true)


(Still not sure, why my X axis starts from 100 and not from 1000, but I guess it’s minor :)).

A great experience doing all this overall: smooth and painless.

Categories: CUDA, F#, Parallel Tags: , , , , ,

## Motif Finding with Gibbs Sampling (F#)

May 17, 2015 1 comment

## The Problem

“Motif finding is a problem of finding common substrings of specified length in a set of strings. In bioinformatics, this is useful for finding transcription binding sites” (recap here). The problem is succinctly stated on Rosalind.

Given a set of strings DNA of size t, find “most common” substrings of length k. “Most common” means, that substrings should deviate from each other as little as possible, but they may not necessarily be the same. With this in mind, scoring is defined as follows:

let colScore (col : int []) =
col.Length - col.GroupBy(fun c -> c).Select(fun gr -> gr.Count()).Max()

let score (motifs : int [,]) =
[0..Array2D.length2 motifs - 1]
|> List.map (fun col -> motifs.[*,col] |> colScore)
|> List.sum


To illustrate with the DNA alphabet (we only have 4 letters in the alphabet = {A, C, G, T}), the following:

ACC
ACG
ACT
CCT

will be scored as: 1 + 0 + 2 = 3. We add up scores of each column, where the score of the column is defined as col_length – max_repeating_letters

So, column 1 score is 1 because A repeats 3 times – it is the maximum repeating letter, etc. The problem is then to find an arrangement of k-length strings (called k-mers) that minimize the score.

## Profile

Given a collection (matrix) of k-mers, a profile is built in order to determine what would be “the most probable” next k-mer if we were to augment our collection. A profile is a 4xk matrix, where P(i, j) is the probability of a given letter alphabet.[i] in the column (k-mer position) j. For the matrix above:

 3/4 0 0 1/4 4/4 1/4 0 0 1/4 0 0 2/4

(it’s the number of letters occurring in the column, divided by the total number of k-mers, t)

### Laplace Rule

We amend our simple profile calculation with the Laplace rule of succession. This is done to compensate for 0 and 1 probabilities: keeping those will have a disastrous effect on our algorithm, where we will never be able to pick a k-mer that, like in the example above, has anything other than “C” in the second position.

We apply the rule by always adding 4 strings to the collection, each consisting of one letter of the alphabet repeated k times. So the collection above becomes:

ACC
ACG
ACT
CCT
AAA
CCC
GGG
TTT

And the profile becomes:

 4/8 1/8 1/8 2/8 5/8 2/8 1/8 1/8 2/8 1/8 1/8 3/8
// borrowed from StackOverflow. Stack two 2D arrays vertically (I miss Python!)
let stackV (a1: 'a[,]) (a2: 'a[,]) =
let a1l1,a1l2,a2l1,a2l2 = (Array2D.length1 a1),(Array2D.length2 a1),(Array2D.length1 a2),(Array2D.length2 a2)
if a1l2 <> a2l2 then failwith "arrays have different column sizes"
let result = Array2D.zeroCreate (a1l1 + a2l1) a1l2
Array2D.blit a1 0 0 result 0 0 a1l1 a1l2
Array2D.blit a2 0 0 result a1l1 0 a2l1 a2l2
result

let createProfile (motifs : int[,]) =
let k = Array2D.length2 motifs
let t = Array2D.length1 motifs + 4

let appendLaplace () =
let As = Array.zeroCreate k
let rest =
[|yield As
for i = 1 to 3 do
yield As |> Array.map (fun a -> a + i) |]
|> array2D
stackV motifs rest

let m = appendLaplace()
let profile =
[|0..k - 1|]
|> Array.map
(fun col -> m.[0..,col]
.GroupBy(fun c -> c)
.OrderBy(fun gr -> gr.Key)
.Select(fun gr -> float(gr.Count()) / float t)
.ToArray())

Array2D.init 4 k (fun i j -> profile.[j].[i])



Here we append the motif matrix (represented by a 2D integer array in F#, where a row consists of integers in the range [0..3], corresponding to the letters of our DNA alphabet) with 4 additional entries as described above and retrieve the resulting profile matrix.

## Consensus k-mer

The profile matrix is used to pick a k-mer that fits the profile best. For instance, if we have built a profile based on motifs extracted from t – 1 DNA strings, and need to pick a k-mer from the t-th DNA string that fits the profile best, we first compute probabilities of all k-mers in string t based on the profile matrix:

Pr(k-mer|profile) = profile[k-mer[0], 0] * profile[k-mer[1], 1] *..* profile[k-mer[k-1], k-1]

(Here is where Laplace rule comes in handy: we would really be in trouble if there were 0 entries in the profilie matrix. Many k-mers would have Pr(k-mer|profile) = 0 and would short-circuit the algorithm).

and then pick the k-mer with the highest probability.

Code below computes probabilities of a set of k-mers given a profile.

let kmerProbs (kmers : string []) (profile : float [,]) =
let k = Array2D.length2 profile
let probs =
kmers
|> Array.map
(fun kmer ->
kmer, kmer.ToCharArray()
|> Array.map (fun c -> alphabet.[c])
|> Array.mapi (fun i ind -> profile.[ind, i])
|> Array.fold (fun state p -> state * p) 1.)
probs


## Gibbs Sampling

Given a discrete distribution, we want to sample from it:

1. Pick a sample s from the uniform distribution [0, n)
2. Lookup its probability, ps
3. Sample from a uniform [0, 1], pu
4. If pu <= ps – accept the sample and return it, otherwise repeat.

One thing to note here is that our probabilities do not necessarily sum up to 1 by design. So we first normalize them, by dividing each probability value by the sum of all probabilities.

The function below returns a motif sampled from the distribution we built from a profile:

let gibbsSample (rnd : Random) (probs : (string * float) []) =

let sumall = probs |> Array.sumBy (fun (s, p) -> p)
let prs = probs |> Array.map (fun (k, v) -> k, v / sumall)
let len = probs.Length

let rec rollUntilAccepted value =
let n = rnd.NextDouble()
if n <= snd prs.[value] then value
else
rollUntilAccepted (rnd.Next(0, len))

fst probs.[rollUntilAccepted (rnd.Next(0, len))]


## The Algorithm

Since solving the problem exactly by brute force is not feasible in practice with long strings and huge collections, approximations are used to get solutions that are “close enough”.

The Gibbs sampling based solution starts from a collection of random k-mers, each picked from one string of DNA and tries to improve on this original pick, performing a sequence of iterations, using Gibbs sampling.

Since the solution evidently depends on the initial collection picked, this is re-run many times to pick the collection with the minimal score.

Briefly the algorithm is:
Given a collection of strings (DNA) of size t, length k,

1. Start from a collection of motifs (k-mers) of size t randomly picked from DNA, one motif from one string of DNA
2. Store the score of this collection
3. Loop many times:
1. Select a random number j in [0, t)
2. Exclude j-th motif from the motif collection and build the profile of the remaining collection
3. Compute probability distribution of motifs in DNA[j] using the profile matrix from the previous step
4. Sample from the distribution and replace j-th motif with the obtained sample
5. Memorize best score, best collection of motifs

Here are the above steps translated to F#:

let gibbsSamplingMotifSearch (dna : string []) k iters rndStarts =
let t = dna.Length
//prepopulate the list of k-mers for all DNA strings
let kmers =
dna
|> Array.map(fun s ->
[0..s.Length - k]
.Select(fun i -> s.Substring(i, k)).Distinct().ToArray())

let len = dna.[0].Length - k

// single pass of the algorithm (single "random start")
let runSingle () =
let firstMotifs =
[|1..t|]
|> Array.map (fun i -> rnd.Next(0, len))
|> Array.mapi (fun i p -> dna.[i].Substring(p, k) |> toInts)

let bestMotifs = array2D firstMotifs
let bestScore = score bestMotifs

// repeat the sampling step n times
let rec runSampler (motifs : int [,]) (bestMotifs : int [,]) bestScore n =
if n = 0 then bestScore, bestMotifs
else
let except = rnd.Next(0, t)  // excluded motif
// exclude from the collection
let profileMotifs =
stackV motifs.[0..except-1, *] motifs.[except+1.., *]
let profile = createProfile profileMotifs // create profile from the rest
let probs = kmerProbs kmers.[except] profile // build probability distribution
let motif = array2D [|gibbsSample rnd probs |> toInts|] // sample to obtain a replacement
let curMotifs =
motifs.[except+1.., *]
|> stackV motif
|> stackV motifs.[0..except-1, *] // replace
let curScore = score motifs // obtain the current score

//recursive step, memorizing best results
runSampler curMotifs
(if curScore < bestScore then curMotifs else bestMotifs)
(if curScore < bestScore then curScore else bestScore)
(n - 1)
runSampler bestMotifs bestMotifs bestScore iters

let scoresMotifs = [1..rndStarts].AsParallel().Select(fun _ -> runSingle()).ToArray()



## Notes

Here is the header for this script:


open System
open System.Linq
open System.IO

open MathNet.Numerics.Random

let rndSeed = RandomSeed.Robust()
let rnd = Random.mersenneTwisterSeed rndSeed


I am using MathNet.Numerics random generator because it happens to be thread safe (which the .NET one is not). Thread safety is crucial because:

Line 46 above:

    let scoresMotifs = [1..rndStarts].AsParallel().Select(fun _ -> runSingle()).ToArray()


Since all single runs of the algorithm are created equal, it is easy to parallelize and improve your running time up to as many times as you have processor cores. This comes almost for free from .NET if you care enough to add AsParallel().

I do freely mix LINQ and native F# functionality. Perhaps many a purist would scorn me, but I find it works great in practice. Sometimes generic .NET structures which LINQ likes so much (and which I am not using here, not explicitly anyway) are simply faster because they are mutable (another heresy in the functional world, I realize, however it is faster to add an element to a collection than to rebuild the entire collection in order to add one element. For long collections it adds up).

I don’t use the query{} syntax in F# 3.1. I do like it, it is just that I’m used to lambdas and chaining. A matter of taste, I suppose.

The code above only has one explicit “for” loop (Line 17 in Laplace Rule code), tail recursion rocks!

And the final observation: I have seen lots of folks note how F# lets you do things succinctly. I would like to add, that it also enables you to be “right” the first time much more often than many other languages. (Meaning, that the functions you write have a much better chance to have fewer bugs the very first time you launch them than if you were to write them in a non-functional language).

Categories: bioinformatics, F#

## Modelling Stochastically Independent Processes with F# Computation Expressions: Part 2

December 11, 2014 1 comment

Code for this entry: here.

In this first part, we started shamelessly plagiarizing creatively reproducing a Clojure monad that does a great job describing stochastically independent processes.

After a few trials I became convinced, that the approach in the blog post mentioned above was indeed optimal: a “natural” approach of modelling an event as a (value, probability) tuple is flawed in a sense that the probability needs to always be carried around and that gets in the way of describing the process in a declarative way.

### Auxiliaries

I have defined a couple of auxiliary functions, just like in the Clojure case:

type Microsoft.FSharp.Collections.Map<'Key, 'Value when 'Key : comparison> with
static member mapValues (f : 'Value -> 'U) x =
x |> Map.toSeq |> Seq.map (fun (k, v) -> (k, f v))

let uniform (v : int seq) =
let v = v |> Seq.toList
v |> Seq.map (fun i -> i, 1N / BigRational.FromInt v.Length) |> Map.ofSeq


I have extended the Map type (because I can), to make the code below even more readable.

### Computation expression

type IndEventBuilder () =
member this.Zero () = Map.empty.Add(0, 0.)
member this.Return (e : 'a) : Map<'a, BigRational> = Map.empty.Add(e, 1N)
member this.Bind (m : Map<'a, BigRational>, fn : 'a -> Map<'b, BigRational>) : Map<'b, BigRational> =
m
|> Map.toSeq |> Seq.map (fun (key, value) -> (fn key) |> Map.mapValues (fun x -> x * value))
|> Seq.concat
|> Seq.groupBy(fun (key, value) -> key)
|> Seq.map(fun (key, sq) ->
(key, sq |>
Seq.map (fun (key, value) -> value) |> Seq.sum))
|> Map.ofSeq

let independent = IndEventBuilder()


Zero and Return functions are necessary: Zero because I want to use “if” statements within the computation expression, and Return is always necessary as a terminator of binding chains: Bind(.., Bind(.., (.., Return))). In that precise sense, Return makes the event marked by its argument occur with the probability 1.

Bind takes a distribution and unwraps it by making each event in the distribution occur together with what comes next:

1. For each event e in the distribution
2. Take the event and map it to the distribution that comes afterwards by multiplying the probability of this event by the probability of every event in the distribution that follows
3. Take the sequence of tuple sequences produced in 2 and flatten them
4. Isolate actual events and add up their individual probabilities, to compute the actual probability of each distinct event
5. Turn the resulting sequence into a map.

### Red Dog

Here is a slightly more interesting application: modelling the Red Dog game. Here is the “naive” version:

let simpleRedDog =
independent {
let! card1 = cardProb
let! card2 = cardProb
let! card3 = cardProb

let minCard = min card1 card2
let maxCard = max card1 card2

if card1 = card2 then
if card2 = card3 then return 10 else return 0
elif minCard + 1 = maxCard then return 0
elif minCard < card3 && card3 < maxCard then return 1 else return -1
}


This is the version from the Clojure blog (the rules are explained there as well). All we need to do is understand the rules and write them out. The only problem is that this implementation assumes cards are being replaced in the deck while the hand is played. Easy to fix, however, it actually gives close to correct results (with probabilities being small enough, rigorously accounting for replacements does not matter much):

val simpleRedDog : Map<int,BigRational> =
map [(-1, 88/169N); (0, 36/169N); (1, 44/169N); (10, 1/169N)]


Here is the fixed version, modelling the process without replacements:

let redDog =
let removeCard (v : BigRational) (i : BigInteger) =
if v = 0N then v
else
let mult = BigInteger range / v.Denominator
BigRational.FromBigInt (v.Numerator * mult - i) / BigRational.FromBigInt  (v.Denominator * mult - BigInteger.One)

let distWithRemoved card dist = dist |> Map.map (fun key v -> if key <> card then removeCard v BigInteger.Zero else removeCard v BigInteger.One)

independent {
let! card1 = cardProb
let removedCard = distWithRemoved card1 cardProb
let! card2 = removedCard
let! card3 = distWithRemoved card2 removedCard

let minCard = min card1 card2
let maxCard = max card1 card2

if card1 = card2 then
if card2 = card3 then return 10 else return 0
elif minCard + 1 = maxCard then return 0
elif minCard < card3 && card3 < maxCard then return 1 else return -1
}

let expect dist = dist |> Map.toSeq |> Seq.sumBy (fun (k, v) -> float k * BigRational.ToDouble v)



We simply adjust for the removed card in the distWithRemoved function. (Expressing this in Clojure is harder because monads are not part of the language, and so succinctness characteristic of Clojure will be affected).

The results:

val redDog : Map<int,BigRational> =
map [(-1, 8624/16575N); (0, 1112/5525N); (1, 352/1275N); (10, 1/425N)]


These are not all that different from the approximation. Even though the chance of an 11-fold gain appears to be about 3 times higher in the simplified version, the real difference between 0.59% and 0.23% (actual) is probably not all that great. Expected gains are also pretty close:

> expect redDog;;
val it : float = -0.220693816
> expect simpleRedDog;;
val it : float = -0.201183432


### BUGS/JAGS

BUGS & JAGS are famous enough with Bayesian statisticians to not require detailed description. Usage is based on a very cool principle of declarative programming: simply describe the distributions and let the system do the sampling behind the scenes. While the creators of these packages probably were not thinking in terms of functional programming, the results, in view of everything described above, are if not close, at least strikingly similar: we extend the language to model the process chain in a declarative way.

## Modelling Stochastically Independent Processes with F# Computation Expressions: Part 1

The idea for doing this is not new. There is an excellent series of posts closely tracing an article on applications of functional programming to probability.

A colleague of mine has recently called my attention to his own post of two years ago, where he describes a monad that models stochastically independent events in Clojure. I thought the latter implementation actually went to the heart of the whole idea of monads and that is why, once I started writing my own in F# from scratch, event though I naturally/brute-force-like, chose something described below and which corresponds exactly to the series of posts above, I eventually was compelled to do it my colleague’s way.

Here is the natural choice: a distribution (p.m.f) is just a sequence of events. Each event is simply a record/tuple/map of two values: the signifier of the event and its probability.

//underlying type
type 'a Outcome = {
Value: 'a
Probability : BigRational  }

type 'a Distribution = 'a Outcome seq


This is great, and it works very well, as the posts show. My slight dissatisfaction with this approach is due to the fact that the competing one appears much more transparent, extremely easy to read, easy to use. In fact, anyone who knows Clojure syntax can use it right away to model independent processes simply by writing them out in Clojure, AND there is no need for any helper functions to extract the results. It just works (from the post, with some minor corrections):

(domonad probability-m
[die1 (uniform [1 2 3 4 5 6])
die2 (uniform [1 2 3 4 5 6])]
(+ die1 die2))


This code models an experiment of rolling two dice and returns a probability mass function that describes the experiment. All we had to do as users was describe the experiment in the most natural way possible: what is the probability of getting an exact value as the sum of both values of 2 rolled dice. Hence the expression: extract the value from the first pmf, add it to the value extracted from the second. Don’t even need to think about probabilities, as the magic happens behind the monadic scene.

The code above gives us the result: {2 1/36, 3 1/18, 4 1/12, 5 1/9, 6 5/36, 7 1/6, 8 5/36, 9 1/9, 10 1/12, 11 1/18, 12 1/36} – which is a map of the sum of dice values to the probability of getting them. In F#, I believe, the Algol-like syntax actually works to our advantage (I call the computation expression “independent” for obvious reasons):

independent {
let! x = uniform [1..6]
let! y = uniform [1..6]
return x + y
}


When executed in F# interactive, and using the FSharp.PowerPack:

val mp : Map<int,BigRational> =
map
[(2, 1/36N); (3, 1/18N); (4, 1/12N); (5, 1/9N); (6, 5/36N); (7, 1/6N);
(8, 5/36N); (9, 1/9N); (10, 1/12N); ...]


We, amazingly enough, get the same answers. Next time: better examples and nuts & bolts of the code.

## Generating Permutations: Clojure or F#: Part 2

Marching on from the last post.

### Lazy Sequences

This is my favorite feature ever. If I want to generate just a few of 10! (nobody even knows how much that is) permutations, I could:

(take 10 (permute [1 2 3 4 5 6 7 8 9 10]))


provided, the function is defined (as described in the first post):

(defn permute [v]
(when-let [[pos2 pos1] (findStartingPos v)]
(let [nxt (sort-remainder (swapDigits v pos2 pos1) (inc pos1))]
(cons nxt (lazy-seq (permute nxt))))))


Here I am not sure which language I like more. Clojure has easier syntax: everything fits nicely within the recursive function call. Returning nil terminates the loop, while in F# you need to know to return an option type where None terminates iteration. On the other hand, I like the fact that everything is neatly wrapped in the “unfold” function: seems more “natural” to me: fold/unfold – there is a certain symmetry here. Also, everything exists nicely in this LINQ-like world…

let permute (v : 'a array when 'a: comparison) =
Seq.unfold
(fun prev ->
match findStartingPos prev with
| None -> None
| Some (cur, pos) ->
Some(prev, sortRemainder (swapPositions prev cur pos) (pos + 1))) v


### Weak Typing

I really like Clojure weak typing. And I like the F# strong type system:

let sortRemainder (v : 'a array) pos =
if v.Length - 1 = pos then v
else
[|
yield! v.[0..pos - 1]
yield! Array.sort v.[pos..v.Length - 1];
|]


F# type system requires that the first argument be qualified, but it is happy with this abbreviation, while the full qualification should be:

let sortRemainder (v : 'a array when 'a: comparison) pos =


Since we are sorting a subvector, the array has to be of a “comparable” type. Which is the condition of the applicability of the algorithm.

In Clojure it looks simpler, but it’s essentially the same:

(defn sort-remainder [v pos1]
(if (= (dec (count v)) pos1) v (into (subvec v 0 pos1) (sort (subvec v pos1)))))


### Tail Recursion

One more cool feature of functional languages. I think it’s another tie once you use it, although the “loop” construct that demands it is very nice.

The following function returns a tuple (current, found) of two positions within the array: one of the element that is being “promoted” up (current), and the other – of the smaller element being pushed back. (So, current > found && v[current] < v[found]). Or nil/None if no such pair can be found. This is the key function of the algorithm:

(defn findStartingPos [v]
(loop [cur (dec (count v))
acc [-1 -1]]
(let [maxPos (second acc)]
(if (or (< cur maxPos) (< cur 0))
(if (= maxPos -1) nil acc)
(if-let [pos (findFirstLessThan v cur)]
(recur (dec cur) (if (< maxPos pos) [cur pos] acc))
(recur (dec cur) acc))))))


And F#:

let findStartingPos v =
let rec findStartingPosRec cur acc =
let maxPos = snd acc
if cur < 0 || cur < maxPos then
if maxPos < 0 then None else Some acc
else
let pos = findFirstLessThan v cur
match pos with
| Some pos -> findStartingPosRec (cur - 1) (if maxPos < pos then (cur, pos) else acc)
| None -> findStartingPosRec (cur - 1) acc
findStartingPosRec (v.Length - 1)  (-1, -1)


It’s nice that we have a “loop” keyword in Clojure to provide cleaner syntax and more discipline for defining tail-recursive functions, but I am not appalled with the way we do it in F# either.

(The above functions contain obvious optimizations: we stop scanning once we have a pair of “swappable” elements and we have moved beyond the “found” position. Also, we discard a valid pair if we already have a pair where “found” position is larger than the “found” position of the current iteration).

## Doing it in a Massively Parallel Way

Of course, I like everything parallel… So what about doing it on a GPU, using, say CUDA? It is definitely possible, although probably not very practical. Even if we only have an array of 10 distinct elements, the number of permutations is already ridiculously large (although who knows what we are going to be using them for)… In any event, this is solvable if we can get “random access” to permutations. Instead of unfolding them as a lazy sequence, generate them all at once in a massively parallel fashion.

This is possible because permutations are neatly tied to factoradic numbers, as this Wikipedia article explains. So, it is always possible to generate “permutation #10” to be guaranteed different from “permutation #5” for distinct, fully ordered sets. (Any sets where ordering relationship is not defined can still be easily permuted as long as its elements are stored in indexed data structures, such as arrays, by simply generating permutations of indices). Thus, taking CUDA “single data multiple threads” computation model it is easy to generate all (or however many) permutations in parallel. Naturally, if we are not just outputting the results but need to store them, the exponential nature of the problem memory growth, as well as the number of threads required, and the limited amount of GPU memory (a single computer RAM for that matter) will quickly become a problem. I guess the CUDA C++ version of this will have to wait until the next job interview…

Categories: Clojure, CUDA, F# Tags: , , , ,

## The Alogirthm

Recently, I have entered a brave (new?) world of Clojure and was looking for a small project to take it for a ride. I stopped on a popular/interesting enough little problem, that subsumed a certain interview question which I was once unfortunate enough to stumble through with half my brain tied behind my back (long story).

The algorithm to generate permutations in sequence is actually quite simple and intuitive, I believe it originates somewhere in India thousands of years ago. At least that was the impression I got from a Wikipedia article. Anyway, given a fully ordered set S, generate all permutations of it. The algorithm works for sets with repetition, but we can assume no repetitions for simplicity:

1. (Since the set is fully ordered), sort it in ascending order, represent it as a sequence. Output it.
2. Find the next sequence consisting of the elements of S and following the current one immediately based on the ordering relationship. Output it.
3. Repeat 2, until no such sequence can be found.

The juice here is of course 2. To illustrate with an example. Suppose our sequence is just numbers “{}” here mean sequence, order matters: S ={1, 2, 3, 4}. The next one in lexicographical order is: {1, 2, 4, 3}. The one after it: {1, 3, 2, 4}. The one after: {1, 3, 4, 2}. Shall we do one more? {1, 4, 2, 3} Maybe just one more, so it becomes even clearer: {1, 4, 3, 2}.

So the key here, in case you have not caught on yet:

1. Starting from the end of the sequence, going backwards, find the first smallest position in the sequence, if such exists, where the element in current position can be swapped with the element in the found position to make the sequence “larger” than the previous one. This is accomplished if the element at the current position is less than the element at the found position. Swap elements in the current and the found positions.
2. For instance, looking at {1, 2, 4, 3} we find: (2 < 3), (2 < 4), (1 < 2). But 2 < 3 is the very first pair that fits the bill (we are starting from the back), so we don't go any further and swap them. We get {1, 3, 4, 2}

3. Sort the sub-sequence starting at the found + 1 position, ascending. So, in the above example we get: {1, 3, 2, 4}

Et c’est tout!

(How simple is that? I bet the brute force recursive “solution” feels very stupid right about now).

## Implementation

At this point it’s all but unnecessary to actually show the implementations. However, I just had fun with some of the Clojure concepts/structures, and wanted to compare them to what I am used to in F#. So this is the point. Meta-programming. The Clojure code is here and F# – here.

Observations:

### Immutability

The way the algorithm is structured, immutability is intuitive: we are producing a new sequence in each step, and old sequences should also be preserved and output. Should have been a no-brainer, since we are dealing with functional languages with immutable structures. And here was a slight jolt. Consider a function that swaps elements in the sequence (the algorithm calls for swapping, so…):

Clojure:

(defn swapDigits [v pos1 pos2] (assoc v pos2 (v pos1) pos1 (v pos2)))


(I was using strings of digits as input sets in my initial awkward steps through the language, hence the function name).

Now, I wanted something as short and as sweet as this for F#, so I cheated! .NET arrays are mutable and used I .NET arrays instead of F# lists (hope functional programming purists will commute a very harsh sentence I deserve for writing the following, and not condemn me to Java programming for life):

let swapPositions (v : 'a array) pos1 pos2 =
let res = Array.copy v
res.[pos1] <- v.[pos2]
res.[pos2] <- v.[pos1]
res


So yes. The code at line 2 is pure sin and I should have used F# lists which are immutable, and the code should have been:

let swapPositions (v : 'a list) pos1 pos2 =
v |> List.permute (fun i -> if i = pos1 then pos2 elif i = pos2 then pos1 else i)


but frankly, this is not the (very) first thing that comes to mind if you haven’t been using the language for a long time, and not as clear as the Clojure solution, which was close enough to the surface for a noob like myself to scoop it up.

Alright. This has gone on long enough, more soon…

Categories: Clojure, F#