Decomposition Problem with F#, Dynamic Programming

As my former boss and mentor used to say, in order to understand recursion one must first understand recursion. This is funny, ha-ha, but if we tweak it slightly we get a really useful statement: in order to understand dynamic programming, one must first understand recursion. Here is an example.

Sources

Github: http://github.com/fierval/BioInfo

Decomposition Problem

Given a number, m, and an (ordered) set \mathcal{N} = \{n_i\}, \ n_i > 0, \ n_{i+1} \geq n_{i} output how many sets \mathcal{A}=\{a_i\}, \ a_i \geq 0 there are, such that m=\sum\limits_i a_i\dot n_i

Sequencing Antibiotics

I was solving a “particular” case (the solution is nonetheless general, but just to explain function names appearing in the code), in Chapter 2 on Antibiotics sequencing in my bioinformatics book.

Antibiotics sequencing is done through mass spectrum analysis, so if we have a spectrum of weight m, we may need to build the list of linear peptides of the same mass. In a toy version of this problem, we have an integer value of m and a list of amino acid weights (\mathcal{N}) out of which antibiotics can be built:

let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]

To Understand Recursion

Given a number m, we assume that the problem is solved for m-n_0, m-n_1, etc until m-n_i, and we have values k_0, k_1, ..., k_i for these solutions Our solution, then is simply \sum\limits_i k_i. We are done.

If we look a bit closer, however, we’ll see that our recursion creates a tree with one root node (a solution for the case of m), branching exponentially: at the first level we get l = |\mathcal{N}| new nodes, each of which also produces l nodes at the next level. So this will pretty quickly become unfeasible.

Even if we did have unlimited computing power, this is still not the most efficient use of it, since lots of quantities will be recursively computed multiple times as we descend down the recursion tree.

Dynamic Programming

We therefore turn to dynamic programming, which is sort of recursion backwards. We still use the idea above, but instead of going back we go forward. We pre-compute all the necessary values by starting from n_0, the minimal member of \mathcal(N), marching until m. At each step we will perform exactly the recursive step described above, so once we reach higher values of m, all previous values of it will be ready!

The Solution

We allocate an array of 0 elements and length m (this can actually be optimized to be of some constant length of O(l), perhaps just of length l, I didn’t check, since as we move along old values become useless). We set all entries corresponding to n_i to 1.

The rest is pretty straight-forward.

open System

// array of amino acid weights
let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]

/// Count the number of peptides of mass m
let countNumPeptides m =
    let allCounts : int64 [] = Array.zeroCreate (max m weights.[weights.Length - 1] + 1)
        
    // initialize dynamic programming store
    Array.ForEach(weights, fun w -> allCounts.[w] <- 1L)

    // equivalent to a simple loop, but we are using tail recursion
    let rec fillDynArray n =
        if n > m then allCounts.[m]
        else
            // dynamic_count(N) = Sum(dynamic_count((N - weights[0]), dynamic_count(N - weights[1]),.., dynamic_count(N - weights.Last())
            let step = 
                weights 
                |> Array.map (fun w -> if n - w < 0 then 0L else allCounts.[n - w])
                |> Array.sum

            allCounts.[n] <- allCounts.[n] + step
            fillDynArray (n + 1)

    fillDynArray weights.[0], allCounts

…and a mandatory chart

I really enjoy working with FSharp.Charting (available through NuGet), and it would be cool to visualize how fast the number of solutions grows with m increasing linearly.

Notice above I am returning the entire array of counts we have generated, so since we are saving all of the solutions from n_0 to m, let’s see what we get:

#load @"..\packages\FSharp.Charting.0.90.12\FSharp.Charting.fsx"
open FSharp.Charting

let chartNumPeptides m =
    let _, counts = countNumPeptides m
    let series = 
        Array.init m (fun i -> i, if counts.[i] > 0L then counts.[i] else 1L)
    
    Chart.Line series.[weights.[0]..]
    |> Chart.WithXAxis(Min=100., Max=1000.)
    |> Chart.WithYAxis(Log=true)

Cheating a little here: if values are 0 I’m setting them to 1 since this is a semi-log chart (no log of 0). I think it’s easier than matplotlib!
peptides

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.