Decomposition Problem with F#, Dynamic Programming

As my former boss and mentor used to say, in order to understand recursion one must first understand recursion. This is funny, ha-ha, but if we tweak it slightly we get a really useful statement: in order to understand dynamic programming, one must first understand recursion. Here is an example.

Sources

Github: http://github.com/fierval/BioInfo

Decomposition Problem

Given a number, $m$ , and an (ordered) set $\mathcal{N} = \{n_i\}, \ n_i > 0, \ n_{i+1} \geq n_{i}$ output how many sets $\mathcal{A}=\{a_i\}, \ a_i \geq 0$ there are, such that $m=\sum\limits_i a_i\dot n_i$

Sequencing Antibiotics

I was solving a “particular” case (the solution is nonetheless general, but just to explain function names appearing in the code), in Chapter 2 on Antibiotics sequencing in my bioinformatics book.

Antibiotics sequencing is done through mass spectrum analysis, so if we have a spectrum of weight $m$ , we may need to build the list of linear peptides of the same mass. In a toy version of this problem, we have an integer value of $m$ and a list of amino acid weights ( $\mathcal{N}$ ) out of which antibiotics can be built:

let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]

To Understand Recursion

Given a number $m$ , we assume that the problem is solved for $m-n_0, m-n_1$ , etc until $m-n_i$ , and we have values $k_0, k_1, ..., k_i$ for these solutions Our solution, then is simply $\sum\limits_i k_i$ . We are done.

If we look a bit closer, however, we’ll see that our recursion creates a tree with one root node (a solution for the case of $m$ ), branching exponentially: at the first level we get $l = |\mathcal{N}|$ new nodes, each of which also produces $l$ nodes at the next level. So this will pretty quickly become unfeasible.

Even if we did have unlimited computing power, this is still not the most efficient use of it, since lots of quantities will be recursively computed multiple times as we descend down the recursion tree.

Dynamic Programming

We therefore turn to dynamic programming, which is sort of recursion backwards. We still use the idea above, but instead of going back we go forward. We pre-compute all the necessary values by starting from $n_0$ , the minimal member of $\mathcal(N)$ , marching until $m$ . At each step we will perform exactly the recursive step described above, so once we reach higher values of $m$ , all previous values of it will be ready!

The Solution

We allocate an array of 0 elements and length $m$ (this can actually be optimized to be of some constant length of $O(l)$ , perhaps just of length $l$ , I didn’t check, since as we move along old values become useless). We set all entries corresponding to $n_i$ to 1.

The rest is pretty straight-forward.

open System

// array of amino acid weights
let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]

/// Count the number of peptides of mass m
let countNumPeptides m =
    let allCounts : int64 [] = Array.zeroCreate (max m weights.[weights.Length - 1] + 1)
        
    // initialize dynamic programming store
    Array.ForEach(weights, fun w -> allCounts.[w] <- 1L)

    // equivalent to a simple loop, but we are using tail recursion
    let rec fillDynArray n =
        if n > m then allCounts.[m]
        else
            // dynamic_count(N) = Sum(dynamic_count((N - weights[0]), dynamic_count(N - weights[1]),.., dynamic_count(N - weights.Last())
            let step = 
                weights 
                |> Array.map (fun w -> if n - w < 0 then 0L else allCounts.[n - w])
                |> Array.sum

            allCounts.[n] <- allCounts.[n] + step
            fillDynArray (n + 1)

    fillDynArray weights.[0], allCounts

…and a mandatory chart

I really enjoy working with FSharp.Charting (available through NuGet), and it would be cool to visualize how fast the number of solutions grows with $m$ increasing linearly.

Notice above I am returning the entire array of counts we have generated, so since we are saving all of the solutions from $n_0$ to $m$ , let’s see what we get:

#load @"..\packages\FSharp.Charting.0.90.12\FSharp.Charting.fsx"
open FSharp.Charting

let chartNumPeptides m =
    let _, counts = countNumPeptides m
    let series = 
        Array.init m (fun i -> i, if counts.[i] > 0L then counts.[i] else 1L)
    
    Chart.Line series.[weights.[0]..]
    |> Chart.WithXAxis(Min=100., Max=1000.)
    |> Chart.WithYAxis(Log=true)

Cheating a little here: if values are 0 I’m setting them to 1 since this is a semi-log chart (no log of 0). I think it’s easier than matplotlib!

Decomposition Problem with F#, Dynamic Programming

Sources

Decomposition Problem

Sequencing Antibiotics

To Understand Recursion

Dynamic Programming

The Solution

…and a mandatory chart

Published by fierval

Leave a comment Cancel reply

Sources

Decomposition Problem

Sequencing Antibiotics

To Understand Recursion

Dynamic Programming

The Solution

…and a mandatory chart

Share this:

Published by fierval

Leave a comment Cancel reply