As my former boss and mentor used to say, in order to understand recursion one must first understand recursion. This is funny, ha-ha, but if we tweak it slightly we get a really useful statement: in order to understand dynamic programming, one must first understand recursion. Here is an example.
Sources
Github: http://github.com/fierval/BioInfo
Decomposition Problem
Given a number, , and an (ordered) set
output how many sets
there are, such that
Sequencing Antibiotics
I was solving a “particular” case (the solution is nonetheless general, but just to explain function names appearing in the code), in Chapter 2 on Antibiotics sequencing in my bioinformatics book.
Antibiotics sequencing is done through mass spectrum analysis, so if we have a spectrum of weight , we may need to build the list of linear peptides of the same mass. In a toy version of this problem, we have an integer value of
and a list of amino acid weights (
) out of which antibiotics can be built:
let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|]
To Understand Recursion
Given a number , we assume that the problem is solved for
, etc until
, and we have values
for these solutions Our solution, then is simply
. We are done.
If we look a bit closer, however, we’ll see that our recursion creates a tree with one root node (a solution for the case of ), branching exponentially: at the first level we get
new nodes, each of which also produces
nodes at the next level. So this will pretty quickly become unfeasible.
Even if we did have unlimited computing power, this is still not the most efficient use of it, since lots of quantities will be recursively computed multiple times as we descend down the recursion tree.
Dynamic Programming
We therefore turn to dynamic programming, which is sort of recursion backwards. We still use the idea above, but instead of going back we go forward. We pre-compute all the necessary values by starting from , the minimal member of
, marching until
. At each step we will perform exactly the recursive step described above, so once we reach higher values of
, all previous values of it will be ready!
The Solution
We allocate an array of 0 elements and length (this can actually be optimized to be of some constant length of
, perhaps just of length
, I didn’t check, since as we move along old values become useless). We set all entries corresponding to
to 1.
The rest is pretty straight-forward.
open System // array of amino acid weights let weights = [|57; 71; 87; 97; 99; 101; 103; 113; 114; 115; 128; 129; 131; 137; 147; 156; 163; 186|] /// Count the number of peptides of mass m let countNumPeptides m = let allCounts : int64 [] = Array.zeroCreate (max m weights.[weights.Length - 1] + 1) // initialize dynamic programming store Array.ForEach(weights, fun w -> allCounts.[w] <- 1L) // equivalent to a simple loop, but we are using tail recursion let rec fillDynArray n = if n > m then allCounts.[m] else // dynamic_count(N) = Sum(dynamic_count((N - weights[0]), dynamic_count(N - weights[1]),.., dynamic_count(N - weights.Last()) let step = weights |> Array.map (fun w -> if n - w < 0 then 0L else allCounts.[n - w]) |> Array.sum allCounts.[n] <- allCounts.[n] + step fillDynArray (n + 1) fillDynArray weights.[0], allCounts
…and a mandatory chart
I really enjoy working with FSharp.Charting (available through NuGet), and it would be cool to visualize how fast the number of solutions grows with increasing linearly.
Notice above I am returning the entire array of counts we have generated, so since we are saving all of the solutions from to
, let’s see what we get:
#load @"..\packages\FSharp.Charting.0.90.12\FSharp.Charting.fsx" open FSharp.Charting let chartNumPeptides m = let _, counts = countNumPeptides m let series = Array.init m (fun i -> i, if counts.[i] > 0L then counts.[i] else 1L) Chart.Line series.[weights.[0]..] |> Chart.WithXAxis(Min=100., Max=1000.) |> Chart.WithYAxis(Log=true)
Cheating a little here: if values are 0 I’m setting them to 1 since this is a semi-log chart (no log of 0). I think it’s easier than matplotlib!