## Modelling Stochastically Independent Processes with F# Computation Expressions: Part 2

Code for this entry: here.

In this first part, we started ~~shamelessly plagiarizing~~ creatively reproducing a Clojure monad that does a great job describing stochastically independent processes.

After a few trials I became convinced, that the approach in the blog post mentioned above was indeed optimal: a “natural” approach of modelling an event as a (value, probability) tuple is flawed in a sense that the probability needs to always be carried around and that gets in the way of describing the process in a declarative way.

### Auxiliaries

I have defined a couple of auxiliary functions, just like in the Clojure case:

type Microsoft.FSharp.Collections.Map<'Key, 'Value when 'Key : comparison> with static member mapValues (f : 'Value -> 'U) x = x |> Map.toSeq |> Seq.map (fun (k, v) -> (k, f v)) let uniform (v : int seq) = let v = v |> Seq.toList v |> Seq.map (fun i -> i, 1N / BigRational.FromInt v.Length) |> Map.ofSeq

I have extended the Map type (because I can), to make the code below even more readable.

### Computation expression

type IndEventBuilder () = member this.Zero () = Map.empty.Add(0, 0.) member this.Return (e : 'a) : Map<'a, BigRational> = Map.empty.Add(e, 1N) member this.Bind (m : Map<'a, BigRational>, fn : 'a -> Map<'b, BigRational>) : Map<'b, BigRational> = m |> Map.toSeq |> Seq.map (fun (key, value) -> (fn key) |> Map.mapValues (fun x -> x * value)) |> Seq.concat |> Seq.groupBy(fun (key, value) -> key) |> Seq.map(fun (key, sq) -> (key, sq |> Seq.map (fun (key, value) -> value) |> Seq.sum)) |> Map.ofSeq let independent = IndEventBuilder()

Zero and Return functions are necessary: Zero because I want to use “if” statements within the computation expression, and Return is always necessary as a terminator of binding chains: Bind(.., Bind(.., (.., Return))). In that precise sense, Return makes the event marked by its argument occur with the probability 1.

Bind takes a distribution and unwraps it by making each event in the distribution occur together with what comes next:

- For each event e in the distribution
- Take the event and map it to the distribution that comes afterwards by multiplying the probability of this event by the probability of every event in the distribution that follows
- Take the sequence of tuple sequences produced in 2 and flatten them
- Isolate actual events and add up their individual probabilities, to compute the actual probability of each distinct event
- Turn the resulting sequence into a map.

### Red Dog

Here is a slightly more interesting application: modelling the Red Dog game. Here is the “naive” version:

let simpleRedDog = independent { let! card1 = cardProb let! card2 = cardProb let! card3 = cardProb let minCard = min card1 card2 let maxCard = max card1 card2 if card1 = card2 then if card2 = card3 then return 10 else return 0 elif minCard + 1 = maxCard then return 0 elif minCard < card3 && card3 < maxCard then return 1 else return -1 }

This is the version from the Clojure blog (the rules are explained there as well). All we need to do is understand the rules and write them out. The only problem is that this implementation assumes cards are being replaced in the deck while the hand is played. Easy to fix, however, it actually gives close to correct results (with probabilities being small enough, rigorously accounting for replacements does not matter much):

val simpleRedDog : Map<int,BigRational> = map [(-1, 88/169N); (0, 36/169N); (1, 44/169N); (10, 1/169N)]

Here is the fixed version, modelling the process without replacements:

let redDog = let removeCard (v : BigRational) (i : BigInteger) = if v = 0N then v else let mult = BigInteger range / v.Denominator BigRational.FromBigInt (v.Numerator * mult - i) / BigRational.FromBigInt (v.Denominator * mult - BigInteger.One) let distWithRemoved card dist = dist |> Map.map (fun key v -> if key <> card then removeCard v BigInteger.Zero else removeCard v BigInteger.One) independent { let! card1 = cardProb let removedCard = distWithRemoved card1 cardProb let! card2 = removedCard let! card3 = distWithRemoved card2 removedCard let minCard = min card1 card2 let maxCard = max card1 card2 if card1 = card2 then if card2 = card3 then return 10 else return 0 elif minCard + 1 = maxCard then return 0 elif minCard < card3 && card3 < maxCard then return 1 else return -1 } let expect dist = dist |> Map.toSeq |> Seq.sumBy (fun (k, v) -> float k * BigRational.ToDouble v)

We simply adjust for the removed card in the distWithRemoved function. (Expressing this in Clojure is harder because monads are not part of the language, and so succinctness characteristic of Clojure will be affected).

The results:

val redDog : Map<int,BigRational> = map [(-1, 8624/16575N); (0, 1112/5525N); (1, 352/1275N); (10, 1/425N)]

These are not all that different from the approximation. Even though the chance of an 11-fold gain appears to be about 3 times higher in the simplified version, the real difference between 0.59% and 0.23% (actual) is probably not all that great. Expected gains are also pretty close:

> expect redDog;; val it : float = -0.220693816 > expect simpleRedDog;; val it : float = -0.201183432

### BUGS/JAGS

BUGS & JAGS are famous enough with Bayesian statisticians to not require detailed description. Usage is based on a very cool principle of declarative programming: simply describe the distributions and let the system do the sampling behind the scenes. While the creators of these packages probably were not thinking in terms of functional programming, the results, in view of everything described above, are if not close, at least strikingly similar: we extend the language to model the process chain in a declarative way.

## Modelling Stochastically Independent Processes with F# Computation Expressions: Part 1

The idea for doing this is not new. There is an excellent series of posts closely tracing an article on applications of functional programming to probability.

A colleague of mine has recently called my attention to his own post of two years ago, where he describes a monad that models stochastically independent events in Clojure. I thought the latter implementation actually went to the heart of the whole idea of monads and that is why, once I started writing my own in F# from scratch, event though I naturally/brute-force-like, chose something described below and which corresponds exactly to the series of posts above, I eventually was compelled to do it my colleague’s way.

Here is the natural choice: a distribution (p.m.f) is just a sequence of events. Each event is simply a record/tuple/map of two values: the signifier of the event and its probability.

//underlying type type 'a Outcome = { Value: 'a Probability : BigRational } //Monadic type type 'a Distribution = 'a Outcome seq

This is great, and it works very well, as the posts show. My slight dissatisfaction with this approach is due to the fact that the competing one appears much more transparent, extremely easy to read, easy to use. In fact, anyone who knows Clojure syntax can use it right away to model independent processes simply by writing them out in Clojure, AND there is no need for any helper functions to extract the results. It just works (from the post, with some minor corrections):

(domonad probability-m [die1 (uniform [1 2 3 4 5 6]) die2 (uniform [1 2 3 4 5 6])] (+ die1 die2))

This code models an experiment of rolling two dice and returns a probability mass function that describes the experiment. All we had to do as users was describe the experiment in the most natural way possible: what is the probability of getting an exact value as the sum of both values of 2 rolled dice. Hence the expression: extract the value from the first pmf, add it to the value extracted from the second. Don’t even need to think about probabilities, as the magic happens behind the monadic scene.

The code above gives us the result: {2 1/36, 3 1/18, 4 1/12, 5 1/9, 6 5/36, 7 1/6, 8 5/36, 9 1/9, 10 1/12, 11 1/18, 12 1/36} – which is a map of the sum of dice values to the probability of getting them. In F#, I believe, the Algol-like syntax actually works to our advantage (I call the computation expression “independent” for obvious reasons):

independent { let! x = uniform [1..6] let! y = uniform [1..6] return x + y }

When executed in F# interactive, and using the FSharp.PowerPack:

val mp : Map<int,BigRational> = map [(2, 1/36N); (3, 1/18N); (4, 1/12N); (5, 1/9N); (6, 5/36N); (7, 1/6N); (8, 5/36N); (9, 1/9N); (10, 1/12N); ...]

We, amazingly enough, get the same answers. Next time: better examples and nuts & bolts of the code.

## Exploring Monadic Landscape: Sql Command Computation Expression

Most of the developers have dealt with calling SQL server stored procedures from their applications at least once or twice. In my last project, where intense data mining is done on the SQL side, this is basically all I am doing. There is always a desire to wrap and abstract the ever-repetitive code to get the connection, build an instance of the SqlCommand class, read in the returned dataset. And it is never coming out quite as succinct as expected.

Again, this is a perfect situation for using computation expressions, as we can clearly see the workflow:

- Connect to the database
- Set command text
- Set command parameters (if necessary)
- Set other command options
- Execute the command of a necessary type

So at this point, it is easy to figure out how to write the builder for the command-oriented workflow.

## Defining the Monadic Type

The gist of this workflow is that we take an instance of SqlCommand and run with it every step of our workflow. Hence, the step is defined like this:

type CmdSqlMonad<'a> = SqlCommand -> 'a let sqlMonad<'a> (f : SqlCommand -> 'a) : CmdSqlMonad<'a> = f

(the operator on line 2 is defined for convenience and to guide the type system).

We can also define some auxiliary methods:

type sqlParams = (string * obj) [] let setParameters (sqlParameters : sqlParams) = sqlMonad(fun (cmd : SqlCommand) -> sqlParameters |> Seq.iter(fun (name, value) -> cmd.Parameters.AddWithValue(name, value) |> ignore)) let setType (tp : CommandType) = sqlMonad (fun cmd -> cmd.CommandType cmd.ExecuteReader()) let execNonQuery() = sqlMonad(fun cmd -> cmd.ExecuteNonQuery()) let execScalar() = sqlMonad (fun cmd -> cmd.ExecuteScalar()) let setTimeout(sec) = sqlMonad(fun cmd -> cmd.CommandTimeout

Each of these (except for the last three) are of the type *CmdSqlMonad<unit>*, as they simply set some properties on our SqlCommand object. This object is propagated all the way through the workflow by our Bind() function:

member this.Bind(c : CmdSqlMonad<'a>, f : 'a -> CmdSqlMonad<'b>) = sqlMonad(fun cmd -> let value = c cmd f value cmd)

We can start defining the builder now. This builder is parameterized. It takes the connection string and the command name (or any query for that matter):

type CmdSqlBuilder (connectionString, command) = do if String.IsNullOrWhiteSpace(connectionString) then invalidArg "connectionString" "connection string must be supplied" let connection = new SqlConnection(connectionString) let cmd = new SqlCommand(name, connection) do (retry { return connection.Open() }) defaultRetryParams let dispose () = cmd.Dispose() interface IDisposable with member this.Dispose () = dispose() GC.SuppressFinalize(this) override this.Finalize() = dispose()

(Note the use of “retry” computation expression).

The rest of the stuff is pretty standard:

member this.Return ( x : 'a) : CmdSqlMonad<'a> = fun cmd -> x member this.Run( m : CmdSqlMonad<'a>) = m cmd member this.Delay(f : unit -> CmdSqlMonad<'a>) = f() member this.ReturnFrom(m : CmdSqlMonad<'a>) = m

We define the Run method to execute the workflow right away with the command that is created in the constructor.

Finally, to define the computation expression:

let sqlCommand(connectionString, name) = new CmdSqlBuilder(connectionString, name)

At this point, wrapping sprocs is easy:

let args : sqlParams = [|("@param1", val1 :> obj); ("@param2", val2 :> obj)|] sqlCommand (connectionString, name) { do! setParameters(args) do! setTimeout(10 * 60) do! setType(CommandType.StoredProcedure) return! execNonQuery() }

Or calling a function:

let args : sqlParams = [|("@param", value :> obj)|] sqlCommand(connectionString, "select dbo.MyFunc(@param)") { do! setParameters [|("@param", searchString :> obj)|] return! execScalar() } :?> string

Or even a simple query:

let rd = sqlCommand(connectionString, "select * from someTable") { return! execReader() }

The code is concise and easy to understand.

Here is the complete source:

module CommandBuilder = open System open System.Data.SqlClient open System.Data type sqlParams = (string * obj) [] type CmdSqlMonad<'a> = SqlCommand -> 'a let sqlMonad<'a> (f : SqlCommand -> 'a) : CmdSqlMonad<'a> = f let setParameters (sqlParameters : sqlParams) = sqlMonad(fun (cmd : SqlCommand) -> sqlParameters |> Seq.iter(fun (name, value) -> cmd.Parameters.AddWithValue(name, value) |> ignore)) let setType (tp : CommandType) = sqlMonad (fun cmd -> cmd.CommandType <- tp) let execReader () = sqlMonad(fun cmd -> cmd.ExecuteReader()) let execNonQuery() = sqlMonad(fun cmd -> cmd.ExecuteNonQuery()) let execScalar() = sqlMonad (fun cmd -> cmd.ExecuteScalar()) let command(text) = sqlMonad(fun cmd -> cmd.CommandText <- text) let setTimeout(sec) = sqlMonad(fun cmd -> cmd.CommandTimeout <- sec) type CmdSqlBuilder (connectionString, name) = do if String.IsNullOrWhiteSpace(connectionString) then invalidArg "connectionString" "connection string must be supplied" let connection = new SqlConnection(connectionString) let cmd = new SqlCommand(name, connection) do cmd.CommandTimeout <- 60 * 20 (retry { return connection.Open() }) defaultRetryParams let dispose () = cmd.Dispose() interface IDisposable with member this.Dispose () = dispose() GC.SuppressFinalize(this) override this.Finalize() = dispose() member this.Command = cmd member this.Return ( x : 'a) : CmdSqlMonad<'a> = fun cmd -> x member this.Run( m : CmdSqlMonad<'a>) = m cmd member this.Delay(f : unit -> CmdSqlMonad<'a>) = f() member this.ReturnFrom(m : CmdSqlMonad<'a>) = m member this.Bind(c : CmdSqlMonad<'a>, f : 'a -> CmdSqlMonad<'b>) = sqlMonad(fun cmd -> let value = c cmd f value cmd) let sqlCommand(connection, name) = new CmdSqlBuilder(connection, name)

## The Push Monad: Introduction

Chapter 5 of Friendly F# has a great practical explanation of F# computation expressions often called “monads” from their use in computer science and Haskell. The material in Chapter 5 of the book does a lot to demystify the concept, theoretical coverage of which is done well in this Wikipedia article.

Monads are an example of things best grasped by actually doing. So I set out to implement one in my project.

What Friendly F# discussion instills above all (confirmed by the Wiki article) is that a monad in functional programming is a great way to subsume some common side effects or patterns under an explicit syntax that serves several purposes:

- Unclutter the code
- Make the pattern visible thus improving readability, while at the same time
- Avoiding “action at a distance” anti-pattern, where things seem to happen magically but it is extremely hard to figure out what is actually responsible for the magic

In this particular case, the monadic pattern is implied by the Push language: all programs are 100% robust, i.e. all syntactically correct programs execute without throwing an exception and the state of the system is preserved. This means that every time something occurs that makes execution of an operation impossible, we need to “unwind” the system and return it to its state before execution had started. It would be nice to factor all of that out of the implementation so we can concentrate exclusively on semantics of the operations.

So, while implementing Push operations the following must be done:

- See if there are enough arguments on stack(s). If there were less than enough exit.
- Start executing the operation. If the operation cannot be completed return everything back to the stack(s), exit. Else:
- Push result to the appropriate stack.

For instance, here is an implementation of one of Push operations written without the use of monads:

[<PushOperation("%")>] static member Mod() = match processArgs2 Float.Me.MyType with | [a1; a2] -> if a2.Raw<float>() = 0. then pushResult a1 pushResult a2 else let quot = Math.Floor(Math.Floor(a1.Raw<float>()) / Math.Floor(a2.Raw<float>())) let res = a1.Raw<float>() - quot * a2.Raw<float>() pushResult(Float(res)) | _ -> ()

Here all the steps are recognizable:

- Pop two arguments from the FLOAT stack using processArgs2. If it returns anything but a list of two values exit.
- Check if the second argument is 0. If so, return arguments back to the stack and exit, otherwise execute the operation.
- Push the result back to the FLOAT stack

Here is the monadic version:

[<PushOperation("%")>] static member Mod() = let getMod stack = push { let! right = popOne stack let! left = popOne<float> stack if right <> 0. then let quot = Math.Floor(Math.Floor(left) / Math.Floor(right)) return! result stack (left - quot * right) } getMod Float.Me.MyType

We no longer need to explicitly handle the pattern mentioned above. All the steps and branches are contained within our definition of the “push” monad, so no magic here. The reader of the code knows where to look for explanation of the side effects.

If there are less than 2 values on top of the FLOAT stack, execution will not go forward and previous arguments will be returned to the stack.

If the right argument is 0, “unwinding” of the state will also happen automatically without any need to handle this case explicitly.

One other convenience: we can now factor out extracting the value from an object we get from the top of a stack (by calling its Raw<‘a>() function). This is done by implementing the monad and presented through compiler sugar of “let!” assignment. A great improvement on maintainability and ease of implementation.

“Under the hood” details to be discussed in the next post.

## Push Operation. A Few Words on Computation Expressions/Monads

As we have seen in the previous post, it is easy to implement Push types and operations:

[<PushType("FLOAT")>] type Float = inherit PushTypeBase new () = {inherit PushTypeBase ()} new (f : float) = {inherit PushTypeBase(f)} static member Me = Float() [<PushOperation("/")>] static member Divide() = match processArgs2 Float.Me.MyType with | [a1; a2] -> if a2.Raw<float>() = 0. then pushResult a1 pushResult a2 else pushResult(Float(a1.Raw<float>() / a2.Raw<float>())) | _ -> ()

Here is an example of an implementation of `FLOAT./`

operation: a division of two integers on top of the stack.

Since in Push an operation is denoted by “OP_TYPE.OP_NAME”, it is convenient from the implementation standpoint to group operations that are unique to a specific type under the definition of this type. Operations are declared *static* and that declaration is enforced by the system.

Methods, implementing operations take no arguments, since all of the information is implied in implementation: we know exactly the type of stack we are working with, as well as how to obtain arguments from it.

### Minimizing Code Duplication. Generic Operations.

Some push operations are “generic” in a sense, that each type implements one. E.g.: DUP, YANK, SWAP, etc manipulate stacks of their type (i.e. INTEGER.DUP pushes a duplicate of the top of the INTEGER stack).

In order to minimize code duplication, it is possible to define “Generic” Push types. These types do not have names and no stack is created for them. They are simply “bags of operations”.

Generic stock operations are implemented as follows:

[<GenericPushType>] type Ops ()= [<GenericPushOperation("<", AppliesTo=[|"INTEGER"; "FLOAT"|])>] static member Lt tp = dualOp (<) tp Bool.Me.MyType [<GenericPushOperation(">", AppliesTo=[|"INTEGER"; "FLOAT"|])>] static member Gt tp = dualOp (>) tp Bool.Me.MyType [<GenericPushOperation("=")>] static member Eq tp = dualOp (=) tp Bool.Me.MyType [<GenericPushOperation("DUP", Description = "Pushes the duplicate of the top of the stack")>] static member dup tp = if stockTypes.Stacks.[tp].length = 0 then () else stockTypes.pushResult (peek stockTypes.Stacks.[tp]) [<GenericPushOperation("POP", Description = "Pops the top of the stack")>] static member pop tp = processArgs1 tp |> ignore [<GenericPushOperation("ROT", Description = "Rotates 3 top stack entries")>] static member rot tp = pushResult (new Integer(2L)) Ops.yank tp

The type that bags these operations is decorated with *GenericPushType *attribute. No name is specified, since this is not a “type” at all, just a bag of operations. Each operation is decorated with *GenericPushOperation* attribute, that besides the name and the description of the operation can have an *AppliesTo* argument that filters out the types to which the operation applies. So, for example operations “>”, “<", "=" apply to numeric types only.

The signature of a method that implements a generic operation is slightly different than that of a regular operation. There is a string argument denoting the type of the operation. This is used to determine on which stack to perform the operation. The type argument can be any of the current types: "INTEGER", "FLOAT", etc.

### A Word about Monads

All Push operations have a very similar structure. They do the following:

- Pop one or two arguments from the stack. If the stack has fewer arguments – quit.
- Present the top of the stack as the right-hand argument and the next value on the stack as the left-hand one.
- Make sure the conditions for performing the operation are met. E.g., if the operation is “/”, top of the stack is not 0. If they are not met, push the arguments back and return.
- Execute the operation.
- Push the result onto the appropriate stack.

The devide operation on the FLOAT type goes through exactly these steps:

[<PushOperation("/")>] static member Divide() = match processArgs2 Float.Me.MyType with | [a1; a2] -> if a2.Raw<float>() = 0. then pushResult a1 pushResult a2 else pushResult(Float(a1.Raw<float>() / a2.Raw<float>())) | _ -> ()

The following helper functions are almost ubiquitously used.

*processArgs2 (processArg1)* – pops 2 (1) arguments from the top of the stack and presents them in the right order. If it returns anything but a list of two objects of the right type, the operation does nothing.

*pushResult* – pushes result of the operation to the appropriate stack. Or in this case returns arguments to the stack, if one of them is 0.

Basically, code for all operations looks very similar. Which leads us to think that computation expressions (aka “Monads”) are a good fit for implementing operations. Monads allow for an easy way to factor out the common “side effects” of the code being executed, and just let us concentrate on functionality. So, for example, the computation expression-based implementation of the division operation above looks like this:

[<PushOperation("/")>] static member Divide() = let divide stack = push { let! right = popOne stack let! left = popOne stack if right <> 0. then return! (result stack (left / right)) } divide Float.Me.MyType

Note:

- in this case we do not need to special-case the fact that arguments are returned to the stack if the operation cannot be performed. We were able to include that functionality into the implementation of the “push” computation expression.
- No need to call Raw() to extract values. This is also encoded in the computation expression.
- No special provision is made for the case when there are fewer than two arguments. Again, the computation expression takes care of that.
- The code is easier to read. All of the “common” functionality is factored out of the “push{}”.

Monadic implementation will be discussed in detail in the future posts.

### Implementing a Generic Op in C#

For testing, I have implemented a TOSTRING operation for INTEGER type as a generic operation. This operation represents the top of the INTEGER stack as string literal and pushes the result to the LITERAL stack:

[GenericPushType] public class IntegerOpExtension { [GenericPushOperation("TOSTRING", Description="Converts top integer to a literal", AppliesTo= new string [] {"INTEGER"})] public static void ConvertToString() { if(TypeFactory.isEmptyStack("INTEGER")) { return; } long val = TypeFactory.processArgs1("INTEGER").Raw<long>(); TypeFactory.pushResult (new Literal(val.ToString())); } }