This blog chronicles my experiences learning F# via implementing Push: a programming language designed for use in evolving populations of programs geared for solving symbolic regression problems (genetic programming).

F# vs C#. Fold and Aggregate

May 13, 2013

Suppose you need to write a script that finds n files, all named according to some pattern, say “c:\temp\my_file_x.txt”, where “x” is replaced by a number from a range, [1..30] for instance, reads the contents of these files and glues them together. Suppose also that the files are very small, so you can keep them all in memory at once. Also, it should be solved in one line (except for auxiliaries: defining variables, writing out the results).

One-line solutions exist both in F# and C#. Which one is prettier? I vote for F#.

Here is the C# code:


string templ = @"C:\temp\my_file_";

var content =
 Enumerable.Range(1, 30)
     .Aggregate(
        new List<string>(),
        (a, e) =>
           {
               a.AddRange(File.ReadAllLines(templ + e.ToString() + ".txt"));
               return a;
            });

File.WriteAllLines(templ + ".txt", content);

And here is the F# version (of just the relevant part):

let content =
  [1..30]
  |> List.fold
       (fun acc i -> acc @ (File.ReadAllLines(templ + i.ToString() + ".txt") |> Array.toList))
       []

You can accomplish almost anything with fold() and its C# LINQ equivalent, Aggregate().
So first we create a range, [1..30]. Note that although [1..30] and Enumerable.Range(1, 30) both generate the numbers from 1 to 30, their semantics are different: the F# range takes the first and the last element, while Enumerable.Range takes the first element and a count. So [0..30] and Enumerable.Range(0, 30) generate different sequences: the latter generates the numbers 0..29.
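The difference between the two range constructs is easy to check in F# Interactive. A minimal sketch (the names fsRange and linqRange are made up for this illustration):

```fsharp
open System.Linq

// F# range expressions are inclusive on both ends: [first..last]
let fsRange = [0..30]

// Enumerable.Range takes (start, count), not (first, last)
let linqRange = Enumerable.Range(0, 30) |> List.ofSeq

printfn "F#:   %d items, last = %d" fsRange.Length (List.last fsRange)
printfn "LINQ: %d items, last = %d" linqRange.Length (List.last linqRange)
```

The F# list has 31 items ending in 30, the LINQ one has 30 items ending in 29. The two only coincide in the example above because the range starts at 1.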

Then we fold the range of numbers into a list of lines by reading the files and gluing the results together. (We could have just kept appending the text rather than the lines, but it is not all that important for this macro, and we want to make sure each new addition starts on a new line.)
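To try the fold without touching c:\temp, here is a self-contained sketch. It assumes a scratch folder under the system temp directory (the "fold_demo" name and the three tiny input files are invented for the demo) and also shows List.collect, which is an alternative to fold for this particular job, not what the code above uses.

```fsharp
open System.IO

// Hypothetical scratch directory and file prefix, used only for this demo
let dir = Path.Combine(Path.GetTempPath(), "fold_demo")
Directory.CreateDirectory dir |> ignore
let templ = Path.Combine(dir, "my_file_")

// Create three small input files so the example is self-contained
for i in 1..3 do
    let lines = [| sprintf "line A of %d" i; sprintf "line B of %d" i |]
    File.WriteAllLines(sprintf "%s%d.txt" templ i, lines)

// Same shape as the fold in the post: append the lines of each file in order
let folded =
    [1..3]
    |> List.fold
         (fun acc i -> acc @ (File.ReadAllLines(sprintf "%s%d.txt" templ i) |> Array.toList))
         []

// List.collect says "map each item to a list, then concatenate" directly
let collected =
    [1..3]
    |> List.collect (fun i -> File.ReadAllLines(sprintf "%s%d.txt" templ i) |> List.ofArray)

printfn "%d lines, same result: %b" folded.Length (folded = collected)
```

Both versions produce the same six lines. The fold copies the accumulator on every append (the @ operator), which is harmless for 30 tiny files.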

Categories: C#, F#, LINQ

Data and Visualization.

April 22, 2013

As the three of us embarked on this new data-mining project, we were the data scientist, the manager and the developer, who knew nothing about visualizations. We didn’t even want to do any visuals at first.

Then someone stumbled across the New York Times Obama Budget visual and the wheels started spinning. Pretty soon we had something like this of our own, and then it snowballed into a real project with quite a few interactive charts and visuals, all d3 based.

While developing all this, I started to wonder: why are the right visuals so incredibly effective in presenting data? Exactly what do the bubbles have that the tables don’t? It is the same data, after all. I called upon phenomenology as it was first presented in Logical Investigations by Edmund Husserl (because I haven’t made it any further in Husserlian literature yet) to help me understand what is happening.

Husserl and Data Intuition

The core idea of Logical Investigations is that meanings in the broadest sense of the word (either what I “mean” when I express a thought, or simply say: “This is blue”, “His name is Neal”), exist as a class in itself. Not quite like entities in the platonic heaven of Ideas, but they are a class of some kind of entities, “logical entities” to be exact, in a sense that, just like logical constructs, they exist independently of human perception or imagination of any kind.

This seems rather far-fetched at first, after all, through the entire history of philosophy we seem to have always started from sensory perception as the stepping stone towards
When in a presentation I write: “Should yellow patent classes intersect with the green ones?” a person out of context with my project, one without knowledge of patent taxonomy of any kind, can nevertheless have a basic grasp of what I mean: obviously I have somehow separated groups of patents into larger groups, assigned colors to them, and now I want to know something about the properties of these groups. Again, the meaning does not seem to depend on perception or experience at all. In fact, most of the 1000 pages of Logical Investigations are spent combating those views. It is not as incredible as it sounds, though. Surely when I say “Paris is beautiful” or “Bed bugs are something you should never experience”, my listener, if she understands the English language, understands what I mean, even if she has never been to Paris, or, God forbid, been bitten by bed bugs. (In fact, when I had my first and I hope only encounter with them, it took me very little time to realize what was going on, even though nobody had warned me and I had never been bitten before that time.)

According to Husserl, our grasp of meaning is an act that has nothing to do with generating the meaning itself, and occurs when we direct ourselves towards the meaning. The word he uses is “intendieren”, to intend. Expression and understanding are “intentional” acts in the sense that out of the entire universe of meanings we direct ourselves (“intend”) to a particular one (or a particular cluster) and bring it into focus.

The question still remains: what is the role of perception, or even imagination, in all of this? After all, we do seem to think in pictures of sorts, and there is no denying it: I understand “Paris is beautiful” or “Bed bugs suck” on a very different level if I have been to Paris or had the misfortune to sleep in the wrong bed.

So, Husserl distinguishes two classes of acts: signifikativen (or signitiven) and intuitiven (erfüllenden). Signifying and intuitive (fulfilling). Signifying are all the acts where meaning is simply expressed, and intuitive are the acts where perception or imagination is used to “fill” the meaning with some content. When I say “Paris is beautiful”, or “This tree is green”, or “This is Neal”, my expressions are purely signitive, i.e. they just point in the direction of the meanings, “signify” them (from the root “sign”). If I show pictures of Paris (or rely on your imagination to picture Paris), point out of the window at the tree, introduce Neal, I am now “filling” these pure meanings with intuitive content. Now what I mean actually takes shape. I don’t gain any more understanding, what I gain is insight: internal-sight.

The distinction is important. While all of meaning is expressed in signifying acts, it does not come to a full grasp, until it is intuited, seen in the mind’s eye.

I think these concepts are illustrated par excellence in the field of data visualization. In Husserl’s terminology we may have called it “data intuition”, or “data fulfillment”, or even “data insight”. There is enough meaning in the data itself, especially once data scientists go to work on it and extract trends, make predictions, etc. However, there is no “intuition” in all that. And without this intuition, it so happens, you cannot have a meaningful conversation with your user who may be a layman in the area of statistics, machine learning, data mining: your ideas are empty. You need to “fill” them with pictures. Moving and interactive pictures – better still.

And so we arrive at the definition of “data visualization” (according to Kant it is lucky in philosophical discourse to ever arrive at a definition, in a blog entry it must be nearly impossible):

Data visualization is an act of creating/perceiving presentations of certain aspects signified by data in an intuitive way.

Visualizing Crime with d3: Intro

April 18, 2013

Figure a blog without pictures or conversations is just boring, so, here it is.

Robbery in Cali

Lately, I have been dealing a lot with data visualization. This was a brand new area for me and while we do use F# for data extraction, all of the front end is done using d3, an amazing toolkit by Mike Bostock.

First of all, I owe the fact that my projects got off the ground to Mike and Jim Vallandigham.  Jim taught me all I know about how to draw bubble “charts” and use d3 force layouts. His blog is invaluable for anyone making first steps in the area of data visualization. Code snippets I am going to post here are due to Jim’s and Mike’s generosity. So, thank you Mike and Jim.

One may ask: if there are already tutorials on how to put together visuals, why assault the world with more musings (as a Candace Bushnell character once wrote)? The answer is: my goal in these posts is not to explore the creation of visuals, but rather to share experiences on how to put together projects that involve visualizations.

These are very different problems, since your task is not just to create a single document or web page for a single purpose, but to create something that can dynamically build these documents or pages, and maybe, within each such document provide different views of the same data. Questions of:

  • design
  • reuse
  • coding practices

come up right away, not to mention general problems:

  • What are data visualizations?
  • What are they used for?
  • Are they needed at all?

So, for these posts, we will be building a project that visualizes crime statistics in the US for the year 2008. The data source for this is found here and the complete solution will look like this.

The approximate plan for the next few posts:

  • Thinking about visualizations and what they are
  • Preparations
    • Getting Data (retrieving, massaging, formatting)
    • Getting the tools together (CoffeeScript, d3, ColorBrewer, Knockout.js, Twitter Bootstrap, jQuery, jQuery BBQ)
  • Building the visuals
    • Laying out “single” charts
    • Laying out multiple charts on the same page
    • A word about maps
  • Lessons learned: architecting for reuse, etc

Disposable Objects with Computation Expressions

February 2, 2013

The last post contains the description of a sqlMonad. It also happens to contain a silly and obvious (aren’t they all in hindsight) bug. The bug is in having the builder class, CmdSqlBuilder, implement IDisposable.

While the intent was good (the class wraps resources that should be promptly disposed of – SqlCommand and SqlConnection):

        let connection = new SqlConnection(connectionString)
        let cmd = new SqlCommand(name, connection)

there is no real opportunity to use it in this way, since the underlying object is statically created in advance and so cannot be used as disposable objects normally are!

The fix is to clean things up after each run like so:

        member this.Run( m : CmdSqlMonad<'a>) =
            try
                m cmd
            finally
                dispose()

Here, after each run, the dispose() function should do its work of closing the connection and disposing of the command object. The disposable pattern should not be implemented on the builder at all, as its application in this case makes no sense.
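The effect of try/finally can be seen without a database. The sketch below is an illustration only: FakeResource is a made-up stand-in for SqlCommand and SqlConnection, and runAndDispose is a hypothetical helper that mirrors the shape of Run above.

```fsharp
open System

// Stand-in for SqlCommand/SqlConnection: records whether it has been disposed
type FakeResource(name : string) =
    let mutable disposed = false
    member this.Name = name
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

// Mirrors the shape of Run: apply the workflow, then always clean up
let runAndDispose (resources : FakeResource list) (work : unit -> 'a) : 'a =
    try
        work ()
    finally
        resources |> List.iter (fun r -> (r :> IDisposable).Dispose())

let conn = new FakeResource("connection")
let cmd = new FakeResource("command")

// The workflow throws, yet both resources are still released
let outcome =
    try
        runAndDispose [cmd; conn] (fun () -> failwith "query failed")
    with e -> e.Message

printfn "%s; disposed: %b %b" outcome conn.IsDisposed cmd.IsDisposed
```

Both flags come out true even though the workflow failed, which is the property the fix relies on. For the real builder this presumably means dispose() has to release the connection as well as the command.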

Categories: F#

Exploring Monadic Landscape: Sql Command Computation Expression

April 10, 2012

Most developers have dealt with calling SQL Server stored procedures from their applications at least once or twice. In my last project, where intense data mining is done on the SQL side, this is basically all I am doing. There is always a desire to wrap and abstract the ever-repetitive code that gets the connection, builds an instance of the SqlCommand class, and reads in the returned dataset. And it never comes out quite as succinct as expected.

Again, this is a perfect situation for using computation expressions, as we can clearly see the workflow:

  1. Connect to the database
  2. Set command text
  3. Set command parameters (if necessary)
  4. Set other command options
  5. Execute the command of a necessary type

So at this point, it is easy to figure out how to write the builder for the command-oriented workflow.

Defining the Monadic Type

The gist of this workflow is that we take an instance of SqlCommand and run with it every step of our workflow. Hence, the step is defined like this:

    type CmdSqlMonad<'a> = SqlCommand -> 'a
    let sqlMonad<'a> (f : SqlCommand -> 'a) : CmdSqlMonad<'a> = f

(the operator on line 2 is defined for convenience and to guide the type system).

We can also define some auxiliary methods:

    type sqlParams = (string * obj) []

    let setParameters (sqlParameters : sqlParams) =
        sqlMonad(fun (cmd : SqlCommand) -> sqlParameters |> Seq.iter(fun (name, value) -> cmd.Parameters.AddWithValue(name, value) |> ignore))

    let setType (tp : CommandType) = sqlMonad (fun cmd -> cmd.CommandType <- tp)

    let execReader () =
        sqlMonad(fun cmd -> cmd.ExecuteReader())

    let execNonQuery() =
        sqlMonad(fun cmd ->  cmd.ExecuteNonQuery())

    let execScalar() =
        sqlMonad (fun cmd -> cmd.ExecuteScalar())

    let setTimeout(sec) = sqlMonad(fun cmd -> cmd.CommandTimeout <- sec)

Each of these (except for the three exec functions, which return the result of executing the command) is of the type CmdSqlMonad<unit>, as they simply set some properties on our SqlCommand object. This object is propagated all the way through the workflow by our Bind() function:

        member this.Bind(c : CmdSqlMonad<'a>, f : 'a -> CmdSqlMonad<'b>) =
            sqlMonad(fun cmd ->
                let value = c cmd
                f value cmd)

We can start defining the builder now. This builder is parameterized. It takes the connection string and the command name (or any query for that matter):

    type CmdSqlBuilder (connectionString, name) =
        do
            if String.IsNullOrWhiteSpace(connectionString) then invalidArg "connectionString" "connection string must be supplied"

        let connection = new SqlConnection(connectionString)
        let cmd = new SqlCommand(name, connection)

        do
            (retry {
                return connection.Open()
            }) defaultRetryParams

        let dispose () =
            cmd.Dispose()

        interface IDisposable with
            member this.Dispose () =
                dispose()
                GC.SuppressFinalize(this)

        override this.Finalize() = dispose()

(Note the use of “retry” computation expression).

The rest of the stuff is pretty standard:

        member this.Return ( x : 'a) : CmdSqlMonad<'a> = fun cmd -> x
        member this.Run( m : CmdSqlMonad<'a>) = m cmd
        member this.Delay(f : unit -> CmdSqlMonad<'a>) = f()
        member this.ReturnFrom(m : CmdSqlMonad<'a>) = m

We define the Run method to execute the workflow right away with the command that is created in the constructor.
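To see how Bind threads one command object through every step, here is a self-contained sketch that runs without SQL Server. Everything in it is an assumption made for illustration: FakeCommand stands in for SqlCommand, and CmdMonad, FakeCmdBuilder and fakeCommand are simplified counterparts of the types in this post.

```fsharp
// Stand-in for SqlCommand so the sketch runs without a database (hypothetical type)
type FakeCommand() =
    member val CommandText = "" with get, set
    member val CommandTimeout = 30 with get, set
    member this.ExecuteScalar() : obj =
        box (sprintf "ran '%s' with timeout %d" this.CommandText this.CommandTimeout)

type CmdMonad<'a> = FakeCommand -> 'a

type FakeCmdBuilder(text : string) =
    // Created once per builder instance and shared by every step
    let cmd = FakeCommand(CommandText = text)
    member this.Return(x : 'a) : CmdMonad<'a> = fun _ -> x
    member this.ReturnFrom(m : CmdMonad<'a>) = m
    // Run the first step against the command, feed its value to the next step
    member this.Bind(c : CmdMonad<'a>, f : 'a -> CmdMonad<'b>) : CmdMonad<'b> =
        fun target -> f (c target) target
    member this.Delay(f : unit -> CmdMonad<'a>) = f ()
    // Execute the whole workflow with the command built in the constructor
    member this.Run(m : CmdMonad<'a>) = m cmd

let setTimeout (sec : int) : CmdMonad<unit> =
    fun (cmd : FakeCommand) -> cmd.CommandTimeout <- sec

let execScalar () : CmdMonad<obj> =
    fun (cmd : FakeCommand) -> cmd.ExecuteScalar()

let fakeCommand text = FakeCmdBuilder(text)

let result =
    fakeCommand "select 1" {
        do! setTimeout 5
        return! execScalar ()
    } :?> string

printfn "%s" result
```

The printed string reports a timeout of 5, so the step that set the timeout and the step that executed the command saw the same object.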

Finally, to define the computation expression:

let sqlCommand(connectionString, name)  = new CmdSqlBuilder(connectionString, name)

At this point, wrapping sprocs is easy:

        let args : sqlParams = [|("@param1", val1 :> obj);  ("@param2", val2 :> obj)|]
        
        sqlCommand (connectionString, name) {
            do! setParameters(args)
            do! setTimeout(10 * 60)
            do! setType(CommandType.StoredProcedure)
            return! execNonQuery()
        }

Or calling a function:

        let args : sqlParams = [|("@param", searchString :> obj)|]
        
        sqlCommand(connectionString, "select dbo.MyFunc(@param)") {
            do! setParameters args
            return! execScalar()
        } :?> string

Or even a simple query:

        let rd = 
            sqlCommand(connectionString, "select * from someTable") {
                return! execReader()
            }

The code is concise and easy to understand.
Here is the complete source:

module CommandBuilder =

    open System
    open System.Data.SqlClient
    open System.Data

    type sqlParams = (string * obj) []

    type CmdSqlMonad<'a> = SqlCommand -> 'a

    let sqlMonad<'a> (f : SqlCommand -> 'a) : CmdSqlMonad<'a> = f
    
    let setParameters (sqlParameters : sqlParams) =
        sqlMonad(fun (cmd : SqlCommand) -> sqlParameters |> Seq.iter(fun (name, value) -> cmd.Parameters.AddWithValue(name, value) |> ignore))

    let setType (tp : CommandType) = sqlMonad (fun cmd -> cmd.CommandType <- tp)

    let execReader () = 
        sqlMonad(fun cmd -> cmd.ExecuteReader())

    let execNonQuery() =
        sqlMonad(fun cmd ->  cmd.ExecuteNonQuery())

    let execScalar() =
        sqlMonad (fun cmd -> cmd.ExecuteScalar())

    let command(text) = sqlMonad(fun cmd -> cmd.CommandText <- text)

    let setTimeout(sec) = sqlMonad(fun cmd -> cmd.CommandTimeout <- sec)

    type CmdSqlBuilder (connectionString, name) =
        do
            if String.IsNullOrWhiteSpace(connectionString) then invalidArg "connectionString" "connection string must be supplied"
        
        let connection = new SqlConnection(connectionString)
        let cmd = new SqlCommand(name, connection)

        do 
            cmd.CommandTimeout <- 60 * 20
            (retry {
                return connection.Open()
            }) defaultRetryParams

        let dispose () = 
            cmd.Dispose()

        interface IDisposable with
            member this.Dispose () =
                dispose()
                GC.SuppressFinalize(this)

        override this.Finalize() = dispose()

        member this.Command = cmd
        member this.Return ( x : 'a) : CmdSqlMonad<'a> = fun cmd -> x
        member this.Run( m : CmdSqlMonad<'a>) = m cmd
        member this.Delay(f : unit -> CmdSqlMonad<'a>) = f()
        member this.ReturnFrom(m : CmdSqlMonad<'a>) = m
                     
        member this.Bind(c : CmdSqlMonad<'a>, f : 'a -> CmdSqlMonad<'b>) =
            sqlMonad(fun cmd -> 
                let value = c cmd
                f value cmd)

    let sqlCommand(connection, name)  = new CmdSqlBuilder(connection, name)

Retry Monad: An Implementation

March 17, 2012

One application that seems quite intuitively to be a good case for “monadization” is that of retrying a function call upon an exception that is thrown while executing it. This may be needed for inherently unreliable operations, dependent on a network connection, for example. Discussions of this can be easily found. Here is one on StackOverflow.

The complete solution is on F# Snippets. Here is the explanation.

Any monad has these components:

  1. Monadic type: M<'a>
  2. Return function that creates a monadic type from a value of type 'a
  3. Bind function that is capable of linking monads together

So, in this order. We first define our monadic type:

    type RetryMonad<'a> = RetryParams -> 'a
    let rm<'a> (f : RetryParams -> 'a) : RetryMonad<'a> = f

RetryMonad is just a function, that takes RetryParams type and returns a value. How is this helpful? Well, an operation or a sequence of operations of any kind can be easily wrapped into a retry monad:

    let fn1 (x:float) (y:float) = rm (fun rp -> x * y)

I have also defined the “rm” operator for purely decorative purposes: it displays “RetryMonad” whenever it is used instead of RetryParams -> 'a.

    let fn1 (x:float) (y:float) = rm (fun rp -> x * y)

rm operator gives us a printout of

val fn1 : float -> float -> RetryMonad<float>

in F# Interactive. Otherwise it would look like this:

val fn1 : float -> float -> 'a -> float

RetryParams simply encapsulates our retry configuration: how many retries to perform and how long to wait between them.

    type RetryParams = {
        maxRetries : int; waitBetweenRetries : int
        }

    let defaultRetryParams = {maxRetries = 3; waitBetweenRetries = 1000}

Next we define the builder class where Bind and Return are defined, together with Delay and Run required by F#.

type RetryBuilder () =
   member this.Return (x : 'a) = fun defaultRetryParams -> x
   member this.Run(m : RetryMonad<'a>) = m
   member this.Delay(f : unit -> RetryMonad<'a>) = f ()

Defining the Bind function is the most interesting part. Bind has the signature:

(RetryMonad<'a> * ('a -> RetryMonad<'b>)) -> RetryMonad<'b>. In other words, it knows how to do two things: 1. extract the actual value from RetryMonad, i.e. execute the underlying function with retries, and 2. pass the result onward to the next monad in the chain.

        member this.Bind (p : RetryMonad<'a>, f : 'a -> RetryMonad<'b>)  =
            rm (fun retryParams -> 
                let value = retryFunc p retryParams //extract the value
                f value retryParams                //... and pass it on
            )

Here retryFunc is doing all the work of executing a function with retries:

    let internal retryFunc<'a> (f : RetryMonad<'a>) =
        rm (fun retryParams -> 
            let rec execWithRetry f i e =
                match i with
                | n when n = retryParams.maxRetries -> raise e
                | _ -> 
                    try
                        f retryParams //actual execution
                    with 
                    | e -> Thread.Sleep(retryParams.waitBetweenRetries); execWithRetry f (i + 1) e
            execWithRetry f 0 (Exception())
            ) 

Notice here that retryParams are available to the function during execution. This opens up possibilities of how the function may tune its actions depending on the retry policy. In this example there is not much to do, but RetryParams may theoretically be a more sophisticated type.

retryFunc throws the exception it gets from the function if there is an unrecoverable failure.
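Note that execWithRetry is not strictly tail recursive: the recursive call sits inside a “with” handler, and .NET cannot make a tail call from within an exception handler. The recursion depth is bounded by maxRetries, though, so the cost normally associated with recursion (dipping into the stack, etc.) is negligible. In effect this is still a “while” loop.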
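For comparison, the same retry policy can be spelled out as an explicit loop. This is a sketch, not the code from the complete solution: retryLoop and flaky are names invented here, and the record simply repeats the shape of RetryParams.

```fsharp
open System
open System.Threading

// Same record shape as RetryParams in this post
type RetryParams = { maxRetries : int; waitBetweenRetries : int }

// Loop-based sketch: try up to maxRetries times, then rethrow the last failure
let retryLoop (rp : RetryParams) (f : unit -> 'a) : 'a =
    let mutable attempt = 0
    let mutable result : 'a option = None
    let mutable lastError : exn = Exception("no attempts were made")
    while result.IsNone && attempt < rp.maxRetries do
        attempt <- attempt + 1
        try
            result <- Some (f ())
        with e ->
            // Remember the failure and wait before the next attempt
            lastError <- e
            Thread.Sleep rp.waitBetweenRetries
    match result with
    | Some value -> value
    | None -> raise lastError

// Hypothetical flaky operation: fails twice, then succeeds
let calls = ref 0
let flaky () =
    calls.Value <- calls.Value + 1
    if calls.Value < 3 then failwithf "transient failure %d" calls.Value
    else calls.Value * 10

let value = retryLoop { maxRetries = 5; waitBetweenRetries = 10 } flaky
printfn "succeeded after %d calls with %d" calls.Value value
```

With these parameters the operation succeeds on the third call and returns 30. The recursive version expresses the same control flow without mutable state.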

This is it! The only thing left to do is to create an instance of the builder:

let retry = RetryBuilder()

An example program is below. Since we are chaining monads of the RetryMonad type, actual functions need to first be “wrapped” in it. See definitions of fn1 and fn2. “rm” in the body of those functions is just for “show”, not strictly necessary.

let Main(args) =
    
    let fn1 (x:float) (y:float) = rm (fun rp -> x * y)
    let fn2 (x:float) (y:float) = rm (fun rp -> if y = 0. then invalidArg "y" "cannot be 0" else x / y)

    try
        let x = 
            (retry {
                let! a = fn1 7. 5.
                let! b = fn1 a 10.
                return b
            }) defaultRetryParams 

        printfn "first retry: %f" x

        let retryParams = {maxRetries = 5; waitBetweenRetries = 100}

        let ym = 
            retry {
                let! a = fn1 7. 5.
                let! b = fn1 a a
                let! c = fn2 b 0.
                return c
            }

        let y = ym retryParams
        0
    with
        e -> Console.WriteLine(e.Message); 1

This is slightly contrived, of course but you can see it in action if you insert some debugging output into the body of the functions, like I did in the complete solution:

Attempt 1
Result: 35. MaxRetries: 3. Wait: 1000.
Attempt 1
Result: 350. MaxRetries: 3. Wait: 1000.
first retry: 350.000000

Attempt 1
Result: 35. MaxRetries: 5. Wait: 100.
Attempt 1
Result: 1225. MaxRetries: 5. Wait: 100.
Attempt 1
Attempt 2
Attempt 3
Attempt 4
Attempt 5
cannot be 0
Parameter name: y

The Push Monad: Introduction

March 16, 2012

Chapter 5 of Friendly F# has a great practical explanation of F# computation expressions often called “monads” from their use in computer science and Haskell. The material in Chapter 5 of the book does a lot to demystify the concept, theoretical coverage of which is done well in this Wikipedia article.

Monads are an example of things best grasped by actually doing. So I set out to implement one in my project.

What Friendly F# discussion instills above all (confirmed by the Wiki article) is that a monad in functional programming is a great way to subsume some common side effects or patterns under an explicit syntax that serves several purposes:

  • Unclutter the code
  • Make the pattern visible thus improving readability, while at the same time
  • Avoiding “action at a distance” anti-pattern, where things seem to happen magically but it is extremely hard to figure out what is actually responsible for the magic

In this particular case, the monadic pattern is implied by the Push language: all programs are 100% robust, i.e. all syntactically correct programs execute without throwing an exception and the state of the system is preserved. This means that every time something occurs that makes execution of an operation impossible, we need to “unwind” the system and return it to its state before execution had started. It would be nice to factor all of that out of the implementation so we can concentrate exclusively on semantics of the operations.

So, while implementing Push operations the following must be done:

  1. See if there are enough arguments on the stack(s). If there are not, exit.
  2. Start executing the operation. If the operation cannot be completed, return everything back to the stack(s) and exit. Else:
  3. Push result to the appropriate stack.

For instance, here is an implementation of one of Push operations written without the use of monads:

        [<PushOperation("%")>]
        static member Mod() =
            match processArgs2 Float.Me.MyType with
            | [a1; a2] -> 
                if a2.Raw<float>() = 0. 
                then 
                    pushResult a1
                    pushResult a2
                else
                    let quot = Math.Floor(Math.Floor(a1.Raw<float>()) / Math.Floor(a2.Raw<float>()))
                    let res =  a1.Raw<float>() - quot * a2.Raw<float>()
                    pushResult(Float(res))
            | _ -> ()

Here all the steps are recognizable:

  1. Pop two arguments from the FLOAT stack using processArgs2. If it returns anything but a list of two values, exit.
  2. Check if the second argument is 0. If so, return the arguments back to the stack and exit, otherwise execute the operation.
  3. Push the result back to the FLOAT stack.

Here is the monadic version:

    [<PushOperation("%")>]
    static member Mod() =
        let getMod stack = 
            push {
                let! right = popOne stack
                let! left = popOne<float> stack
                if right <> 0. then
                    let quot = Math.Floor(Math.Floor(left) / Math.Floor(right))
                    return! result stack (left - quot * right)
            }
        getMod Float.Me.MyType

We no longer need to explicitly handle the pattern mentioned above. All the steps and branches are contained within our definition of the “push” monad, so no magic here. The reader of the code knows where to look for explanation of the side effects.

If there are fewer than two values on top of the FLOAT stack, execution will not go forward and any arguments already popped will be returned to the stack.

If the right argument is 0, “unwinding” of the state will also happen automatically without any need to handle this case explicitly.

One other convenience: we can now factor out extracting the value from an object we get from the top of a stack (by calling its Raw<'a>() function). This is done in the implementation of the monad and surfaced through the compiler sugar of the “let!” binding. A great improvement in maintainability and ease of implementation.
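Until the real implementation is discussed, here is one way such a builder could work, reduced to a toy. It is a sketch under loud assumptions, not the actual Push code: a single immutable FLOAT stack represented as a float list, and made-up names PushOp, PushBuilder, popOne, result and exec. A step either fails (None) or yields a value plus the new stack, and a failed operation simply leaves the caller with the original stack.

```fsharp
// Assumption: one immutable FLOAT stack; a step either fails (None)
// or yields a value together with the new stack
type PushOp<'a> = float list -> ('a * float list) option

type PushBuilder() =
    member this.Return(x : 'a) : PushOp<'a> = fun stack -> Some (x, stack)
    member this.ReturnFrom(m : PushOp<'a>) = m
    // Zero is reached when an "if" has no "else": treat it as "cannot complete"
    member this.Zero() : PushOp<'a> = fun _ -> None
    // Stop at the first failing step, otherwise pass value and stack along
    member this.Bind(m : PushOp<'a>, f : 'a -> PushOp<'b>) : PushOp<'b> =
        fun stack ->
            match m stack with
            | Some (value, rest) -> f value rest
            | None -> None

let push = PushBuilder()

// Hypothetical primitives: pop the top value, push a result
let popOne : PushOp<float> =
    function
    | top :: rest -> Some (top, rest)
    | [] -> None

let result (x : float) : PushOp<unit> = fun stack -> Some ((), x :: stack)

// Run an operation; on failure the caller keeps the original, untouched stack
let exec (op : PushOp<unit>) (stack : float list) =
    match op stack with
    | Some ((), updated) -> updated
    | None -> stack

let modOp =
    push {
        let! right = popOne
        let! left = popOne
        if right <> 0. then
            let quot = floor (floor left / floor right)
            return! result (left - quot * right)
    }

printfn "%A" (exec modOp [3.; 7.])   // 7 mod 3 is computed
printfn "%A" (exec modOp [0.; 7.])   // division by zero: stack unchanged
printfn "%A" (exec modOp [5.])       // too few arguments: stack unchanged
```

The first call leaves 1.0 on the stack; the other two return the input stack as it was. Because the stack is immutable here, “unwinding” costs nothing. A version with mutable stacks would have to push the popped arguments back explicitly.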

“Under the hood” details to be discussed in the next post.
