Archive for the ‘data visualization’ Category

Visualizing Graphs

September 18, 2016

Previously

Walking the Euler Path: Intro

Generating and Visualizing Graphs

I can hardly overemphasize the importance of visualizations. Many a bug has been spotted immediately just by looking at a visual of a complex data structure. I therefore decided to add visuals to the project as soon as the DirectedGraph class was born.

Code & Prerequisites

Code is on GitHub.

  1. GraphViz: install and add the bin directory to the PATH
  2. EmguCV v3.1: install and add the bin directory to the PATH

DrawGraph

This is a small auxiliary component I wrote to make all future visualizations possible. And here is a sidebar. I didn’t want to write this component. I am not a fan of re-writing something that was written a hundred times before me, so the first thing I did was look for something similar I could use. Sure enough, I found a few things. How can I put it? Software engineering is great, but boy, do we tend to overengineer things! I know, I’m guilty of the same thing myself. All I wanted from the library was the ability to take a text file written in the GraphViz DSL and get back a .png containing the picture of the graph. Just a very simple GraphViz driver, nothing more.

One library had me instantiate 3 (three!) classes, another developed a whole API of its own to build the GraphViz file… I ended up writing my own component. It has precisely 47 lines of code, and the last 4 lines alias a single function that does exactly what I wanted: it creates the png file and then immediately invokes the EmguCV image viewer to show it. After we’re done, it cleans up after itself, deleting the temporary png file. Here it is.
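The F# source itself lives in the repository. As a rough illustration of the same idea (pipe GraphViz text into dot, get a temporary png, show it, delete it), here is a minimal sketch in JavaScript for Node.js. It is not a port of the original component: dotArgs, renderDot and the show callback are hypothetical names, and it assumes the GraphViz dot executable is on the PATH.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Command-line arguments asking dot for a PNG written to outFile.
function dotArgs(outFile) {
  return ["-Tpng", "-o", outFile];
}

// Render GraphViz source to a temporary PNG, hand its path to `show`,
// then delete the temporary directory. `dotPath` defaults to "dot"
// (assumption: GraphViz is installed and on the PATH).
function renderDot(source, show, dotPath = "dot") {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "graph-"));
  const outFile = path.join(dir, "graph.png");
  try {
    // dot reads the graph description from stdin when no input file is given.
    execFileSync(dotPath, dotArgs(outFile), { input: source });
    show(outFile);
  } finally {
    // Clean up even if dot or the viewer failed.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

A call such as renderDot("digraph{a->b; b->c}", file => console.log(file)) would print the path of the png while it still exists; a real viewer would go where the console.log is.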

Taking it for a Ride

Just to see this work…
Another digression. Love the new feature that generates all the “#r” instructions for F# scripts and sticks them into one file! Yes, this one! Right-click on “References” in an F# project:

[Image: generaterefs]

And the generated scripts auto-update as you recompile with new references! A+ for the feature, thank you so much.

It comes with a small gotcha, though: sometimes it doesn’t get the order of references quite right, and then errors complaining about references not being loaded appear in F# Interactive. I spent quite a few painful hours wondering how a reference could be reported as not loaded when there it was! Then I realized: it was being loaded AFTER the references that required it.

#load "load-project-release.fsx"
open DrawGraph

createGraph "digraph{a->b; b->c; 2->1; d->b; b->b; a->d}" "dot.exe" None

[Image: initial]

Cool. Now I can take this and use my own function to generate a graph from a string adjacency list, visualize it, and even view some of its properties. Sort of make the graph “palpable”:

let sparse = ["a -> b, c, d"; "b -> a, c"; "d -> e, f"; "e -> f"; "1 -> 2, 3"; "3 -> 4, 5"; "x -> y, z"; "2 -> 5"]
let grs = StrGraph.FromStrings sparse

grs.Visualize(clusters = true)

[Image: clusters]

StrGraph.FromStrings does exactly what it says: it generates a graph from a sequence of strings, formatted like the sparse list above.
My Visualize function is a kitchen sink for all kinds of visuals, driven by its parameters. In the above example, it invokes graph partitioning to clearly mark connected components.

It is important to note that this functionality was added to the visualizer not because I wanted to see connected components more clearly, but as a quick way to ensure that my partitioning implementation was indeed working correctly.
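To make the two steps concrete (parse the adjacency strings, then partition the graph), here is a small JavaScript sketch. It is not the F# implementation behind StrGraph: parseAdjacency and connectedComponents are hypothetical names, and the sketch assumes that "connected components" means weakly connected ones, i.e. edge direction is ignored when grouping.

```javascript
// Parse lines such as "a -> b, c, d" into a Map from vertex to its successors.
function parseAdjacency(lines) {
  const graph = new Map();
  const ensure = (v) => {
    if (!graph.has(v)) graph.set(v, []);
    return graph.get(v);
  };
  for (const line of lines) {
    const [from, targets = ""] = line.split("->").map((s) => s.trim());
    const out = ensure(from);
    for (const to of targets.split(",").map((s) => s.trim()).filter(Boolean)) {
      out.push(to);
      ensure(to); // vertices that appear only on the right still count
    }
  }
  return graph;
}

// Group vertices into weakly connected components (edge direction ignored).
function connectedComponents(graph) {
  const neighbours = new Map([...graph.keys()].map((v) => [v, new Set()]));
  for (const [from, outs] of graph) {
    for (const to of outs) {
      neighbours.get(from).add(to);
      neighbours.get(to).add(from);
    }
  }
  const seen = new Set();
  const components = [];
  for (const start of graph.keys()) {
    if (seen.has(start)) continue;
    const component = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop();
      component.push(v);
      for (const w of neighbours.get(v)) {
        if (!seen.has(w)) {
          seen.add(w);
          stack.push(w);
        }
      }
    }
    components.push(component.sort());
  }
  return components;
}
```

Fed the sparse list from the script above, this groups the vertices into three clusters: the letters a to f, the digits 1 to 5, and x, y, z.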

Generating Data and Looking at It

Now we have a class that builds graphs and even lets us look at them, so where do we get these graphs? The easiest thing (or so it seemed at the time) was to create them.

Enter FsCheck. It’s not the easiest library to use, there is a learning curve and getting used to things takes time, but it’s very helpful. Their documentation is quite good too. The idea is to write a generator for your type and then use that generator to create as many samples as you like:

#load "load-project-release.fsx"

open Graphs
open FsCheck
open System
open DataGen

let grGen = graphGen 3 50
let gr = grGen.Sample(15, 5).[2]

gr.Visualize(into=3, out= 3)

This produces something like:

[Image: gengraph]

My function graphGen len num generates a graph of text vertices where len is the length of a vertex name and num is the number of vertices. It returns an FsCheck generator that can then be sampled to get actual graphs. This was a one-off kind of experiment, so it’s in a completely separate module:


//DataGen.fs

module DataGen
open FsCheck
open System
open Graphs

// A random upper-case letter, 'A' through 'Z'
let nucl = Gen.choose(int 'A', int 'Z') |> Gen.map char

// A random vertex name: a string of len letters
let genVertex len =  Gen.arrayOfLength len nucl |> Gen.map (fun c -> String(c))
// An array of distinct vertex names (at most number of them)
let vertices len number = Gen.arrayOfLength number (genVertex len) |> Gen.map Array.distinct

let graphGen len number =
    let verts = vertices len number
    let rnd = Random(int DateTime.UtcNow.Ticks)
    // A single vertex picked at random
    let pickFrom = verts |> Gen.map (fun lst -> lst.[rnd.Next(lst.Length)])
    // A non-empty list of randomly picked vertices
    let pickTo = Gen.sized (fun n -> Gen.listOfLength (if n = 0 then 1 else n) pickFrom)

    Gen.sized
    <| 
    (fun n ->
        // Pair one vertex with a comma-separated list of vertices
        Gen.map2 
            (fun from to' -> 
                from, (to' |> Seq.reduce (fun acc v -> acc + ", " + v))) pickFrom pickTo
        |>
        Gen.arrayOfLength (if n = 0 then 1 else n)
        // Keep a single line per left-hand vertex
        |> Gen.map (Array.distinctBy fst)
        // Format each pair the way FromStrings expects: "a -> b, c"
        |> Gen.map (fun arr ->  arr |> Array.map (fun (a, b) -> a + " -> " + b))
    )
    |> Gen.map StrGraph.FromStrings

This whole module cascades different FsCheck generators to create a random graph.
The simplest of them, nucl, generates a random character. (Its name comes from the fact that originally I wanted to limit the alphabet to just the four nucleotide characters A, C, G, T.) This generator is then used by genVertex to generate a random string vertex, and finally vertices creates an array of distinct random vertices.

graphGen creates a sequence of strings that FromStrings (above) understands. It first creates a string of “inbound” vertices and then adds an outbound vertex to each such string.
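For comparison, the same pipeline can be sketched without FsCheck. The JavaScript below is only an illustration of the shape of the output, not a translation of graphGen: randomAdjacency, its lines and maxTargets parameters, and the small seeded generator are all hypothetical, and it picks vertices from one fixed pool where the F# code composes generators.

```javascript
// Small seeded pseudo-random generator (mulberry32 style), so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Produce up to `lines` strings of the form "ABC -> DEF, GHI".
// len: length of a vertex name; number: how many names to draw;
// maxTargets: upper bound on the vertices listed on the right-hand side.
function randomAdjacency(len, number, lines, maxTargets, rand) {
  const pick = (arr) => arr[Math.floor(rand() * arr.length)];
  const letter = () => String.fromCharCode(65 + Math.floor(rand() * 26));
  const vertex = () => Array.from({ length: len }, letter).join("");
  // Names are de-duplicated, so there may be fewer than `number` of them.
  const vertices = [...new Set(Array.from({ length: number }, vertex))];
  const bySource = new Map();
  for (let i = 0; i < lines; i++) {
    const from = pick(vertices);
    if (bySource.has(from)) continue; // keep one line per left-hand vertex
    const count = 1 + Math.floor(rand() * maxTargets);
    const targets = Array.from({ length: count }, () => pick(vertices));
    bySource.set(from, `${from} -> ${targets.join(", ")}`);
  }
  return [...bySource.values()];
}
```

Calling randomAdjacency(3, 50, 10, 4, mulberry32(42)) yields at most ten lines in the format that FromStrings consumes.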

Sampling is a little tricky. For instance, the first parameter to the Sample function, which per the documentation controls sample size, is in this case responsible for the complexity and connectivity of the resulting graphs.

On to Euler…

The script above also specifies a couple of optional parameters to the visualizer: into will mark any vertex that has into or more inbound connections in green. And out will do the same for outbound connections and yellow. If the same vertex possesses both properties, it turns blue.

Inspired by all this success, I now want to write a function that would generate Eulerian graphs. The famous theorem states that a directed graph is Eulerian (has an Euler cycle) if and only if it is strongly connected and the in-degree of each vertex equals its out-degree. Thus, the above properties of the visualizer are quite helpful in confirming that the brand new generator I have written for Eulerian graphs (GenerateEulerGraph) is at the very least on track:


let gre = StrGraph.GenerateEulerGraph(10, 5)
gre.Visualize(into=3, out=3)

[Image: eulerinout]

Very encouraging! Whatever has at least 3 edges out, has at least 3 edges in. Not a definitive test, but the necessary condition of having only blue and transparent vertices in the case of an Eulerian graph is satisfied.
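The visual check can also be expressed in a few lines of code. The JavaScript sketch below is an illustration only (degrees, mark and isBalanced are hypothetical names, not part of the F# project): it counts in- and out-degrees, applies the green/yellow/blue rule described above, and tests the degree half of the theorem. Strong connectivity would still need a separate check.

```javascript
// Count inbound and outbound edges per vertex from a list of [from, to] pairs.
function degrees(edges) {
  const deg = new Map();
  const entry = (v) => {
    if (!deg.has(v)) deg.set(v, { inbound: 0, outbound: 0 });
    return deg.get(v);
  };
  for (const [from, to] of edges) {
    entry(from).outbound += 1;
    entry(to).inbound += 1;
  }
  return deg;
}

// Colour rule from the post: green for `into` or more inbound edges,
// yellow for `out` or more outbound edges, blue for both, null otherwise.
function mark(d, into, out) {
  const manyIn = d.inbound >= into;
  const manyOut = d.outbound >= out;
  if (manyIn && manyOut) return "blue";
  if (manyIn) return "green";
  if (manyOut) return "yellow";
  return null;
}

// Necessary (not sufficient) condition for an Euler cycle:
// every vertex has as many inbound edges as outbound ones.
function isBalanced(edges) {
  return [...degrees(edges).values()].every((d) => d.inbound === d.outbound);
}
```

In a balanced graph, mark can only ever return "blue" or null when into equals out, which is exactly the "only blue and transparent vertices" observation.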

In the next post – more about Eulerian graphs, de Bruijn sequences, building (and visualizing!) de Bruijn graphs, used for DNA sequence assembly.

D3 Fisheye Distortion for Bar Charts

February 25, 2014

Intro

Focus + context visualizations are quite useful when we want to zoom into some part of a visualization, but are unwilling to give up the “bird’s eye” view of the entire picture. In this case, we distort the part of the visual on which we want to focus, while preserving the entire view.

In D3, this task is accomplished through “fisheye distortion”. Mike Bostock has examples of usage of his fisheye plugin on his site.

The problem is that this plugin does not support bar charts, where the technique could also be quite useful. I believe the lack of support is explained by the fact that the ordinal scale rangeBand() function does not take any parameters, which is fine for an ordinary bar chart: all you need is to split your range into equal regions. When applying a distortion, however, the chunk of the range devoted to a particular part of the input depends on the position of that chunk. So extending the plugin is fairly simple: it just needs a new signature for the rangeBand() function.
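To see why the band width has to depend on position, here is a standalone sketch with no D3 involved. It is not the code from the plugin or the fiddle: distort and bandAt are hypothetical names, and the formula is the fisheye transform as I understand the plugin to apply it to continuous scales, so treat it as an assumption. Each band's width is simply the distance between its two distorted edges.

```javascript
// Fisheye distortion of a position x within [min, max], focused at `focus`.
// distortion = 0 leaves positions unchanged; larger values magnify the
// neighbourhood of the focus and compress everything else.
function distort(x, focus, distortion, min, max) {
  if (x === focus) return x;
  const left = x < focus;
  let m = left ? focus - min : max - focus;
  if (m === 0) m = max - min;
  const t = Math.abs(x - focus);
  return (left ? -1 : 1) * m * (distortion + 1) / (distortion + m / t) + focus;
}

// Distorted [start, width] of the i-th of n equal bands laid out over [min, max].
function bandAt(i, n, focus, distortion, min, max) {
  const step = (max - min) / n;
  const start = distort(min + i * step, focus, distortion, min, max);
  const end = distort(min + (i + 1) * step, focus, distortion, min, max);
  return [start, end - start];
}
```

With ten bands over [0, 100], a focus at 50 and a distortion of 3, the bands next to the focus come out several times wider than those at the edges, while the widths still add up to 100 because the two ends of the range stay fixed.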

Here is what a finished example looks like:

[Image: bar_chart]

The complete code for it, including the plugin can be found on JSFiddle.

In order to create it, I have modified an existing example by a StackOverflow user (thank you very much!).

Usage

The actual fisheye plugin spans lines 1 – 144 in the jsfiddle cited above. Cut it, save to a file.

  1. Create your ordinal scale for the bar chart using the plugin:
    var x = d3.fisheye.ordinal().rangeRoundBands([0, w - p[1] - p[3]])
    .distortion(0.9);
    
  2. Replace all calls to rangeBand() with calls to rangeBand(d.x):

    
    // Add a rect for each date.
    var rect = cause.selectAll("rect")
    .data(Object)
    .enter().append("svg:rect")
    .attr("x", function(d) { return x(d.x); })
    .attr("y", function(d) { return -y(d.y0) - y(d.y); })
    .attr("height", function(d) { return y(d.y); })
    .attr("width", function(d) {return x.rangeBand(d.x);});
    
    // Add a label per date.
    var label = svg.selectAll("text")
    .data(x.domain())
    .enter().append("svg:text")
    .attr("x", function(d) { return x(d) + x.rangeBand(d) / 2; }) // here d is itself a domain value
    .attr("y", 6)
    .attr("text-anchor", "middle")
    .attr("dy", ".71em")
    .text(format);
    
  3. Add mouse interaction to your main container, to update the focus of the distortion, and redraw the affected elements:

    //respond to the mouse and distort where necessary
    svg.on("mousemove", function() {
        var mouse = d3.mouse(this);
        
        //refocus the distortion
        x.focus(mouse[0]);
        //redraw the bars
        rect
        .attr("x", function(d) { return x(d.x); })
        .attr("width", function(d) {return x.rangeBand(d.x);});
        
        //redraw the text
        label.attr("x", function(d) { return x(d) + x.rangeBand(d) / 2; });
    });
    

Et voilà!

Visualizing Crime with d3: Hooking up Data and Colors, Part 2

June 17, 2013

In the previous post, we derived a class from BubbleChart and this got us started on actually visualizing some meaningful data using bubbles.

There are a couple of things to iron out before a visual can appear.

Color Schemes

I am using Cynthia Brewer color schemes, available for download in colorbrewer.css. This file is available on my GitHub as well.

It consists of entries like:

.Spectral .q0-3{fill:rgb(252,141,89)}
.Spectral .q1-3{fill:rgb(255,255,191)}
.Spectral .q2-3{fill:rgb(153,213,148)}
.Spectral .q0-4{fill:rgb(215,25,28)}
.Spectral .q1-4{fill:rgb(253,174,97)}
.Spectral .q2-4{fill:rgb(171,221,164)}
.Spectral .q3-4{fill:rgb(43,131,186)}
.Spectral .q0-5{fill:rgb(215,25,28)}
.Spectral .q1-5{fill:rgb(253,174,97)}

Usage is simple: you pick a color scheme and add it to the class of your parent element that will contain the actual SVG elements displayed, e.g.: Spectral. Then one of the “qi-n” classes is assigned to each of these child elements to get the actual color.

So for instance:

The main SVG element on the Crime Explorer visual looks like this:

<svg class="Spectral" id="svg_vis">...</svg>

Then, each of the “circle” elements inside this SVG container will have one of the qi-9 classes (I am using 9 colors in total for this visualization, so i ranges over 0..8).

<circle r="9.664713682964603" class="q2-9" stroke-width="2" stroke="#b17943" id="city_0" cx="462.4456905180483" cy="574.327856528298"></circle>

(Note the class="q2-9" attribute above).

All of this is supported by the BubbleChart class with some prodding.
You need to:

  1. Pass the color scheme to the constructor of the class derived from BubbleChart upon instantiation:
    allStates = new AllStates('vis', crime_data, 'Spectral')
    
  2. Implement a function called color_class in the derived class, that will produce a string of type “qi-n”, given an i. The default function supplied with the base class always returns “q1-6”.
    @color_class =
          d3.scale.threshold().domain(@domain).range(("q#{i}-9" for i in [8..0]))
    

    In my implementation, I am using the d3 threshold scale to map a domain of values to the colors I need based on certain thresholds. The range is reversed only because I want “blue” colors to come out on lower threshold values and “red” on higher ones (less crime is “better”, so I use red for higher values). See AllStates.coffee for a full listing. How this is hooked up to the actual data is discussed in the next section.
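What the threshold scale does can be shown in plain JavaScript, without d3. This is a sketch of the behaviour rather than the library's code: thresholdScale is a hypothetical helper, and the threshold values are made up for illustration (the real @domain is derived from the crime data).

```javascript
// Build a function mapping a value to a class, given sorted thresholds.
// n thresholds split the number line into n + 1 buckets, so `range`
// needs n + 1 entries. A value equal to a threshold falls in the upper bucket.
function thresholdScale(domain, range) {
  return function (value) {
    let i = 0;
    while (i < domain.length && value >= domain[i]) i++;
    return range[i];
  };
}

// "q8-9", "q7-9", ..., "q0-9": reversed, as in the CoffeeScript above.
const classes = [];
for (let i = 8; i >= 0; i--) classes.push(`q${i}-9`);

// Hypothetical thresholds, for illustration only.
const thresholds = [50, 100, 150, 200, 300, 400, 500, 750];
const colorClass = thresholdScale(thresholds, classes);
```

Here colorClass(10) gives "q8-9" (the blue end of the reversed range) and colorClass(1000) gives "q0-9" (the red end).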

Data Protocol

This is key: data you pass to the BubbleChart class must comply with the following requirements:

  1. It must be an array (not an associative array, a regular array). Each element of this array will be displayed as a circle (“bubble”) on the screen.
  2. Each element must contain the following fields:
    • id – this is a UNIQUE id of the element. It is used by BubbleChart to do joins (see d3 documentation for what these are)
    • value – this is what the “value” of each data element is, and it is used to compute the radius of each bubble
    • group – indicates the “color group” to which the bubble belongs. This is what is fed to the color_class function to determine the color of each individual bubble

With all these conditions satisfied, the array of data is now ready to be displayed.
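A quick way to catch protocol violations before handing data to the chart is a small validator. The JavaScript sketch below is not part of BubbleChart; checkBubbleData is a hypothetical helper that only checks the three requirements listed above.

```javascript
// Return a list of problems; an empty list means the data fits the protocol.
function checkBubbleData(data) {
  if (!Array.isArray(data)) return ["data must be a regular array"];
  const problems = [];
  const seen = new Set();
  data.forEach((d, i) => {
    // Every element needs id, value and group.
    for (const field of ["id", "value", "group"]) {
      if (d == null || d[field] === undefined) {
        problems.push(`element ${i}: missing ${field}`);
      }
    }
    // Ids must be unique, because they are used as join keys.
    if (d != null && d.id !== undefined) {
      if (seen.has(d.id)) problems.push(`element ${i}: duplicate id ${d.id}`);
      seen.add(d.id);
    }
  });
  return problems;
}
```

For example, checkBubbleData([{ id: "city_0", value: 120, group: 310 }]) returns an empty list, while passing an object keyed by city name is rejected outright.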

Displaying It

Now that it is all done, showing the visual is simple:

allStates = new AllStates('vis', crime_data, 'Spectral')
allStates.create_vis()
allStates.display()

Next time: displaying the auxiliary elements: color and size legends, the search box.

Visualizing Crime with d3: How to Make Bubbles and Influence People, Part 1

May 27, 2013

Previously:

  1. Visualizing Crime with d3: Intro
  2. Data and Visualization

In order to make a bubble chart in d3 (the one similar to the Obama Budget 2013), using CoffeeScript, you need to:

  1. Download a few files from my GitHub (you’ll need coffee/BubbleChartSingle.coffee, css/visuals.css, css/colorbrewer.css)
  2. Define a class in a .coffee file:
    class @MyBubbleChart extends @BubbleChart
       constructor: (id, data, color) ->
          super(id, data, color)
    
  3. I also define a couple of extensions to make life easier (in displayVis.coffee):
     String::startsWith = (str) -> this.slice(0, str.length) == str
     String::removeLeadHash = () -> if this.startsWith("#") then this.slice(1) else this
    
  4. Finally, instantiate and display:
     chart = new MyBubbleChart('vis', myArrayOfData, 'Spectral')
     chart.create_vis()
     chart.display()
    

    Here ‘vis’ is the id of a container on your page where the visualization will go, e.g.:

    <div id='vis'></div>
    

    myArrayOfData is an array of your data, and 'Spectral' is a color scheme, one of many available from colorbrewer.css, created by Cynthia Brewer. You can read about how this works here. Making colors for the visualization is a science in and of itself; since I am not versed in it, I am using someone else’s wonderful results.

And this is it, you are done!

No, of course not, just kidding. There are a few more things to be tweaked in order for this to work. In particular, we need to observe a simple convention around the structure of our data records, define our own color_class function so that the bubbles are colored meaningfully, and set some scaling parameters based on our data so that the circles fit nicely inside the container. It is also a good idea to bring in some tooltips to show when the user hovers over a bubble (or, for that matter, touches it on her tablet).

I will illustrate this with the crime example in the next post (the code is: coffee/AllStates.coffee)

Data and Visualization.

April 22, 2013

As the three of us embarked on this new data-mining project, we were the data scientist, the manager and the developer, who knew nothing about visualizations. We didn’t even want to do any visuals at first.

Then someone stumbled across the New York Times Obama Budget visual and the wheels started spinning. Pretty soon we had something like this of our own, and then it snowballed into a real project with quite a few interactive charts and visuals, all d3 based.

While developing all this, I started to wonder: why are the right visuals so incredibly effective in presenting data? Exactly what do the bubbles have that the tables don’t? It is the same data after all. I called upon phenomenology as it was first presented in Logical Investigations by Edmund Husserl (because I haven’t made it any further in Husserlian literature yet) to help me understand what is happening.

Husserl and Data Intuition

The core idea of Logical Investigations is that meanings in the broadest sense of the word (either what I “mean” when I express a thought, or simply say: “This is blue”, “His name is Neal”) exist as a class in itself. Not quite like entities in the platonic heaven of Ideas, but they are a class of some kind of entities, “logical entities” to be exact, in the sense that, just like logical constructs, they exist independently of human perception or imagination of any kind.

This seems rather far-fetched at first, after all, through the entire history of philosophy we seem to have always started from sensory perception as the stepping stone towards
When in a presentation I write: “Should yellow patent classes intersect with the green ones?” a person out of context with my project, one without knowledge of patent taxonomy of any kind, can nevertheless have a basic grasp of what I mean: obviously I have somehow separated groups of patents into larger groups, assigned colors to them, and now I want to know something about the properties of these groups. Again, the meaning does not seem to depend on perception or experience at all. In fact, most of the 1000 pages of Logical Investigations is spent combating those views. It is not as incredible as it sounds, though. Surely when I say “Paris is beautiful” or “Bed bugs are something you should never experience”, my listener, if she understands the English language, understands what I mean, even if she has never been to Paris, or, God forbid, been bitten by bed bugs. (In fact, when I had my first and I hope only encounter with them, it took me very little time to realize what was going on, even though nobody had warned me and I had never been bitten before that time).

According to Husserl, our grasp of meaning is an act that has nothing to do with generating the meaning itself, and occurs when we direct ourselves towards the meaning. The word he uses is “intendieren”, to intend. Expression and understanding are “intentional” acts in the sense that, out of the entire universe of meanings, we direct ourselves (“intend”) to a particular one (or a particular cluster) and bring it into focus.

The question still remains: what is the role of perception, or even imagination in all of this? After all, we do seem to think in pictures of sorts, and there is no denial: I understand “Paris is beautiful” or “Bed bugs suck” on a very different level if I have been to Paris or had a misfortune to sleep in the wrong bed.

So, Husserl distinguishes two classes of acts: signifikativen (or signitiven) and intuitiven (erfüllenden), that is, signifying and intuitive (fulfilling). Signifying are all the acts where meaning is simply expressed, and intuitive are the acts where perception or imagination is used to “fill” the meaning with some content. When I say “Paris is beautiful”, or “This tree is green”, or “This is Neal”, my expressions are purely signitive, i.e. they just point in the direction of the meanings, “signify” them (from the root “sign”). If I show pictures of Paris (or rely on your imagination to picture Paris), point out of the window at the tree, or introduce Neal, I am now “filling” these pure meanings with intuitive content. Now what I mean actually takes shape. I don’t gain any more understanding; what I gain is insight: internal-sight.

The distinction is important. While all of meaning is expressed in signifying acts, it does not come to a full grasp, until it is intuited, seen in the mind’s eye.

I think these concepts are illustrated par excellence in the field of data visualization. In Husserl’s terminology we may have called it “data intuition”, or “data fulfillment”, or even “data insight”. There is enough meaning in the data itself, especially once data scientists go to work on it and extract trends, make predictions, etc. However, there is no “intuition” in all that. And without this intuition, it so happens, you cannot have a meaningful conversation with your user who may be a layman in the area of statistics, machine learning, data mining: your ideas are empty. You need to “fill” them with pictures. Moving and interactive pictures – better still.

And so we arrive at the definition of “data visualization” (according to Kant it is lucky in philosophical discourse to ever arrive at a definition, in a blog entry it must be nearly impossible):

Data visualization is an act of creating/perceiving presentations of certain aspects signified by data in an intuitive way.

Visualizing Crime with d3: Intro

April 18, 2013

Figure a blog without pictures or conversations is just boring, so, here it is.

Robbery in Cali

Lately, I have been dealing a lot with data visualization. This was a brand new area for me and while we do use F# for data extraction, all of the front end is done using d3, an amazing toolkit by Mike Bostock.

First of all, I owe the fact that my projects got off the ground to Mike and Jim Vallandigham.  Jim taught me all I know about how to draw bubble “charts” and use d3 force layouts. His blog is invaluable for anyone making first steps in the area of data visualization. Code snippets I am going to post here are due to Jim’s and Mike’s generosity. So, thank you Mike and Jim.

One may ask, if there are already tutorials on how to put together visuals, why assault the world with more musings (as a Candace Bushnell character once wrote). The answer is: my goal in these posts is not to explore the creation of visuals, but rather to share experiences of putting together projects that involve visualizations.

These are very different problems, since your task is not just to create a single document or web page for a single purpose, but to create something that can dynamically build these documents or pages, and maybe, within each such document provide different views of the same data. Questions of:

  • design
  • reuse
  • coding practices

come up right away, not to mention general problems:

  • What are data visualizations?
  • What are they used for?
  • Are they needed at all?

So, for these posts, we will be building a project that visualizes crime statistics in the US for the year 2008. The data source for this is found here and the complete solution will look like this.

The approximate plan for the next few posts:

  • Thinking about visualizations and what they are
  • Preparations
    • Getting Data (retrieving, massaging, formatting)
    • Getting the tools together (CoffeeScript, d3, ColorBrewer, Knockout.js, Twitter Bootstrap, jQuery, jQuery bbq)
  • Building the visuals
    • Laying out “single” charts
    • Laying out multiple charts on the same page
    • A word about maps
  • Lessons learned: architecting for reuse, etc