F# vs C#. Fold and Aggregate

Suppose you need to write a script that finds n files whose names follow a pattern, say “c:\temp\my_file_x.txt”, where “x” is replaced by each number in a range, for instance [1..30], reads the contents of these files, and glues them together. Suppose also that the files are very small, so you can keep them all in memory at once. Finally, it should be solved in one line (except for auxiliaries: defining variables, writing out the results).

One-line solutions exist both in F# and C#. Which one is prettier? I vote for F#.

Here is the C# code:

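using System.Collections.Generic;
using System.IO;
using System.Linq;

string templ = @"C:\temp\my_file_";

var content =
    Enumerable.Range(1, 30)
        .Aggregate(
            new List<string>(),
            (a, e) =>
            {
                // Append this file's lines to the accumulator list.
                a.AddRange(File.ReadAllLines(templ + e.ToString() + ".txt"));
                return a;
            });

File.WriteAllLines(templ + ".txt", content);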

And here is the F# version (of just the relevant part):

let content =
  [1..30]
  |> List.fold
       (fun content i ->
          // Append this file's lines to the accumulated list.
          content @ (File.ReadAllLines(templ + i.ToString() + ".txt") |> Array.toList))
       []

You can accomplish almost anything with fold and its C# LINQ equivalent, Aggregate().
So first we create a range, [1..30]. Note that although [1..30] and Enumerable.Range(1, 30) both generate the numbers from 1 to 30, their semantics differ: the F# range takes a first and a last value, while Enumerable.Range takes a start and a count. So [0..30] and Enumerable.Range(0, 30) generate different sequences: the latter yields the numbers 0..29.
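
To see the difference concretely, here is a minimal C# sketch, assuming C# 9 or later (top-level statements). The Inclusive helper is hypothetical and not part of LINQ; it only mimics the F# first-to-last form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LINQ): mimics F#'s [first..last],
// where both ends are included. Assumes last >= first.
static IEnumerable<int> Inclusive(int first, int last) =>
    Enumerable.Range(first, last - first + 1);

// Enumerable.Range takes (start, count), so this yields 0..29.
var byCount = Enumerable.Range(0, 30).ToList();

// The helper takes (first, last), so this yields 0..30.
var byBounds = Inclusive(0, 30).ToList();

Console.WriteLine($"{byCount.Count} items, last = {byCount.Last()}");   // 30 items, last = 29
Console.WriteLine($"{byBounds.Count} items, last = {byBounds.Last()}"); // 31 items, last = 30
```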

Then we fold the range of numbers into a list of lines by reading the files and gluing the results together. (We could have appended raw text rather than lines, but that matters little for this script, and working with lines ensures that each file's content starts on a new line.)
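
The same fold can be tried without touching the disk. In this sketch the dictionary named files is a hypothetical stand-in for the real files, and ReadLinesOf stands in for File.ReadAllLines; only the shape of the Aggregate call is taken from the C# code in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the files on disk.
var files = new Dictionary<int, string[]>
{
    [1] = new[] { "alpha", "beta" },
    [2] = new[] { "gamma" },
    [3] = new[] { "delta", "epsilon" },
};

// Hypothetical stand-in for File.ReadAllLines(templ + i + ".txt").
string[] ReadLinesOf(int i) => files[i];

// Fold the range 1..3 into one list: the accumulator starts empty, and
// each step appends the lines of file i before handing the list onward.
var content = Enumerable.Range(1, 3)
    .Aggregate(
        new List<string>(),
        (acc, i) =>
        {
            acc.AddRange(ReadLinesOf(i));
            return acc;
        });

Console.WriteLine(string.Join(", ", content));
// prints: alpha, beta, gamma, delta, epsilon
```

The accumulator here is a single mutable list that is reused at every step, whereas the F# version builds a new immutable list with @ each time; for 30 small files the difference is negligible.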

2 thoughts on “F# vs C#. Fold and Aggregate”

  1. I think the C# implementation here is far from optimal. Creating a list and populating it allocates memory and makes the code more complex, when you can just use SelectMany and stream everything as an IEnumerable.
    The performance should also be better, as values can be written as soon as they are read, which gives better memory/cache locality and lets reads and writes overlap thanks to the framework's I/O caching and buffering.
    Using a format string also simplifies the code and makes it nicer.
    I prefer the F# [1..30] notation, and wish C# would get that.

    var templ = @"C:\temp\my_file_{0}.txt";

    var content = Enumerable.Range(1, 30).SelectMany(i => File.ReadAllLines(string.Format(templ, i)), (i, s) => s);

    File.WriteAllLines(string.Format(templ, ""), content);

  2. Better yet, use the newer File.ReadLines (note the missing “All”), which returns an IEnumerable, instead of ReadAllLines. The former does not read the whole file into one big array at once, but rather “on the fly” as you iterate.
    That is, with that change, and by making sure you also pass an IEnumerable to WriteAllLines (choosing the overload that takes an IEnumerable), you should be able to do this with large files without occupying all of your RAM.

    A bit unfortunate that the naming is inconsistent. Since you can’t have overloads based on return types, it had to be an extra function for Read, and for symmetry, they could have made a WriteLines( IEnumerable … ) as well, instead of making another overload of WriteAllLines, but oh well…
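
A minimal sketch of the streaming variant described in the second comment, assuming .NET 4 or later (where File.ReadLines and the IEnumerable<string> overload of File.WriteAllLines are available) and C# 9 or later for top-level statements. The scratch directory and the three sample files are hypothetical, created under the system temp folder so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical scratch directory so the sketch does not touch C:\temp.
var dir = Path.Combine(Path.GetTempPath(), "fold_demo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
var templ = Path.Combine(dir, "my_file_{0}.txt");

// Create three small input files to read back.
foreach (var n in Enumerable.Range(1, 3))
    File.WriteAllLines(string.Format(templ, n), new[] { $"file {n}, line 1", $"file {n}, line 2" });

// File.ReadLines is lazy: each file is opened and read only while the
// sequence is being enumerated, so no whole-file array is built.
var content = Enumerable.Range(1, 3)
    .SelectMany(i => File.ReadLines(string.Format(templ, i)));

// WriteAllLines(string, IEnumerable<string>) pulls the lines one at a time.
var output = string.Format(templ, "");
File.WriteAllLines(output, content);

// Read the result back (eagerly, only to inspect it) and clean up.
var written = File.ReadAllLines(output);
Directory.Delete(dir, recursive: true);

Console.WriteLine(written.Length); // 6
```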
