Meta functions and parallel programming: map, filter, and reduce

(This is a continuation of my previous post, Why is parallel programming difficult?)

When programming, we’re always dealing with collections — objects that hold other objects. You cannot go far without them — they’re as fundamental as bools and forks (if statements).

In Erlang and Lisp we often use lists as the default generic collection. Lists are elegant structures, defined recursively from pairs. They are easy to build and take apart.

But lists impose an ordering on their contents even when what you really want is an unordered collection. And since you've got a list, it's easy to start using it in an ordered way and to write code that depends on that order, even when it's not necessary.

When you do that you add an unnecessary constraint to your code, and you can inadvertently lose parallelism. Ordering doesn't have to break parallelism, but it often does.

And of course it doesn't matter much yet, since we mostly write linear programs; we don't program in parallel. But we might start in the near future, so it doesn't hurt to raise your awareness now.

One way to stay order-free is to use the meta-functions map, filter, and reduce when working with lists (or any other collection). These meta-functions are order-free if you use them as they're meant to be used (a quick sketch follows the definitions below).

map: Apply a function to each element in a collection. Collect the return values into a collection of the same type as the source.

filter: Apply a predicate function to each element in a collection. Return a subset of the original collection containing only the elements for which the predicate returned true.

reduce: Apply a function to a small group of elements from the collection. Keep doing this to the original elements and to the return values until there is only one value left. Return that reduced value.
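As a rough sketch, here is what map and filter could look like in Python, with list comprehensions standing in for the generic collection machinery (the my_map and my_filter names are mine, purely for illustration); reduce is sketched a little further down, after binary reduction is described.

def my_map(f, xs):
    # Apply f to every element and keep the same (list) shape.
    return [f(x) for x in xs]

def my_filter(pred, xs):
    # Keep only the elements for which pred returns True.
    return [x for x in xs if pred(x)]

print(my_map(lambda x: x * x, [1, 2, 3, 4]))          # [1, 4, 9, 16]
print(my_filter(lambda x: x % 2 == 0, [1, 2, 3, 4]))  # [2, 4]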

Binary reduction applies an associative function to pairs in the collection, and then applies the function to the results of the pairs, and so on, until it finally boils down to a single value. So if a collection has 11 values, the first reduction invokes the reduction function 5 times (in parallel), which reduces the collection to 6 values. The next reduction leaves 3 values, then 2, and finally 1, which is the return value.

When there is only one element in the original collection, reduce usually returns that value without applying the function. Or it could apply the binary function to the one element and a default value. Reduction for odd-lot remainders with higher-arity functions also has to be specified.

When the original collection has no elements reduce might return a special (default) value. Or it might throw an error.

If the source collection is unordered, the reducing function should be commutative as well as associative.
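Here is a minimal sketch of that binary reduction in Python; the tree_reduce name and the list-slicing approach are mine, not a standard API. It also follows the single-element, empty-collection, and commutativity notes above.

def tree_reduce(combine, xs, default=None):
    # Pair-wise (binary) reduction: combine neighbours, then combine the
    # results, and so on, until one value is left. The calls within each
    # round are independent of each other, so they could run in parallel.
    values = list(xs)
    if not values:
        return default            # empty collection: fall back to a default
    while len(values) > 1:
        pairs = zip(values[0::2], values[1::2])
        reduced = [combine(a, b) for a, b in pairs]   # one round of (parallelizable) calls
        if len(values) % 2:       # odd element out: carry it into the next round
            reduced.append(values[-1])
        values = reduced
    return values[0]              # a single element is returned without calling combine

# 11 values: rounds of 5, 3, 1, 1 calls leave 6, 3, 2, and finally 1 value.
print(tree_reduce(lambda a, b: a + b, range(1, 12)))  # 66
# If the source is unordered, combine should be commutative as well as
# associative; addition is both.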

These are my personal definitions for these functions, but I’ve seen some variation. Reduce in particular seems to have different meanings in different situations.

For me, reduce is not the same as fold. Fold (or accumulate) sweeps through a collection, repeatedly applying a function to the next element and an accumulated value. If the collection is ordered, a left-fold sweeps through the elements from first to last, and a right-fold sweeps from last to first. Fold keeps the accumulating value separate from the element values, while reduce doesn’t have an accumulating value (although it may have a default value).

Distinguishing between reduce and fold is not universal. For example, the reduce functions in Python, Common Lisp, and JavaScript are really fold functions. At least to me they are.
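To make the distinction concrete, here is a rough sketch of left and right folds in Python (the fold_left and fold_right names are mine), with functools.reduce shown behaving as the left fold it really is.

from functools import reduce

def fold_left(f, acc, xs):
    # Sweep first-to-last, threading an accumulator through every step.
    for x in xs:
        acc = f(acc, x)
    return acc

def fold_right(f, acc, xs):
    # Sweep last-to-first.
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

# With a non-associative function the sweep direction matters:
print(fold_left(lambda a, x: a - x, 0, [1, 2, 3]))   # ((0-1)-2)-3 = -6
print(fold_right(lambda x, a: x - a, 0, [1, 2, 3]))  # 1-(2-(3-0)) = 2
# Python's functools.reduce behaves like fold_left here:
print(reduce(lambda a, x: a - x, [1, 2, 3], 0))      # -6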

In my definition, map and filter should be able to apply their functions in parallel or in any order without changing the outcome. So you should not use map to print out the items of a list in order, for example.

Because the predicate functions of filter are usually small and fast, it is rare to see a multi-threaded stand-alone filter. But it is sometimes combined with map and/or reduce into something like filter-map. For example, Qt provides a QtConcurrent::filteredReduced function.
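As a rough sketch of the combined idea (not Qt's actual API; filtered_reduce is my own illustrative name):

from functools import reduce  # a fold by the definitions above, but fine with an associative combine

def filtered_reduce(pred, combine, xs, default):
    # Keep only the elements that pass the predicate, then combine them.
    # Both stages are order-free if combine is associative and commutative,
    # so the work could be split into chunks and run in parallel.
    kept = [x for x in xs if pred(x)]
    return reduce(combine, kept, default)

# Sum of the even numbers in 0..9:
print(filtered_reduce(lambda x: x % 2 == 0, lambda a, b: a + b, range(10), 0))  # 20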

If you map or filter a list, the returned list will reflect the order in the original list. This is how it should be — the result collection should keep the structure of the original collection as much as possible. But if you’re using a list to represent an unordered collection, or if you’re going to reduce mapped results in an order-independent way, then you can relax this constraint.

Maybe I’ll talk about that next time.

We’re not used to thinking parallel

(This is a continuation of my previous post, Why is parallel programming difficult?)

Because we’re not looking for it, we often don’t notice the potential parallelisms in our programs. I’m not talking about obvious parallel problems like matrix multiplication and graphics rendering pipelines. I mean everyday problems. For example, we’re always weaving independent patterns together to construct our procedures, like this.

pattern_a(A, B)
  C1 <- calc_1(A)
  C2 <- calc_2(A, B, C1)
  return calc_3(C1, C2)

pattern_b(X, Y)
  Z <- calc_4(X)
  return calc_5(X, X, Z, Y)

procedure_that_uses_both_patterns(N, M)
  C1 <- calc_1(N)
  Z  <- calc_4(M)
  C2 <- calc_2(N, Z, C1)
  return calc_5(M, M, Z, calc_3(C1, C2))

We usually don't think much about the fact that the first two lines, C1 <- calc_1(N) and Z <- calc_4(M), can probably be switched around or performed in parallel.
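As a rough sketch of what that could look like, here is the third procedure in Python with the two independent calls submitted to a thread pool. The calc_* bodies are throwaway stand-ins for the hypothetical functions above.

from concurrent.futures import ThreadPoolExecutor

# Throwaway stand-ins for the hypothetical calc_* functions.
def calc_1(a): return a + 1
def calc_2(a, b, c): return a * b + c
def calc_3(c1, c2): return c1 - c2
def calc_4(x): return x * 2
def calc_5(p, q, z, w): return p + q + z + w

def procedure_that_uses_both_patterns(n, m):
    # calc_1(N) and calc_4(M) don't depend on each other,
    # so they can be started at the same time.
    with ThreadPoolExecutor() as pool:
        future_c1 = pool.submit(calc_1, n)
        future_z = pool.submit(calc_4, m)
        c1 = future_c1.result()
        z = future_z.result()
    c2 = calc_2(n, z, c1)
    return calc_5(m, m, z, calc_3(c1, c2))

print(procedure_that_uses_both_patterns(3, 4))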

And why should we? There’s no annotation in most programming languages to capture the parallelism. The compiler doesn’t want to know.

And what about common algorithms like quicksort? It's usually taught and implemented as linear, but it's really a bifurcating parallel algorithm. The merge-sort algorithm involves a massive parallel split at the beginning followed by a hierarchical parallel reduction for the merge. Bubble-sort is a great parallel algorithm. It's a linear-time sort, provided you have a very long chain of processors, each of which can communicate with its two neighbors.
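To make the quicksort point concrete, here is a rough sketch in Python with the bifurcation spelled out. The two recursive calls share no data, so nothing in the algorithm itself forces them to run one after the other; a parallel version could hand each half to a different worker.

def quicksort(xs):
    # The textbook version, written to make the bifurcation visible.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]   # independent of larger
    larger = [x for x in rest if x >= pivot]   # independent of smaller
    # These two recursive calls share nothing, so they could be given
    # to two different workers and run at the same time.
    left = quicksort(smaller)
    right = quicksort(larger)
    return left + [pivot] + right

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]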

Linearity is often not inherent. There are parallelisms in many common algorithms. But we don't notice them much because our programming languages encourage linear thought, and don't give us a way to capture or annotate parallelisms.

Why is parallel programming difficult?

Why is parallel programming so hard? At least it's supposed to be hard — that's the myth. Of course there's also a myth (among non-programmers) that all programming is hard, for everyone except 12-year-olds in movies.

One reason parallel programming is hard is that we don't get any practice. Programmers aren't used to thinking parallel. Introductory programming is linear programming, and most real-world bread-and-butter programming tasks don't require a parallel approach. And frankly, it's easier to understand and express a linear program, and it's certainly easier to debug.

And besides, parallel hardware has always been exotic, at least up until now. So there hasn't been much demand for parallelism, and the tools reflect that. Programming languages are (mostly) linear, described as linear text that becomes a linear token stream. Years of parser and compiler research have focused on grammars to translate that linear token stream into linear machine code.

Programs were linear even before text files. Fortran programs used to be stored on stacks of punch cards.

So our tools and habits encourage linear thinking. We introduce unnecessary constraints and dependencies that break parallelism without even thinking about it.

And that’s one reason parallel programming is hard. A big reason.
