Meta functions and parallel programming: map, filter, and reduce

(This is a continuation of my previous post, Why is parallel programming difficult?.)

When programming, we’re always dealing with collections — objects that hold other objects. You cannot go far without them — they’re as fundamental as bools and forks (if statements).

In Erlang and Lisp we often use lists as the default generic collection. Lists are elegant structures, defined recursively from pairs. They are easy to build and take apart.

But lists impose an ordering on their contents even when what you really want is an unordered collection. And since you’ve got a list, it’s easy to start using it in an ordered way and to write code that depends on that order, even when it’s not necessary.

When you do that you add an unnecessary constraint to your code, and you can inadvertently lose parallelism. Ordering doesn’t have to break parallelism, but it often does.

And of course it doesn’t matter much yet, since we mostly write linear programs. We don’t program in parallel. But we might start in the near future, so it doesn’t hurt to raise your awareness now.

One way to stay order-free is to use the meta-functions map, filter, and reduce when working with lists (or any other collection). These meta-functions are order-free if you use them like you’re supposed to.

map: Apply a function to each element in a collection. Collect the return values into a collection of the same type as the source.

filter: Apply a predicate function to each element in a collection. Return a subset of the original collection containing only the elements where the predicate returned true.

reduce: Apply a function to a small group of elements from the collection. Keep doing this to the original elements and to the return values until there is only one value left. Return that reduced value.
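To make the first two definitions concrete, here is a small sketch in Python, whose built-in map and filter happen to match these meanings:

```python
def square(x):
    return x * x

nums = [3, 1, 4, 1, 5, 9]

# map: apply a function to each element; the result has the same shape
squares = list(map(square, nums))                  # [9, 1, 16, 1, 25, 81]

# filter: keep only the elements where the predicate returns true
odds = list(filter(lambda x: x % 2 == 1, nums))    # [3, 1, 1, 5, 9]
```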

Binary reduction applies an associative function to pairs in the collection, and then applies the function to the results of the pairs, etc, until it finally boils down to a single value. So if a collection has 11 values, the first reduction invokes the reduction function 5 times (in parallel), which reduces the collection to 6 values. The next reduction leaves 3 values, then 2, and finally 1 which is the return value.
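A sequential sketch of that pairwise scheme in Python (the list comprehension in each round is exactly the set of applications that could run in parallel):

```python
def reduce_pairwise(f, xs):
    """Reduce a collection by rounds of pairwise applications of f.
    f is assumed associative; within a round, every pair is
    independent and could be evaluated in parallel."""
    xs = list(xs)
    while len(xs) > 1:
        reduced = [f(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:          # odd-lot remainder carries to the next round
            reduced.append(xs[-1])
        xs = reduced             # 11 values -> 6 -> 3 -> 2 -> 1
    return xs[0]

total = reduce_pairwise(lambda a, b: a + b, list(range(11)))  # 0+1+...+10 = 55
```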

When there is only one element in the original collection, reduce usually returns that value without applying the function. Or it could apply the binary function to the one element and a default value. Reduction for odd-lot remainders with higher-arity functions also has to be specified.

When the original collection has no elements reduce might return a special (default) value. Or it might throw an error.

If the source collection is unordered, the reducing function should be commutative as well as associative.
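String concatenation illustrates why: it is associative but not commutative, so reducing an unordered collection with it can legally yield any permutation of the pieces:

```python
from functools import reduce
import random

words = ["a", "b", "c"]
shuffled = words[:]
random.shuffle(shuffled)     # simulate an unordered collection

# Concatenation is associative but not commutative, so reducing
# an unordered collection can legally produce "abc", "bca", "cab", ...
result = reduce(lambda x, y: x + y, shuffled)
```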

These are my personal definitions for these functions, but I’ve seen some variation. Reduce in particular seems to have different meanings in different situations.

For me, reduce is not the same as fold. Fold (or accumulate) sweeps through a collection, repeatedly applying a function to the next element and an accumulated value. If the collection is ordered, a left-fold sweeps through the elements from first to last, and a right-fold sweeps from last to first. Fold keeps the accumulating value separate from the element values, while reduce doesn’t have an accumulating value (although it may have a default value).
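Python’s functools.reduce is a case in point: despite the name it is a left fold, and with a non-associative function like subtraction, the left and right folds disagree:

```python
from functools import reduce   # despite the name, this is a left fold

xs = [1, 2, 3, 4]

# left fold: ((((0 - 1) - 2) - 3) - 4) = -10
left = reduce(lambda acc, x: acc - x, xs, 0)

def foldr(f, xs, z):
    """Right fold: sweep from last to first, accumulating into z."""
    for x in reversed(xs):
        z = f(x, z)
    return z

# right fold: 1 - (2 - (3 - (4 - 0))) = -2
right = foldr(lambda x, acc: x - acc, xs, 0)
```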

Distinguishing between reduce and fold is not universal. For example, the reduce functions in Python, Common Lisp, and JavaScript are really fold functions. At least to me they are.

In my definition, map and filter should be able to apply their functions in parallel or in any order without changing the outcome. So you should not use map to print out the items of a list in order, for example.

Because the predicate functions of filter are usually small and fast, it is rare to see a multi-threaded stand-alone filter. But it is sometimes combined with map and/or reduce into something like filter-map. For example, Qt provides a QtConcurrent::filteredReduced function.

If you map or filter a list, the returned list will reflect the order in the original list. This is how it should be — the result collection should keep the structure of the original collection as much as possible. But if you’re using a list to represent an unordered collection, or if you’re going to reduce mapped results in an order-independent way, then you can relax this constraint.

Maybe I’ll talk about that next time.

Crossing the street — what are the odds?

I needed to go to the store last weekend, and I thought I’d walk. It isn’t far and walking is good for you. I did it all the time when I lived in a city.

But to get to the store I have to cross one big street. There’s a light and a crosswalk, but it’s one of those streets where the drivers don’t expect pedestrians. And there are always cars turning left and right, even when there’s a cross signal. It’s a complicated, wide, busy intersection.

I’ve crossed this street before, however, and I’ve never had a close call. There’s a lot going on, but I try to be super careful. I feel the risk. I wouldn’t want my kid to have to cross this street.

So I’m thinking, what if I had to cross this street every day for, say, 10 years? I’d be crossing thousands of times. What are the odds that I’d be hit by a car, that I’d have really bad luck just once in all those chances?

Let’s say the odds of being hit are one in 100 million. If I cross 5 thousand times, my odds of being hit are about one in 20 thousand. I’m a little uncomfortable with that, but it feels like an acceptable risk for something spread out over 10 years.

But I think the odds are probably somewhat worse than that, maybe even as high as one in 10,000. Which would mean I have a pretty good chance of being hit over 10 years. Which negates all the health benefits of walking.
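Thinking it out is a short computation: the chance of at least one hit in n independent crossings, at per-crossing risk p, is 1 - (1 - p)^n, which is roughly n times p when p is tiny. Checking both guesses from above:

```python
# Numbers from the text: per-crossing risk p, about 5,000 crossings in 10 years.
n = 5000

p = 1e-8                                # the optimistic guess
risk_optimistic = 1 - (1 - p) ** n      # ~ n * p = 5e-5, i.e. 1 in 20,000

p = 1e-4                                # the pessimistic guess
risk_pessimistic = 1 - (1 - p) ** n     # ~ 0.39, a pretty good chance
```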

And if I shouldn’t be crossing that street every day for 10 years, I probably shouldn’t be crossing at all without a good reason.

So the problem is that you can’t rely on intuition to decide whether something like this is safe. Intuition isn’t very good at distinguishing between 1 in 10,000 and 1 in 10,000,000. Intuition isn’t very good at summing odds over 10 years. You have to think it out.

Now all I have to do is find out what the odds really are.

We’re not used to thinking parallel

(This is a continuation of my previous post, Why is parallel programming difficult?.)

Because we’re not looking for it, we often don’t notice the potential parallelisms in our programs. I’m not talking about obvious parallel problems like matrix multiplication and graphics rendering pipelines. I mean everyday problems. For example, we’re always weaving independent patterns together to construct our procedures, like this.

pattern_a( A, B )
  C1 <- calc_1( A )
  C2 <- calc_2( A, B, C1 )
  return calc_3( C1, C2 )

pattern_b( X, Y )
  Z <- calc_4( X )
  return calc_5( X, X, Z, Y )

procedure_that_uses_both_patterns( N, M )
  C1 <- calc_1( N )
  Z  <- calc_4( M )
  C2 <- calc_2( N, Z, C1 )
  return calc_5( M, M, Z, calc_3( C1, C2 ) )

We usually don’t think much about the fact that the first two lines C1<-calc_1(N) and Z<-calc_4(M) can probably be switched around or performed in parallel.
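To see it, here is a sketch in Python with made-up stand-ins for calc_1 through calc_5, submitting the two independent calls to a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up stand-ins for the calc_* functions in the pseudocode.
def calc_1(a):          return a + 1
def calc_2(a, b, c):    return a + b + c
def calc_3(c1, c2):     return c1 * c2
def calc_4(x):          return x * 2
def calc_5(x, y, z, w): return x + y + z + w

def procedure_that_uses_both_patterns(n, m):
    with ThreadPoolExecutor() as pool:
        # C1 and Z depend only on the inputs, so they can run concurrently.
        future_c1 = pool.submit(calc_1, n)
        future_z  = pool.submit(calc_4, m)
        c1, z = future_c1.result(), future_z.result()
    c2 = calc_2(n, z, c1)
    return calc_5(m, m, z, calc_3(c1, c2))
```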

And why should we? There’s no annotation in most programming languages to capture the parallelism. The compiler doesn’t want to know.

And what about common algorithms like quicksort? It’s usually taught and implemented as linear, but it’s really a bifurcating parallel algorithm. The merge-sort algorithm involves a massive parallel split at the beginning followed by a hierarchical parallel reduction for the merge. Bubble-sort is a great parallel algorithm. It’s a linear-time sort, provided you have a very long chain of processors, each of which can communicate with its two neighbors.
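The parallel form of bubble-sort is usually called odd-even transposition sort. A sequential Python simulation of it (every compare-and-swap within a phase is independent, which is what the chain of processors would exploit):

```python
def odd_even_transposition_sort(xs):
    """Bubble-sort's parallel form: in each phase, all odd (or all even)
    adjacent pairs compare-and-swap at once. On a chain of processors,
    each phase takes constant time, giving a linear-time sort."""
    xs = list(xs)
    n = len(xs)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):   # these swaps are independent
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs
```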

Linearity is often not inherent. There are parallelisms in many common algorithms. But we don’t notice them much because our programming languages encourage linear thinking, and don’t give us a way to capture or annotate parallelisms.
