(This is a continuation of my previous post, Why is parallel programming difficult?.)
When programming, we’re always dealing with collections — objects that hold other objects. You cannot go far without them — they’re as fundamental as bools and forks (if statements).
In Erlang and Lisp we often use lists as the default generic collection. Lists are elegant structures, defined recursively from pairs. They are easy to build and take apart.
But lists impose an ordering on their contents even when what you really want an unordered collection. And since you’ve got a list it’s easy to start using it in an ordered way, and write code that uses that order. Even when it’s not necessary.
When you do that you add an unnecessary constraint to you code, and you can inadvertently lose parallelism. Ordering doesn’t have to break parallelism, but it often does.
And of course it doesn’t matter much since we mostly write linear programs. We don’t program parallel. But we might start getting parallel in the near future, so it doesn’t hurt to raise your awareness now.
One way to stay order-free is to use the meta-functions map, filter, and reduce when working with lists (or any other collection). These meta-functions are order-free if you use them like you’re supposed to.
|map||Apply a function to each element in a collection.
Collect the return values into a collection the same type as the source.
|filter||Apply a predicate function to each element in a collection.
Return a subset of the original collection containing only the elements where the predicate returned true.
|reduce||Apply a function to a small group elements from the collection. Keep doing this to the original elements and to the return values until there is only one value left. Return that reduced value.
Binary reduction applies an associative function to pairs in the collection, and then applies the function to the results of the pairs, etc, until it finally boils down to a single value. So if a collection has 11 values, the first reduction invokes the reduction function 5 times (in parallel), which reduces the collection to 6 values. The next reduction leaves 3 values, then 2, and finally 1 which is the return value.
When there is only one element in the original collection, reduce usually returns that value without applying the function. Or it could apply the binary function to the one element and a default value. Reduction for odd-lot remainders with higher-arity functions also has to be specified.
When the original collection has no elements reduce might return a special (default) value. Or it might throw an error.
If the source collection is unordered, the reducing function should be commutative as well as associative.
These are my personal definitions for these functions, but I’ve seen some variation. Reduce in particular seems to have different meanings in different situations.
For me, reduce is not the same as fold. Fold (or accumulate) sweeps through a collection, repeatedly applying a function to the next element and an accumulated value. If the collection is ordered, a left-fold sweeps through the elements from first to last, and a right-fold sweeps from last to first. Fold keeps the accumulating value separate from the element values, while reduce doesn’t have an accumulating value (although it may have a default value).
In my definition, map and filter should be able to apply their functions in parallel or in any order without changing the outcome. So you should not use map to print out the items of a list in order, for example.
Because the predicate functions of filter are usually small and fast, it is rare to see a multi-threaded stand-alone filter. But it is sometimes combined with map and/or reduce to something like filter-map. For example, Qt provides a QtConcurrent::filteredReduced function.
If you map or filter a list, the returned list will reflect the order in the original list. This is how it should be — the result collection should keep the structure of the original collection as much as possible. But if you’re using a list to represent an unordered collection, or if you’re going to reduce mapped results in an order-independent way, then you can relax this constraint.
Maybe I’ll talk about that next time.