Meta functions and parallel programming: map, filter, and reduce

(This is a continuation of my previous post, Why is parallel programming difficult?.)

When programming, we’re always dealing with collections — objects that hold other objects. You cannot go far without them — they’re as fundamental as bools and forks (if statements).

In Erlang and Lisp we often use lists as the default generic collection. Lists are elegant structures, defined recursively from pairs. They are easy to build and take apart.

But lists impose an ordering on their contents even when what you really want an unordered collection. And since you’ve got a list it’s easy to start using it in an ordered way, and write code that uses that order. Even when it’s not necessary.

When you do that you add an unnecessary constraint to you code, and you can inadvertently lose parallelism. Ordering doesn’t have to break parallelism, but it often does.

And of course it doesn’t matter much since we mostly write linear programs. We don’t program parallel. But we might start getting parallel in the near future, so it doesn’t hurt to raise your awareness now.

One way to stay order-free is to use the meta-functions map, filter, and reduce when working with lists (or any other collection). These meta-functions are order-free if you use them like you’re supposed to.

map Apply a function to each element in a collection.
Collect the return values into a collection the same type as the source.
filter Apply a predicate function to each element in a collection.
Return a subset of the original collection containing only the elements where the predicate returned true.
reduce Apply a function to a small group elements from the collection. Keep doing this to the original elements and to the return values until there is only one value left. Return that reduced value.

Binary reduction applies an associative function to pairs in the collection, and then applies the function to the results of the pairs, etc, until it finally boils down to a single value. So if a collection has 11 values, the first reduction invokes the reduction function 5 times (in parallel), which reduces the collection to 6 values. The next reduction leaves 3 values, then 2, and finally 1 which is the return value.

When there is only one element in the original collection, reduce usually returns that value without applying the function. Or it could apply the binary function to the one element and a default value. Reduction for odd-lot remainders with higher-arity functions also has to be specified.

When the original collection has no elements reduce might return a special (default) value. Or it might throw an error.

If the source collection is unordered, the reducing function should be commutative as well as associative.

These are my personal definitions for these functions, but I’ve seen some variation. Reduce in particular seems to have different meanings in different situations.

For me, reduce is not the same as fold. Fold (or accumulate) sweeps through a collection, repeatedly applying a function to the next element and an accumulated value. If the collection is ordered, a left-fold sweeps through the elements from first to last, and a right-fold sweeps from last to first. Fold keeps the accumulating value separate from the element values, while reduce doesn’t have an accumulating value (although it may have a default value).

Distinguishing between reduce and fold is not universal. For example, the reduce functions in Python, Common Lisp, and JavaScript are really fold functions. At least to me they are.

In my definition, map and filter should be able to apply their functions in parallel or in any order without changing the outcome. So you should not use map to print out the items of a list in order, for example.

Because the predicate functions of filter are usually small and fast, it is rare to see a multi-threaded stand-alone filter. But it is sometimes combined with map and/or reduce to something like filter-map. For example, Qt provides a QtConcurrent::filteredReduced function.

If you map or filter a list, the returned list will reflect the order in the original list. This is how it should be — the result collection should keep the structure of the original collection as much as possible. But if you’re using a list to represent an unordered collection, or if you’re going to reduce mapped results in an order-independent way, then you can relax this constraint.

Maybe I’ll talk about that next time.


6 Responses to “Meta functions and parallel programming: map, filter, and reduce”

  1. geeks on August 17th, 2009 4:38 am

    very good and well writen article, thanks for mentioning that, I really learned today,
    thanks for posting it

  2. Neal on August 17th, 2009 9:50 pm

    Your welcome. I try to stay informal in my writing ’cause I think it’s easier to read. It wouldn’t work in a journal, but it’s right for a blog.

  3. on June 6th, 2012 5:54 pm

    This is such a great resource that you’re offering and you offer out at no cost.

    I appreciate seeing blogs that realize the worth of offering a perfect useful resource totally free.

    I genuinely loved reading your submit.

  4. cell phone lookup on July 1st, 2012 4:20 am

    Hi there, I just found your site and wanted to say that Ive
    truly enjoyed browsing your blog posts. After all Ill be subscribing to your feed and I hope you write again very soon!

  5. on August 23rd, 2012 8:26 am

    window treatment materials in this day and age have
    increased in price, i wish that they have a price drop next year

  6. reverse phone lookup cell on August 24th, 2012 4:55 am

    pleasant post, keep up with this interesting work. It really is first-rate to recognize that this topic is being covered also on this web site so
    thank for taking time to discuss this!