Crossing the street — what are the odds?
I needed to go to the store last weekend, and I thought I’d walk. It isn’t far and walking is good for you. I did it all the time when I lived in a city.
But to get to the store I have to cross one big street. There’s a light and a crosswalk, but it’s one of those streets where the drivers don’t expect pedestrians. And there’s always cars turning left and right even when there’s a cross signal. It’s a complicated wide busy intersection.
I’ve crossed this street before however, and I’ve never had a close call. There’s a lot going on, but I try to be super careful. I feel the risk. I wouldn’t want my kid to have to cross this street.
So I’m thinking, what if I had to cross this street every day for, say 10 years. I’d be crossing thousands of times. What are the odds that I’d be hit by a car, have really bad luck just once in all those chances?
Let’s say the odds of being hit are one in 100 million. If I cross 5 thousand times, my odds of being hit are about one in 20 thousand. I’m a little uncomfortable with that, but it feels like an acceptable risk for something spread out over 10 years.
But I think the odds are probably somewhat worse than that, maybe even as high as one in 10,000. Which would mean I have a pretty good chance of being hit over 10 years. Which negates all the health benefits of walking.
And if I shouldn’t be crossing that street every day for 10 years, I probably shouldn’t be crossing at all without a good reason.
So the problem is that you can’t rely on intuition to decide whether something like this is safe. Intuition isn’t very good at distinguishing between 1 in 10,000 and 1 in 10,000,000. Intuition isn’t very good at summing odds over 10 years. You have to think it out.
Now all I have to do is find out what the odds really are.
We’re not used to thinking parallel
(This is a continuation of my previous post, Why is parallel programming difficult?.)
Because we’re not looking for it, we often don’t notice the potential parallelisms in our programs. I’m not talking about obvious parallel problems like matrix multiplication and graphics rendering pipelines. I mean everyday problems. For example, we’re always weaving independent patterns together to construct our procedures, like this.
pattern_a( A, B ) C1 <- calc_1( A) C2 <- calc_2( A, B, C1) return calc_3( C1, C2) pattern_b( X, Y) Z <- calc_4( X) return calc_5( X, X, Z, Y) procedure_that_uses_both_patterns( N, M ) C1 <- calc_1( N) Z <- calc_4( M) C2 <- calc_2( N, Z, C1) return calc_5( M, M, Z, calc_3( C1, C2))
We usually don’t think much about the fact that the first two lines C1<-calc_1(N) and Z<-calc_4(M) can probably be switched around or performed in parallel.
And why should we? There’s no annotation in most programming languages to capture the parallelism. The compiler doesn’t want to know.
And what about common algorithms like quicksort. It’s usually taught and implemented as linear, but it’s really a bifurcating parallel algorithm. The merge-sort algorithm involves a massive parallel split at the beginning followed by a hierarchical parallel reduction for the merge. Bubble-sort is a great parallel algorithm. It’s a linear-time sort, provided you have a very long chain of processors, each of which can communicate with its two neighbors.
Linearism is often not inherent. There are parallelisms in many common algorithms. But we don’t notice them much because our programming languages encourage linear thought, and don’t give us a way to capture or annotate parallelisms.
Why is parallel programming difficult?
Why is parallel programming so hard? At least it’s supposed to be hard — that’s the myth. Of course there’s also a myth (among non-programmers) that all programming is hard, for everyone except 12 year-olds in movies.
One reason parallel programming is hard is we don’t get any practice. Programmers aren’t used to thinking parallel. Introductory programming is linear programming, and most real-world bread-and-butter programming tasks are don’t require a parallel approach. And frankly, it’s easier to understand and express a linear program, and it’s certainly easier to debug.
And besides, parallel hardware has always been exotic, at least up until now. So there hasn’t been much demand for parallelism, and the tools reflect that. Programming languages are (mostly) linear, described as linear text that becomes a linear token stream. Years of parser and compiler research has focused on grammars to translate that linear token stream into linear machine code.
Programs were linear even before text files. Fortran programs used to be stored on stacks of punch cards.
So our tools and habits encourage linear thinking. We introduce unnecessary constraints and dependencies that break parallelism without even thinking about it.
And that’s one reason parallel programming is hard. A big reason.