MPI Communicator Objects
(This is not finished. The classes and methods have to be defined and examples have to be given. Details about what the attributes are for and how to use the topology should be provided. Suggestions are welcome.)
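MPI refers to communicators through handles of type MPI_Comm. You use communicators to send messages back and forth between processes: every point-to-point or collective operation names a communicator, which determines which processes can take part and keeps its messages separate from traffic on other communicators.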
Communicator types
There are two kinds of communicators, intra-communicators and inter-communicators. These names are easy to confuse, so I prefer to call them “cloud” and “split” communicators. Communicator types can be organized into a pure (abstract) base class and two subclasses:
- communicator — pure base class
- cloud or intracommunicator — subtype of communicator
- split or intercommunicator — subtype of communicator
A cloud (intra-communicator) has a single group of processes, and it always includes the home process (otherwise it would not be visible to the home process). The home process can send messages to any other process in the cloud.
A split (inter-communicator) consists of two groups of processes: local and remote. The groups are disjoint: no process in the local group can also be in the remote group, and vice versa. The home process is always a member of the local group. The remote group must contain at least one process, and can contain many. The home process can send messages to any process in the remote group, and any process in the remote group can send messages back. But the home process cannot use the split to send messages to other processes in the local group.
Processes cannot be added to or removed from a communicator, nor can its topology be changed, once the communicator is defined. To get a different group or topology, you create a new communicator.
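The hierarchy and rules above can be modeled in a few lines. This is a minimal Python sketch, not an MPI binding: the names Communicator, Cloud, Split and can_send_to are hypothetical, processes are plain labels, and no MPI calls are made. The constructors enforce the rules described above (the home process is in the local group, a split's groups are disjoint and its remote group is non-empty), and groups are stored as tuples behind read-only properties so they stay fixed after construction.

```python
from abc import ABC, abstractmethod


class Communicator(ABC):
    """Hypothetical pure base class: every communicator has a local and a remote group."""

    def __init__(self, home, local, remote):
        # Tuples behind read-only properties: groups are fixed once created.
        self._home = home
        self._local = tuple(local)
        self._remote = tuple(remote)
        if home not in self._local:
            raise ValueError("the home process must be in the local group")

    @property
    def local_group(self):
        return self._local

    @property
    def remote_group(self):
        return self._remote

    @abstractmethod
    def can_send_to(self, process):
        """Return True if the home process may send point-to-point to `process`."""


class Cloud(Communicator):
    """Intra-communicator: a single group that serves as both local and remote."""

    def __init__(self, home, group):
        super().__init__(home, group, group)

    def can_send_to(self, process):
        return process in self._local


class Split(Communicator):
    """Inter-communicator: two disjoint groups, local and remote."""

    def __init__(self, home, local, remote):
        super().__init__(home, local, remote)
        if not self._remote:
            raise ValueError("the remote group must not be empty")
        if set(self._local) & set(self._remote):
            raise ValueError("local and remote groups must be disjoint")

    def can_send_to(self, process):
        # Only processes on the other side are reachable through a split.
        return process in self._remote
```

For example, `Split("a", ["a", "b"], ["x", "y"]).can_send_to("b")` is False, because "b" shares the home process's side of the split.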
What communicators do
Abstractly, you can think of every communicator as consisting of a local group and a remote group. In a cloud the local and remote groups are always exactly the same (same processes in the same order). In a split the two groups do not overlap. There is no way to build a communicator where the local and remote groups overlap but are not identical.
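The local/remote view gives a simple test for which kind of communicator a pair of groups describes. A minimal sketch, assuming a hypothetical helper named communicator_kind (not an MPI function) and processes given as plain labels:

```python
def communicator_kind(local, remote):
    """Classify a (local, remote) pair of process groups.

    Hypothetical helper: "cloud" for identical groups (same processes,
    same order), "split" for disjoint groups with a non-empty remote side,
    ValueError for every other combination.
    """
    local, remote = list(local), list(remote)
    if not local:
        raise ValueError("the local group always contains the home process")
    if local == remote:
        return "cloud"  # identical groups: same processes, same order
    if not remote:
        raise ValueError("a split needs at least one remote process")
    if set(local).isdisjoint(remote):
        return "split"  # no process appears on both sides
    raise ValueError("groups overlap without being identical")
```

Note that `([0, 1], [1, 0])` is rejected: the same processes in a different order count as overlapping but not identical.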
Communicators are shared objects, and creating one is a collective operation: every process that will belong to the new communicator must make the matching creation call (for most constructors, every process in the parent communicator must call, even those that end up outside the new one), and all members end up with identical process groups. This ensures that all processes in a communicator know about the communicator. When a split communicator is created, the processes in the remote group get a mirror image of the communicator held by the local group: the same two groups, with local and remote switched.
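The mirror-image rule for splits can be shown by listing the view each member ends up with. A minimal sketch; split_views is a hypothetical helper that models the result of creation, not an MPI call:

```python
def split_views(group_a, group_b):
    """Return the (local, remote) view each member of a split sees.

    Hypothetical model: members of group_a see (group_a, group_b);
    members of group_b see the same two groups with the roles swapped.
    """
    a, b = tuple(group_a), tuple(group_b)
    if set(a) & set(b):
        raise ValueError("the two groups of a split must be disjoint")
    views = {p: (a, b) for p in a}
    views.update({p: (b, a) for p in b})
    return views
```

Every process in either group gets exactly one view, and the local group on one side is always the remote group on the other.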
Special communicators: world, parent, and self
The MPI process starts with the world communicator (MPI_COMM_WORLD), which is a cloud (intra) communicator. All processes launched together with the home process (by the same mpiexec command, or by the same spawn call) share the same world.
The MPI process may also start with a parent communicator, returned by MPI_Comm_get_parent. This is a split communicator whose local group contains the same processes as the world communicator, namely all the processes created by the same spawn call. The remote group contains all the parents (there can be one or many) that spawned the local group of processes. For a process that was not spawned there is no parent communicator, and MPI_Comm_get_parent returns MPI_COMM_NULL.
The MPI process also starts out with a cloud communicator containing only itself (MPI_COMM_SELF). The home process is always rank zero in the self-cloud.
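The three starting communicators can be summarized as (local, remote) pairs. A minimal sketch under these assumptions: initial_communicators is a hypothetical helper, processes are plain labels listed in rank order, and None stands in for MPI_COMM_NULL when the process was not spawned.

```python
def initial_communicators(world, home, parents=None):
    """Model the communicators a process starts with (hypothetical helper).

    world   -- processes launched together with `home`, in rank order
    parents -- processes that spawned this world, or None if not spawned
    """
    world = tuple(world)
    if home not in world:
        raise ValueError("home must be a member of its own world")
    comms = {
        "world": (world, world),     # like MPI_COMM_WORLD: a cloud
        "self": ((home,), (home,)),  # like MPI_COMM_SELF: home is rank 0
        "parent": None,              # stands in for MPI_COMM_NULL
    }
    if parents:
        # Like MPI_Comm_get_parent: local = the spawned world, remote = parents.
        comms["parent"] = (world, tuple(parents))
    return comms
```

For a spawned process, the local group of "parent" is the same as the world group, and in the self pair the home process is always at index (rank) zero.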
Topologies
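You can create a communicator with one of two kinds of topology: an orthogonal grid, or a graph. Supercomputers often connect processors together in a grid topology, and an MPI grid (a Cartesian topology, created with MPI_Cart_create) can have as many dimensions as needed; each dimension can optionally wrap around (be periodic). A graph (MPI_Graph_create, or the more scalable MPI_Dist_graph_create added in MPI 2.2) is more general than a grid and can represent any static topology.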
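MPI_Graph_create describes a graph with two flat arrays rather than an adjacency list: index[i] is the total number of neighbors of nodes 0 through i, and edges holds all the neighbor lists back to back. A minimal sketch of converting between the two forms; graph_arrays and neighbors_of are hypothetical helper names, and nothing here calls MPI:

```python
def graph_arrays(neighbors):
    """Flatten an adjacency list into MPI_Graph_create-style index/edges arrays.

    neighbors[i] lists the neighbors of node i.
    """
    index, edges = [], []
    for node_neighbors in neighbors:
        edges.extend(node_neighbors)
        index.append(len(edges))  # running total of neighbors so far
    return index, edges


def neighbors_of(index, edges, node):
    """Recover the neighbor list of `node` from the flattened arrays."""
    start = index[node - 1] if node > 0 else 0
    return edges[start:index[node]]
```

For the four-node graph with neighbor lists `[[1, 3], [0], [3], [0, 2]]` this yields `index = [2, 3, 4, 6]` and `edges = [1, 3, 0, 3, 0, 2]`.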
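Topologies let a process easily find and communicate with its nearest neighbors: in a grid, MPI_Cart_shift returns the ranks of the neighbors along one dimension, and since MPI 3.0 the neighborhood collectives (such as MPI_Neighbor_allgather) exchange data with all neighbors at once. Only cloud communicators can have topologies; MPI does not allow a topology on a split (inter-communicator).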
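To make the neighbor lookup concrete, here is a minimal pure-Python sketch of what MPI_Cart_shift computes. It assumes MPI's row-major numbering of grid ranks; the helper names are hypothetical, and None stands in for MPI_PROC_NULL (no neighbor off the edge of a non-periodic dimension).

```python
def cart_coords(dims, rank):
    """Row-major coordinates of `rank` in a grid of shape `dims`."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return list(reversed(coords))


def cart_rank(dims, coords):
    """Inverse of cart_coords: row-major rank of a coordinate tuple."""
    rank = 0
    for size, c in zip(dims, coords):
        rank = rank * size + c
    return rank


def cart_shift(dims, periods, rank, direction, disp):
    """Mimic MPI_Cart_shift: return (source, dest); None means no neighbor."""
    def neighbor(offset):
        coords = cart_coords(dims, rank)
        c = coords[direction] + offset
        if periods[direction]:
            c %= dims[direction]  # periodic dimension wraps around
        elif not 0 <= c < dims[direction]:
            return None  # off the edge of a non-periodic dimension
        coords[direction] = c
        return cart_rank(dims, coords)

    return neighbor(-disp), neighbor(disp)
```

In a 3 x 4 grid, rank 5 sits at coordinates (1, 1); shifting by 1 along dimension 0 gives source 1 and destination 9.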
Attributes
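Communicators have attributes, which are key/value pairs. The keys are integers allocated by MPI: you first create a key with MPI_Comm_create_keyval, then attach and read values with MPI_Comm_set_attr and MPI_Comm_get_attr. When creating the key you also supply copy and delete callbacks, which MPI calls when the communicator is duplicated (MPI_Comm_dup) or freed. A few attributes are predefined on MPI_COMM_WORLD, such as MPI_TAG_UB (the largest usable message tag).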
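A common use of attributes is for a library to cache its own per-communicator state under a private key. The sketch below models the life cycle in plain Python, with hypothetical names (create_keyval, Comm, set_attr, get_attr, dup) loosely mirroring the MPI calls. The copy callback is simplified to a boolean: False behaves like MPI_COMM_NULL_COPY_FN (the attribute is dropped on dup) and True like MPI_COMM_DUP_FN (the value is copied).

```python
import itertools

_next_key = itertools.count(1)
_copy_on_dup = {}  # key -> whether dup() carries the attribute over


def create_keyval(copy_on_dup=False):
    """Hand out a fresh integer key, loosely modeled on MPI_Comm_create_keyval."""
    key = next(_next_key)
    _copy_on_dup[key] = copy_on_dup
    return key


class Comm:
    """Hypothetical communicator that only holds an attribute table."""

    def __init__(self):
        self._attrs = {}

    def set_attr(self, key, value):
        if key not in _copy_on_dup:
            raise KeyError("create the key with create_keyval first")
        self._attrs[key] = value

    def get_attr(self, key):
        # Like MPI_Comm_get_attr: a (flag, value) pair, flag False if unset.
        if key in self._attrs:
            return True, self._attrs[key]
        return False, None

    def dup(self):
        # Like MPI_Comm_dup: each key's copy policy decides what survives.
        new = Comm()
        new._attrs = {k: v for k, v in self._attrs.items() if _copy_on_dup[k]}
        return new
```

Usage: a key created with `copy_on_dup=True` follows the communicator into its duplicates, while a key created with the default does not.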
Buffers
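The user can attach a buffer for buffered-mode sends (MPI_Bsend). A buffered send copies the outgoing message into this buffer and returns without waiting for a matching receive, so the buffer must be large enough to hold every message that is still pending, plus MPI_BSEND_OVERHEAD bytes per message; if it is too small, the send fails. Up to MPI 4.0 the buffer is attached to the whole process (MPI_Buffer_attach); MPI 4.1 added MPI_Comm_attach_buffer for a per-communicator buffer.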
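The sizing rule can be illustrated with a small accounting model. This is a hypothetical sketch, not MPI's allocator: BsendBuffer and its methods are invented names, the overhead parameter stands in for MPI_BSEND_OVERHEAD (whose real value is implementation-defined), and fragmentation inside a real buffer is ignored.

```python
class BufferFullError(Exception):
    """Raised when a buffered send does not fit in the attached buffer."""


class BsendBuffer:
    """Hypothetical model of the space accounting behind buffered sends."""

    def __init__(self, capacity, overhead):
        self.capacity = capacity
        self.overhead = overhead  # stands in for MPI_BSEND_OVERHEAD
        self.pending = []         # sizes of messages not yet delivered

    def used(self):
        # Each pending message occupies its size plus the per-message overhead.
        return sum(size + self.overhead for size in self.pending)

    def bsend(self, size):
        if self.used() + size + self.overhead > self.capacity:
            raise BufferFullError("attached buffer too small for this message")
        self.pending.append(size)

    def deliver_oldest(self):
        # Once a message has been transmitted, its space can be reused.
        self.pending.pop(0)
```

With a 500-byte buffer and an overhead of 96 bytes, messages of 100 and 200 bytes use 492 bytes, so even a 1-byte third message does not fit until one of the first two has been delivered.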
Notes, thoughts, etc
I hesitate to rename intracommunicator to “cloud” and intercommunicator to “split”, but I feel the original words are too long and easily confused. Split is also the name of an MPI function (MPI_Comm_split), which could cause some confusion: MPI_Comm_split(..) returns communicators of the same kind as the one it is given, so splitting a cloud produces smaller clouds, not splits.
Class names we could use instead of cloud: intracom, unified, uni-group, local, pond, near, nearcomm.
Class names we could use instead of split: intercom, jetstream, lightning_bolt (extends the cloud metaphor, but sounds more like a message than a message-pathway), disjoint, local_remote, duo-group, bi-group, here-there, barbell, pathway, telephone, arrow, bridge, far, farcomm.
This needs examples of how key/value attributes are used.