\chapter{MPI Terms and Conventions}
\label{chap:terms}
\label{sec:terms}
%Version as of 4/27/95

This chapter explains notational terms and conventions used throughout the \MPI/ document, some of the choices that have been made, and the rationale behind those choices.

\section{Document Notation}

\begin{rationale}
Throughout this document, the rationale for the design choices made in the interface specification is set off in this format. Some readers may wish to skip these sections, while readers interested in interface design may want to read them carefully.
\end{rationale}

\begin{users}
Throughout this document, material that speaks to users and illustrates usage is set off in this format. Some readers may wish to skip these sections, while readers interested in programming in \MPI/ may want to read them carefully.
\end{users}

\begin{implementors}
Throughout this document, material that is primarily commentary to implementors is set off in this format. Some readers may wish to skip these sections, while readers interested in \MPI/ implementations may want to read them carefully.
\end{implementors}

\section{Procedure Specification}

\MPI/ procedures are specified using a language independent notation. The arguments of procedure calls are marked as \type{\IN}, \type{\OUT} or \type{\INOUT}. The meanings of these are:
\begin{itemize}
\item the call uses but does not update an argument marked \type{\IN},
\item the call may update an argument marked \type{\OUT},
\item the call both uses and updates an argument marked \type{\INOUT}.
\end{itemize}

There is one special case --- if an argument is a handle to an opaque object (these terms are defined in Section \ref{terms:opaque-objects}), and the object is updated by the procedure call, then the argument is marked \type{\OUT}. It is marked this way even though the handle itself is not modified --- we use the \type{\OUT} attribute to denote that what the handle {\em references} is updated.

The definition of \MPI/ tries to avoid, to the largest possible extent, the use of \type{\INOUT} arguments, because such use is error-prone, especially for scalar arguments.

A common occurrence for \MPI/ functions is an argument that is used as \type{\IN} by some processes and \type{\OUT} by other processes. Such an argument is, syntactically, an \type{\INOUT} argument and is marked as such, although, semantically, it is not used in one call both for input and for output.

Another frequent situation arises when an argument value is needed only by a subset of the processes. When an argument is not significant at a process, then an arbitrary value can be passed as the argument.

Unless specified otherwise, an argument of type \type{\OUT} or type \type{\INOUT} cannot be aliased with any other argument passed to an \MPI/ procedure. An example of argument aliasing in C appears below. If we define a C procedure like this,
\begin{verbatim}
void copyIntBuffer( int *pin, int *pout, int len )
{   int i;
    for (i=0; i<len; ++i) *pout++ = *pin++;
}
\end{verbatim}
then a call to it in the following code fragment has aliased arguments.
\begin{verbatim}
int a[10];
copyIntBuffer( a, a+3, 7 );
\end{verbatim}
Although the C language allows this, such usage of \MPI/ procedures is forbidden unless otherwise specified. Note that Fortran prohibits aliasing of arguments.

All \MPI/ functions are first specified in the language-independent notation. Immediately below this, the ANSI C version of the function is shown, and below this, a version of the same function in Fortran 77.

\section{Semantic Terms}

When discussing \MPI/ procedures the following semantic terms are used. The first two are usually applied to communication operations.
\begin{description}
\item[nonblocking] If the procedure may return before the operation completes, and before the user is allowed to re-use resources (such as buffers) specified in the call.
\item[blocking] If return from the procedure indicates the user is allowed to re-use resources specified in the call.
\item[local] If completion of the procedure depends only on the local executing process. Such an operation does not require communication with another user process.
\item[non-local] If completion of the operation may require the execution of some \MPI/ procedure on another process. Such an operation may require communication occurring with another user process.
\item[collective] If all processes in a process group need to invoke the procedure.
\end{description}

\section{Data Types}

\subsection{Opaque objects}
\label{terms:opaque-objects}

\MPI/ manages system memory that is used for buffering messages and for storing internal representations of various \MPI/ objects such as groups, communicators, datatypes, etc. This memory is not directly accessible to the user, and objects stored there are {\em opaque}: their size and shape is not visible to the user. Opaque objects are accessed via {\em handles}, which exist in user space. \MPI/ procedures that operate on opaque objects are passed handle arguments to access these objects. In addition to their use by \MPI/ calls for object access, handles can participate in assignments and comparisons.

In Fortran, all handles have type \ftype{INTEGER}. In C, a different handle type is defined for each category of objects.

Opaque objects are allocated and deallocated by calls that are specific to each object type. These are listed in the sections where the objects are described. The calls accept a handle argument of matching type. In an allocate call this is an \type{\OUT} argument that returns a valid reference to the object. In a call to deallocate this is an \type{\INOUT} argument which returns with a ``null handle'' value. \MPI/ provides a ``null handle'' constant for each object type. Comparisons to this constant are used to test for validity of the handle. A call to deallocate invalidates the handle and marks the object for deallocation. The object is not accessible to the user after the call. However, \MPI/ need not deallocate the object immediately. Any operation pending (at the time of the deallocate) that involves this object will complete normally; the object will be deallocated afterwards.

An opaque object and its handle are significant only at the process where the object was created, and cannot be transferred to another process. \MPI/ provides certain predefined opaque objects and predefined, static handles to these objects. Such objects may not be destroyed.

\subsection{Array arguments}

An \MPI/ call may need an argument that is an array of opaque objects, or an array of handles. Whenever such an array is used, an additional {\tt len} argument is required to indicate the number of valid entries (unless this number can be derived otherwise). The valid entries are at the beginning of the array; {\tt len} indicates how many of them there are, and need not be the size of the entire array.

\subsection{State}

\MPI/ procedures use at various places arguments with {\em state} types. The values of such a data type are all identified by names, and no operation is defined on them.

\subsection{Named constants}

\MPI/ procedures sometimes assign a special meaning to a special value of a basic type argument; e.g., {\tt tag} is an integer-valued argument of point-to-point communication operations, with a special wild-card value, \const{MPI\_ANY\_TAG}. Such arguments will have a range of regular values, which is a proper subrange of the range of values of the corresponding basic type; special values (such as \const{MPI\_ANY\_TAG}) will be outside the regular range. \MPI/ also provides predefined named constant handles, such as \const{MPI\_COMM\_WORLD}, which is a handle to an object that represents all processes available at start-up time.

\subsection{Choice}

\MPI/ functions sometimes use arguments with a {\em choice} (or union) data type. Distinct calls to the same routine may pass by reference actual arguments of different types. The mechanism for providing such arguments will differ from language to language. For Fortran, we use {\sf $<$type$>$} to represent a choice variable, for C, we use {\sf (void *)}.

\discuss{
The Fortran 77 standard specifies that the type of actual arguments needs to agree with the type of dummy arguments; no construct equivalent to C void pointers is available. Thus, it would seem that there is no standard conforming mechanism to support choice arguments. However, most Fortran compilers either don't check type consistency of calls to external routines, or support a special mechanism to link foreign (e.g., C) routines. We accept this non-conformity with the Fortran 77 standard. That is, we accept that the same routine may be passed an actual argument of a different type at distinct calls.

Generic routines can be used in Fortran 90 to provide a standard conforming solution. This solution will be consistent with our nonstandard conforming Fortran 77 solution.
}
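\begin{users}
As an illustration only (this fragment is not part of the specification), the C binding passes a choice argument as {\sf (void *)}, so the same routine accepts buffers of different types; the accompanying datatype argument describes the buffer contents.
\begin{verbatim}
double a[10];
int    b[10];
/* the same (void *) choice argument accepts both buffers */
MPI_Send( a, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD );
MPI_Send( b, 10, MPI_INT,    1, 0, MPI_COMM_WORLD );
\end{verbatim}
\end{users}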
\subsection{Addresses}

Some \MPI/ procedures use {\em address} arguments that represent an absolute address in the calling program. The datatype of such an argument is an integer of the size needed to hold any valid address in the execution environment.
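\begin{users}
As an illustration only, the fragment below obtains such an absolute address in C. It assumes the \MPI/ defined integer type \const{MPI\_Aint} and the routine \func{MPI\_ADDRESS}, whose bindings are defined elsewhere in this document.
\begin{verbatim}
double buf[100];
MPI_Aint addr;

/* addr receives the absolute address of buf in the
   calling program */
MPI_Address( buf, &addr );
\end{verbatim}
\end{users}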
\section{Language Binding}
\label{subsec:lang}
\label{subsec:binding}

This section defines the rules for \MPI/ language binding in general and for Fortran 77 and ANSI C in particular. Defined here are various object representations, as well as the naming conventions used for expressing this standard. The actual calling sequences are defined elsewhere.

It is expected that any Fortran 90 and C++ implementations use the Fortran 77 and ANSI C bindings, respectively. Although we consider it premature to define other bindings to Fortran 90 and C++, the current bindings are designed to encourage, rather than discourage, experimentation with better bindings that might be adopted later.

Since the word PARAMETER is a keyword in the Fortran language, we use the word ``argument'' to denote the arguments to a subroutine. These are normally referred to as parameters in C; however, we expect that C programmers will understand the word ``argument'' (which has no specific meaning in C), thus allowing us to avoid unnecessary confusion for Fortran programmers.

There are several important language binding issues not addressed by this standard. This standard does not discuss the interoperability of message passing between languages. It is fully expected that many implementations will have such features, and that such features are a sign of the quality of the implementation.

\subsection{Fortran 77 Binding Issues}

All \MPI/ names have an {\tt MPI\_} prefix, and all characters are capitals. Programs must not declare variables or functions with names beginning with the prefix {\tt MPI\_}. This is mandated to avoid possible name collisions.

All \MPI/ Fortran subroutines have a return code in the last argument. A few \MPI/ operations are functions, which do not have the return code argument. The return code value for successful completion is \const{MPI\_SUCCESS}. Other error codes are implementation dependent; see Chapter \ref{chap:environment}.

Handles are represented in Fortran as \ftype{INTEGER}s. Binary-valued variables are of type \ftype{LOGICAL}. Array arguments are indexed from one.

Unless explicitly stated, the \MPI/ F77 binding is consistent with ANSI standard Fortran 77. There are several points where this standard diverges from the ANSI Fortran 77 standard. These exceptions are consistent with common practice in the Fortran community. In particular:
\begin{itemize}
\item \MPI/ identifiers are limited to thirty, not six, significant characters.
\item \MPI/ identifiers may contain underscores after the first character.
\item An \MPI/ subroutine with a choice argument may be called with different argument types. An example is shown in Figure \ref{fig:nomatch}. This violates the letter of the Fortran standard, but such a violation is common practice. An alternative would be to have a separate version of \func{MPI\_SEND} for each data type.
\item Although not required, it is strongly suggested that named \MPI/ constants (\ftype{PARAMETER}s) be provided in an include file, called {\tt mpif.h}. On systems that do not support include files, the implementation should specify the values of named constants.
\item Vendors are encouraged to provide type declarations in the {\tt mpif.h} file on Fortran systems that support user-defined types. One should define, if possible, the type \type{MPI\_ADDRESS}, which is an \type{INTEGER} of the size needed to hold an address in the execution environment. On systems where type definition is not supported, it is up to the user to use an \type{INTEGER} of the right kind to represent addresses (i.e., {\tt INTEGER*4} on a 32 bit machine, {\tt INTEGER*8} on a 64 bit machine, etc.).
\end{itemize}

\begin{figure}
\begin{verbatim}
      double precision a
      integer b
      ...
      call MPI_send(a,...)
      call MPI_send(b,...)
\end{verbatim}
\caption{An example of calling a routine with mismatched formal and actual arguments.}
\label{fig:nomatch}
\end{figure}

\snir
All \MPI/ named constants can be used wherever an entity declared with the {\tt PARAMETER} attribute can be used in Fortran. There is one exception to this rule: the \MPI/ constant \const{MPI\_BOTTOM} (Section \ref{subsec:pt2pt-addfunc}) can only be used as a buffer argument.
\rins

\subsection{C Binding Issues}

We use the ANSI C declaration format. All \MPI/ names have an {\tt MPI\_} prefix, defined constants are in all capital letters, and defined types and functions have one capital letter after the prefix. Programs must not declare variables or functions with names beginning with the prefix {\tt MPI\_}. This is mandated to avoid possible name collisions.

The definition of named constants, function prototypes, and type definitions must be supplied in an include file {\sf mpi.h}.

Almost all C functions return an error code. The successful return code will be {\tt MPI\_SUCCESS}, but failure return codes are implementation dependent. A few C functions do not return values, so that they can be implemented as macros.

Type declarations are provided for handles to each category of opaque objects. Either a pointer or an integer type is used.

Array arguments are indexed from zero.

Logical flags are integers with value 0 meaning ``false'' and a non-zero value meaning ``true.''

Choice arguments are pointers of type {\tt void*}.

Address arguments are of \MPI/ defined type \const{MPI\_Aint}. This is defined to be an \const{int} of the size needed to hold any valid address on the target architecture.

\snir
All named \MPI/ constants can be used in initialization expressions or assignments like C constants.
\rins
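\begin{users}
The fragment below is a sketch, for illustration only, of these C conventions: the include file {\sf mpi.h}, a typed handle to an opaque object, and the error code returned by almost all C functions.
\begin{verbatim}
#include "mpi.h"

MPI_Comm comm = MPI_COMM_WORLD;  /* typed handle to a predefined
                                    opaque object */
int size, err;

/* almost all C functions return an error code */
err = MPI_Comm_size( comm, &size );
if (err != MPI_SUCCESS) {
    /* failure codes are implementation dependent */
}
\end{verbatim}
\end{users}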
\section{Processes}

An \MPI/ program consists of autonomous processes, executing their own code, in an MIMD style. The codes executed by each process need not be identical. The processes communicate via calls to \MPI/ communication primitives. Typically, each process executes in its own address space, although shared-memory implementations of \MPI/ are possible.

This document specifies the behavior of a parallel program assuming that only \MPI/ calls are used for communication. The interaction of an \MPI/ program with other possible means of communication (e.g., shared memory) is not specified.

\MPI/ does not specify the execution model for each process. A process can be sequential, or can be multi-threaded, with threads possibly executing concurrently. Care has been taken to make \MPI/ ``thread-safe,'' by avoiding the use of implicit state. The desired interaction of \MPI/ with threads is that concurrent threads be all allowed to execute \MPI/ calls, and calls be reentrant; a blocking \MPI/ call blocks only the invoking thread, allowing the scheduling of another thread.

\MPI/ does not provide mechanisms to specify the initial allocation of processes to an \MPI/ computation and their binding to physical processors. It is expected that vendors will provide mechanisms to do so either at load time or at run time. Such mechanisms will allow the specification of the initial number of required processes, the code to be executed by each initial process, and the allocation of processes to processors. Also, the current proposal does not provide for dynamic creation or deletion of processes during program execution (the total number of processes is fixed), although it is intended to be consistent with such extensions. Finally, we always identify processes according to their relative rank in a group, that is, consecutive integers in the range {\tt 0..groupsize-1}.

\section{Error Handling}

\MPI/ provides the user with reliable message transmission. A message sent is always received correctly, and the user does not need to check for transmission errors, time-outs, or other error conditions. In other words, \MPI/ does not provide mechanisms for dealing with failures in the communication system. If the \MPI/ implementation is built on an unreliable underlying mechanism, then it is the job of the implementor of the \MPI/ subsystem to insulate the user from this unreliability, or to reflect unrecoverable errors as failures. Whenever possible, such failures will be reflected as errors in the relevant communication call. Similarly, \MPI/ itself provides no mechanisms for handling processor failures. The error handling facilities described in Section~\ref{sec:inquiry-error} can be used to restrict the scope of an unrecoverable error, or to design error recovery at the application level.

Of course, \MPI/ programs may still be erroneous. A {\bf program error} can occur when an \MPI/ call is made with an incorrect argument (non-existing destination in a send operation, buffer too small in a receive operation, etc.). This type of error would occur in any implementation. In addition, a {\bf resource error} may occur when a program exceeds the amount of available system resources (number of pending messages, system buffers, etc.). The occurrence of this type of error depends on the amount of available resources in the system and the resource allocation mechanism used; this may differ from system to system. A high-quality implementation will provide generous limits on the important resources so as to alleviate the portability problem this represents.

Almost all \MPI/ calls return a code that indicates successful completion of the operation. Whenever possible, \MPI/ calls return an error code if an error occurred during the call. \snir In certain circumstances, when the \MPI/ function may complete several distinct operations, and therefore may generate several independent errors, the \MPI/ function may return multiple error codes. \rins By default, an error detected during the execution of the \MPI/ library causes the parallel computation to abort. However, \MPI/ provides mechanisms for users to change this default and to handle recoverable errors. The user may specify that no error is fatal, and handle error codes returned by \MPI/ calls himself or herself. Also, the user may provide his or her own error-handling routines, which will be invoked whenever an \MPI/ call returns abnormally. The \MPI/ error handling facilities are described in Section~\ref{sec:inquiry-error}.
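\begin{users}
As an illustration only, the fragment below changes the default so that errors on \const{MPI\_COMM\_WORLD} are returned to the caller rather than aborting the computation; the routine \func{MPI\_ERRHANDLER\_SET} and the predefined handler \const{MPI\_ERRORS\_RETURN} belong to the error handling facilities of Section~\ref{sec:inquiry-error}.
\begin{verbatim}
int buf[10], err;

/* errors now return a code instead of aborting */
MPI_Errhandler_set( MPI_COMM_WORLD, MPI_ERRORS_RETURN );

err = MPI_Send( buf, 10, MPI_INT, 1, 0, MPI_COMM_WORLD );
if (err != MPI_SUCCESS) {
    /* attempt recovery at the application level */
}
\end{verbatim}
\end{users}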
Several factors limit the ability of \MPI/ calls to return with meaningful error codes when an error occurs. \MPI/ may not be able to detect some errors; other errors may be too expensive to detect in normal execution mode; finally, some errors may be ``catastrophic'' and may prevent \MPI/ from returning control to the caller in a consistent state.

Another subtle issue arises because of the nature of asynchronous communications: \MPI/ calls may initiate operations that continue asynchronously after the call returns. Thus, the operation may return with a code indicating successful completion, yet later cause an error exception to be raised. If there is a subsequent call that relates to the same operation (e.g., a call that verifies that an asynchronous operation has completed), then the error argument associated with this call will be used to indicate the nature of the error. In a few cases, the error may occur after all calls that relate to the operation have completed, so that no error value can be used to indicate the nature of the error (e.g., an error in a send with the ready mode). Such an error must be treated as fatal, since information cannot be returned for the user to recover from it.

This document does not specify the state of a computation after an erroneous \MPI/ call has occurred. The desired behavior is that a relevant error code be returned, and the effect of the error be localized to the greatest possible extent. E.g., it is highly desirable that an erroneous receive call will not cause any part of the receiver's memory to be overwritten, beyond the area specified for receiving the message.

Implementations may go beyond this document in supporting in a meaningful manner \MPI/ calls that are defined here to be erroneous. For example, \MPI/ specifies strict type matching rules between matching send and receive operations: it is erroneous to send a floating point variable and receive an integer. Implementations may go beyond these type matching rules, and provide automatic type conversion in such situations. It will be helpful to generate warnings for such nonconforming behavior.

\section{Implementation Issues}

There are a number of areas where an \MPI/ implementation may interact with the operating environment and system. While \MPI/ does not mandate that any services (such as I/O or signal handling) be provided, it does strongly suggest the behavior to be provided if those services are available. This is an important point in achieving portability across platforms that provide the same set of services.

\subsection{Independence of Basic Runtime Routines}

\MPI/ programs require that library routines that are part of the basic language environment (such as \code{date} and \code{write} in Fortran and \code{printf} and \code{malloc} in ANSI C), and are executed after \code{MPI\_INIT} and before \code{MPI\_FINALIZE}, operate independently and that their {\em completion\/} is independent of the action of other processes in an \MPI/ program.

Note that this in no way prevents the creation of library routines that provide parallel services whose operation is collective. However, the following program is expected to complete in an ANSI C environment regardless of the size of \code{MPI\_COMM\_WORLD} (assuming that I/O is available at the executing nodes).
\begin{verbatim}
int rank;

MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
if (rank == 0) printf( "Starting program\n" );
MPI_Finalize();
\end{verbatim}

The corresponding Fortran 77 program is also expected to complete.

An example of what is {\em not\/} required is any particular ordering of the action of these routines when called by several tasks. For example, \MPI/ makes neither requirements nor recommendations for the output from the following program (again assuming that I/O is available at the executing nodes).

\begin{verbatim}
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
printf( "Output from task rank %d\n", rank );
\end{verbatim}

In addition, calls that fail because of resource exhaustion or other error are not considered a violation of the requirements here (however, they are required to complete, just not to complete successfully).

\subsection{Interaction with Signals in POSIX}

\MPI/ does not specify either the interaction of processes with signals, in a UNIX environment, or with other events that do not relate to \MPI/ communication. That is, signals are not significant from the viewpoint of \MPI/, and implementors should attempt to implement \MPI/ so that signals are transparent: an \MPI/ call suspended by a signal should resume and complete after the signal is handled. Generally, the state of a computation that is visible or significant from the viewpoint of \MPI/ should only be affected by \MPI/ calls.

The intent of \MPI/ to be thread and signal safe has a number of subtle effects. For example, on Unix systems, a catchable signal such as SIGALRM (an alarm signal) must not cause an \MPI/ routine to behave differently than it would have in the absence of the signal. Of course, if the signal handler issues \MPI/ calls or changes the environment in which the \MPI/ routine is operating (for example, consuming all available memory space), the \MPI/ routine should behave as appropriate for that situation (in particular, in this case, the behavior should be the same as for a multithreaded \MPI/ implementation).

A second effect is that a signal handler that performs \MPI/ calls must not interfere with the operation of \MPI/. For example, an \MPI/ receive of any type that occurs within a signal handler must not cause erroneous behavior by the \MPI/ implementation. Note that an implementation is permitted to prohibit the use of \MPI/ calls from within a signal handler, and is not required to detect such use.

It is highly desirable that \MPI/ not use \code{SIGALRM}, \code{SIGFPE}, or \code{SIGIO}. An implementation is {\em required\/} to clearly document all of the signals that the \MPI/ implementation uses; a good place for this information is a Unix `\code{man}' page on \code{MPI}.
\snir

\section{Examples}

The examples in this document are for illustration purposes only. They are not intended to specify the standard. Furthermore, the examples have not been carefully checked or verified.
\rins