GNU UPC Compiler for the SGI Origin 2000
01-Nov-2001 (Revision 1.5)
Gary Funck gary@intrepid.com
Under contract to Computer Sciences Corp.
as a subcontract to Silicon Graphics, Inc.
This report describes:
· the general approach used to implement UPC on the SGI Origin platform, running the Irix 6.5 (Unix based) operating system
· the overall operation and use of the UPC compiler
· the structure of the source files used to build the UPC compiler
· known bugs, testing results, and future enhancements
During the investigation phase of this project, several approaches were evaluated as candidates for an efficient and cost-effective method of implementing UPC on the SGI platform. As the implementation progressed, we found it necessary to revisit earlier design decisions, and based upon additional information, we modified elements of earlier designs. This document describes the UPC compiler and runtime as it is implemented (in version 1.7, dated Aug-9-2001), and supersedes earlier design documents.
The SGI UPC compiler is an early member of a family of GNU GCC-based UPC compilers. We recommend that you visit the GNU UPC site (http://www.gwu.edu/~upc/software/gnu-upc.html) hosted at George Washington University, if you are interested in following the GNU UPC status and development plans.
A UPC compilation/execution system is made up of the following components:
· A language translator, based upon the UPC 3.1.9 compiler (available at ftp://ftp.super.org/pub/UPC/current/). The UPC 3.1.9 compiler is based in turn upon the GCC compiler (version 2.7.1). The UPC 3.1.9 compiler has been upgraded to GCC version 2.95.2 by porting the UPC-specific changes to the newer GCC compiler sources.
· A runtime library, which implements support routines called directly by code generated by the UPC compiler, and by calls from the user program to the support library routines described in the UPC specification. Although elements of the runtime library may operate on other Unix implementations, the runtime library as implemented is specific to the SGI Irix 6.5 operating system, and depends strongly upon SGI's multi-processing facilities.
The compiler conforms to the UPC Language Specifications (http://hpc.gwu.edu/~upc/doc/upc_specs.pdf) document, version 1.0, dated Feb-25-2001. This document describes the implementation and behavior of SGI UPC version 1.7.
The SGI UPC implementation uses an address re-mapping of a dedicated data segment to implement addressing and access to each UPC thread's local contribution to UPC shared data. In addition, a mapping to a global shared memory area is used to implement accesses to the data of UPC threads other than the currently executing thread.

Figure 1 Shared Data Memory Layout
All of the data for a shared object is allocated in a shared memory area (allocated out of the operating system's process swap space) accessible from all of the UPC application's threads. The global shared memory area is in turn partitioned into equal-sized regions, such that the local contribution of each thread is aggregated into its own contiguous memory region. This method of shared data allocation and mapping is illustrated in Figure 1.
The use of a global shared memory area as a method for implementing inter-thread data access is well suited to the SGI Origin 2000 architecture and offers the following benefits:
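The partitioning described above can be sketched as simple address arithmetic. This is a minimal illustration, not the runtime's actual code: the `shared_area` structure and the `shared_addr` function are hypothetical names introduced here, and the sketch assumes that every thread's region has the same size and that regions are laid out in thread order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for the global shared memory area. */
typedef struct {
    char  *base;         /* start of the global shared area         */
    size_t region_size;  /* size of one thread's contiguous region  */
    int    threads;      /* number of UPC threads (THREADS)         */
} shared_area;

/* Address of byte `offset` within the region that holds the local
   contribution of `thread`. */
static char *shared_addr(const shared_area *a, int thread, size_t offset)
{
    assert(thread >= 0 && thread < a->threads);
    assert(offset < a->region_size);
    return a->base + (size_t)thread * a->region_size + offset;
}
```

Under these assumptions, any thread can compute the address of another thread's data from the thread number and an offset alone, which is what makes the equal-sized partitioning convenient.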
· Threads can quickly access shared data that has a logical association (affinity) with other threads.
· The SGI Origin 2000 hardware architecture ensures data consistency of shared data. This hardware support is several orders of magnitude faster than a software-implemented caching scheme.
· Operating system and linker support for separately linked sections can be effectively used to coalesce references to shared memory objects declared in the UPC application program. Thus, no special pre-linking (or collector) tool is needed.
The use of a global shared memory area has two potential disadvantages:
· Full operating system processes are used to implement UPC threads. Processes are required because private (local) static scope objects are replicated by the operating system without the need for additional compiler or runtime support. However, processes are generally not as efficient in their use of system resources as true OS-implemented threads.
· The design may present some limitations to scaling when the UPC application uses very large shared data objects. The limitation imposed by a 32-bit address space will restrict the amount of shared data to something less than 512 megabytes. However, the implementation can be extended to use 64-bit pointers, and this would practically eliminate any serious address space size limitation.
Although the UPC specification uses the term thread to designate the unit of parallel execution available to a UPC program, the term is potentially misleading. In the realm of Unix operating systems, thread is widely used to describe a particular method of implementing a multi-programmed, or parallel, locus of control (and associated data) within a Unix process. In the Unix context, a thread is lighter weight than a process, because many of the resources owned by the process (which created the thread) are shared among all threads. In most Unix implementations of threads, the threads share a single copy of all variables declared with static scope. Using this conventional definition of thread, static scoped variables have certain semantics that are similar to UPC's shared objects. As Gerhard Harrop (CSC) pointed out in Thoughts on UPC (dated September 5, 2000), the use of conventional Unix threads to implement UPC's threads is attractive because the implementation of UPC shared objects is likely straightforward; however, the implementation of local static scoped variables and objects becomes problematic.
The SGI UPC compiler makes use of full-blown Unix processes to implement UPC threads, where one Unix process is created for each UPC thread. Unix processes are chosen to implement UPC threads because the operating system will directly provide the support necessary to ensure that local static scoped objects are replicated into each process (i.e., each UPC thread).
Shared data objects are implemented by first requiring that the UPC compiler locate each UPC thread's local contribution to UPC shared objects into a separate linkage section (named .shared). Linkage directives are generated by the compiler which ensure that the shared linkage section is linked into its own unique data segment, aligned on a page boundary.
The global shared memory area is mapped by all UPC threads, via a call to Irix's mmap(2) system call. The UPC runtime initialization procedure (running in a process that we call the UPC monitor process) will create a shared region with a size equal to the value given by the number of threads (i.e., THREADS) multiplied by the size of the special data segment that was created by the linker (plus any additional allocation required to support UPC's dynamic memory allocation library routines). The entire shared memory area is then mapped at a unique location in the UPC monitor process's address space. When the monitor process in turn creates the UPC application threads, they will inherit this mapping of the entire global shared data area. The shared data area mapping is used by each UPC thread to implement access to all shared data, including the shared data associated with the currently executing thread.
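The monitor/thread arrangement described above can be sketched with portable POSIX calls. This is an illustration under stated assumptions rather than the Irix runtime itself: `map_shared_area` and `run_threads` are hypothetical names, `MAP_ANONYMOUS` is a widely available extension used here in place of whatever backing object the real runtime maps, and each child simply writes a marker into its own region to show that the parent's mapping is inherited.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on some systems */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one shared area of threads * region_size bytes.
   Returns NULL on failure. */
static char *map_shared_area(int threads, size_t region_size)
{
    assert(threads > 0 && region_size > 0);
    void *p = mmap(NULL, (size_t)threads * region_size,
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Monitor process: fork one process per UPC thread. Each child inherits
   the mapping and stores (thread number + 1) at the start of its own
   region. Returns 0 if every child ran and exited cleanly. */
static int run_threads(char *base, int threads, size_t region_size)
{
    int created = 0, failed = 0;
    for (int t = 0; t < threads; t++) {
        pid_t pid = fork();
        if (pid < 0) { failed = 1; break; }
        if (pid == 0) {                       /* child: UPC thread t */
            base[(size_t)t * region_size] = (char)(t + 1);
            _exit(0);
        }
        created++;
    }
    for (int i = 0; i < created; i++) {       /* reap all children */
        int status = 0;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

After `run_threads` returns, the monitor can read each child's marker directly from the shared area, because the mapping was created with `MAP_SHARED` before the calls to `fork`.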
The use of a separate shared object linkage section, which in turn is linked into its own data segment, offers a simple method for coalescing the allocation of each UPC thread's contribution to shared memory into a single memory region. This approach eliminates the need to build a special pre-linker tool, and provides a straightforward method of matching references to UPC shared objects with the location of the shared object in the shared address space. Since each UPC thread maps the entire global shared data area into its address space, each UPC thread will have direct access to the shared data contribution of all other threads.
The UPC language defines several properties for pointers to shared memory objects:
· Pointers to unblocked shared objects have two logical components: the thread number (affinity), and the local address.
· Pointers to blocked shared objects have three logical components: the phase (represents the index within the block), the thread number, and the local address.
· Shared objects with affinity to a given thread can be accessed by either shared pointers or private pointers of that thread.
· Elements of shared arrays are distributed in a round-robin fashion by chunks of blocksize elements such that the I-th element has affinity with the thread whose thread index is given by the value of the expression floor(I/blocksize) % THREADS.
If a pointer to an element of a shared array is recast to a local pointer (assuming that this is permitted by UPC's language rules), incrementing the local pointer will point to the next (array) element with the same affinity as the original array element.
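The distribution rule above can be worked through in a few lines of C. This is a sketch only: `elem_location` and `locate_element` are hypothetical names, and the `local_index` field assumes that each thread stores its blocks contiguously in the order in which they are dealt out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t thread;       /* thread with affinity to the element         */
    size_t phase;        /* index of the element within its block       */
    size_t local_index;  /* element index within that thread's portion  */
} elem_location;

/* Locate element i of a shared array distributed round-robin in
   chunks of `blocksize` elements over `threads` threads. */
static elem_location locate_element(size_t i, size_t blocksize,
                                    size_t threads)
{
    assert(blocksize > 0 && threads > 0);
    size_t block = i / blocksize;            /* global block number */
    elem_location loc;
    loc.thread = block % threads;            /* floor(i/blocksize) % THREADS */
    loc.phase = i % blocksize;
    loc.local_index = (block / threads) * blocksize + loc.phase;
    return loc;
}
```

With a blocksize of 2 and 3 threads, elements 0, 1, 6, and 7 all have affinity to thread 0 and occupy local indices 0 through 3 there. Incrementing a local pointer from element 1 therefore lands on element 6, the next element with the same affinity.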
The requirements above lead us to the following conclusions regarding the implementation of pointers to shared objects on the SGI Origin 2000 platform:
Pointers to shared objects must contain: (1) the virtual address of the data, which is interpreted as an offset into a shared data region containing each thread's contribution to UPC shared data, (2) the thread number, and (3) the phase (within a block distributed object). The diagram below shows the logical structure:
| Phase | Thread Number | Virtual Address |
The semantics of pointer arithmetic (i.e., adding or subtracting an integer index to/from a pointer to shared object) are straightforward, as shown on page 4 of Introduction to UPC and Language Specification (http://projects.seas.gwu.edu/~hpcl/upcdev/upctr.pdf).
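As an illustration of the three-component pointer and of adding an integer to it, the following sketch advances a pointer to a blocked shared object. The names `shared_ptr_rep` and `shared_ptr_add` are hypothetical, the field widths are not those of the actual compiler, and the sketch handles only non-negative increments. It also assumes that each thread's blocks are stored contiguously, so that wrapping past the last thread moves the address forward by one block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical logical representation of a pointer to shared. */
typedef struct {
    size_t phase;   /* index within the current block             */
    size_t thread;  /* thread with affinity to the target         */
    size_t vaddr;   /* byte offset within that thread's region    */
} shared_ptr_rep;

/* Advance p by `inc` elements (inc >= 0 only in this sketch). */
static shared_ptr_rep shared_ptr_add(shared_ptr_rep p, size_t inc,
                                     size_t blocksize, size_t threads,
                                     size_t elem_size)
{
    assert(blocksize > 0 && threads > 0);
    assert(p.phase < blocksize && p.thread < threads);
    size_t blocks = (p.phase + inc) / blocksize;    /* blocks crossed */
    size_t wraps  = (p.thread + blocks) / threads;  /* passes over the
                                                       last thread    */
    shared_ptr_rep q;
    q.phase  = (p.phase + inc) % blocksize;
    q.thread = (p.thread + blocks) % threads;
    /* Start of the current block, plus one local block per wrap,
       plus the new phase. */
    q.vaddr  = p.vaddr - p.phase * elem_size
             + (wraps * blocksize + q.phase) * elem_size;
    return q;
}
```

For example, with a blocksize of 2, 3 threads, and 4-byte elements, adding 7 to a pointer at element 0 yields phase 1 on thread 0 at byte offset 12, which is local element 3 of thread 0.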
The UPC Specification, section 6.5.1 describes the behavior of UPC's synchronization statements:
· Each thread shall execute an alternating sequence of upc_notify and upc_wait statements, starting with a upc_notify and ending with a upc_wait statement. A synchronization phase consists of the execution of all statements between one upc_notify and the next.
· A upc_wait statement does not complete until all threads have executed the upc_notify statement that begins the synchronization phase. Note that this implies that all threads are in the same synchronization phase as they complete the upc_wait statement.
· The upc_barrier statement is equivalent to the compound statement: { upc_notify barrier_value; upc_wait barrier_value; }
· The barrier_value is specified as expression(opt); it is optional. However, if barrier_value is supplied, the value must agree with the value given by all upc_notify and all upc_wait statements within the same synchronization phase.
The UPC synchronization primitives are implemented with a minimum level of support from the operating system; they use only the compare and swap function provided by the Irix OS.
Barrier synchronization is implemented using a bit vector, where one bit is reserved for each thread. In the current implementation, a maximum of 256 threads is supported, thus the bit vector is implemented as a sequence of eight (8) thirty-two (32) bit words. Irix's compare and swap primitives are used to test and set each bit atomically. There is also a per-process barrier_id value accessible to all threads; this value is used to implement the barrier consistency checks called out in the UPC specification.
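The bit vector and its atomic test-and-set operation can be sketched as follows. C11 atomics stand in here for the Irix compare and swap primitives, which is an assumption made for portability; the names `barrier_bits`, `barrier_set_bit`, and `barrier_test_bit` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 256
#define BARRIER_WORDS (MAX_THREADS / 32)   /* eight 32-bit words */

typedef struct {
    _Atomic uint32_t word[BARRIER_WORDS];
} barrier_bits;

/* Atomically set the bit reserved for `thread`. Returns 1 if this call
   set the bit, 0 if the bit was already set. */
static int barrier_set_bit(barrier_bits *b, int thread)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    _Atomic uint32_t *w = &b->word[thread / 32];
    uint32_t mask = UINT32_C(1) << (thread % 32);
    uint32_t old = atomic_load(w);
    while (!(old & mask)) {
        /* On failure, `old` is refreshed with the current value and
           the loop re-tests the bit before retrying. */
        if (atomic_compare_exchange_weak(w, &old, old | mask))
            return 1;
    }
    return 0;
}

/* Return 1 if the bit reserved for `thread` is set, else 0. */
static int barrier_test_bit(barrier_bits *b, int thread)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    return (int)((atomic_load(&b->word[thread / 32]) >> (thread % 32)) & 1u);
}
```

The compare and swap loop changes only the one bit belonging to the calling thread, so concurrent updates to other bits in the same word are not lost.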
A tree-structured synchronization protocol is used to implement the barrier wait; this approach helps reduce the total number of checks required to ensure that all threads are in the same synchronization phase. In this implementation, thread M waits for threads 2M and 2M+1 to set their bits in the barrier bit vector before setting its own bit. When thread 0 determines that bits 1 and 2 have been set, it clears the entire bit vector, which in turn releases all threads.
The UPC defined consistency check, utilizing a user-supplied barrier_id, is implemented in a similar fashion. The upc_notify operation writes each thread's barrier_id into a location reserved for that purpose. The upc_wait operation then ensures that thread M compares its barrier_id to that of threads 2M and 2M+1, after determining that they have set their respective bits in the barrier bit vector. If an inequality is detected, the program will terminate after printing a runtime error diagnostic.
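The readiness test that each thread applies in the tree protocol, together with the barrier_id comparison, might look like the sketch below. It is single-threaded and uses plain (non-atomic) state purely to illustrate the logic. All names are hypothetical. The sketch assumes zero-based thread numbers with children 2M+1 and 2M+2, chosen so that thread 0 waits on threads 1 and 2 as in the text; the actual runtime's numbering may differ.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_THREADS 256

/* Plain (non-atomic) state, for illustrating the protocol only. */
typedef struct {
    uint32_t bits[MAX_THREADS / 32];   /* one arrival bit per thread  */
    int      barrier_id[MAX_THREADS];  /* value written by upc_notify */
    int      threads;                  /* number of UPC threads       */
} tree_barrier;

static int bit_is_set(const tree_barrier *b, int t)
{
    return (int)((b->bits[t / 32] >> (t % 32)) & 1u);
}

static void mark_arrived(tree_barrier *b, int t)
{
    b->bits[t / 32] |= UINT32_C(1) << (t % 32);
}

/* Returns 1 if thread m may set its own bit, 0 if a child has not yet
   arrived, and -1 if a child arrived with a different barrier_id. */
static int tree_check(const tree_barrier *b, int m)
{
    assert(m >= 0 && m < b->threads && b->threads <= MAX_THREADS);
    for (int c = 2 * m + 1; c <= 2 * m + 2; c++) {
        if (c >= b->threads)
            break;                     /* no such child */
        if (!bit_is_set(b, c))
            return 0;
        if (b->barrier_id[c] != b->barrier_id[m])
            return -1;
    }
    return 1;
}
```

Leaf threads have no children and may set their bits immediately. A result of -1 corresponds to the case in which the runtime prints a diagnostic and terminates the program.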
The following description of UPC's data consistency model is excerpted from Introduction to UPC, May 13, 1999, page 6:
UPC provides a hybrid, user-controlled consistency model for the interaction of memory accesses in shared memory space. Each memory reference in the program may be annotated (using a variety of approaches) to be either strict or relaxed. Under strict behavior, the program executes in a sequential consistency model [Lamp 79]. This implies that the user can be sure that it appears to all threads that the strict references in a thread appear in the order they are written, relative to all other accesses. To implement this, the compiler must take into account all memory accesses in all threads of the program when determining that a strict reference in a thread may be reordered with respect to other shared references. Under relaxed behavior, the program executes in what we term a "local" consistency model. This implies that the user can assume that it appears to the issuing thread that all shared references in that thread occur in the order they were written. Here the compiler need only analyze shared memory access in the local thread to allow reordering. Note that because each reference may be annotated, a number of models between local and sequential consistency are available to the user.
The general method of using these models is that the programmer will first establish a default environment by including either <upc_strict.h> or <upc_relaxed.h>. Among other things, these files respectively contain a #pragma upc strict global and a #pragma upc relaxed global directive. All unannotated references within functions defined after these directives operate under the selected model. The programmer then may annotate more explicitly those references to be handled differently. One method for annotating references is to declare shared variables and pointers with the type qualifiers strict and relaxed. All references to annotated variables and through annotated pointers will operate under the selected model, regardless of the default behavior in force. The other method is to use the #pragma upc strict next and #pragma upc relaxed next mechanisms which apply to the following statement or statement sequence and cause otherwise unqualified references in those statements to operate under the selected mode, without regard to the default behavior in force.
The next construct has been dropped from the UPC language specification, but the rest of the strict/relaxed behavior is consistent with the description above.
In the current design, UPC's data consistency model is implemented by treating all strict references as if they are equivalent to C's volatile qualifier. We believe that handling strict references in this way leads to a conforming implementation, because the global shared memory model exhibits an equivalence between shared memory references and local memory references when viewed in the context of C's data consistency model.
A UPC program is compiled and linked by executing the upc command. The description below is taken from the UPC "man page":
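The effect of treating a strict reference as volatile can be approximated in plain C by accessing the object through a volatile-qualified lvalue. The helper names below are hypothetical and the code is only a sketch of the idea, not what the compiler emits. It relies on the compiler honoring volatile semantics for the access, and on the hardware to keep the shared memory coherent, as the text argues.

```c
#include <assert.h>
#include <stddef.h>

/* Read an int through a volatile-qualified lvalue, so the compiler may
   not cache the value in a register or elide the load. */
static int strict_load_int(const int *p)
{
    assert(p != NULL);
    return *(const volatile int *)p;
}

/* Write an int through a volatile-qualified lvalue, so the compiler may
   not elide the store or reorder it against other volatile accesses. */
static void strict_store_int(int *p, int v)
{
    assert(p != NULL);
    *(volatile int *)p = v;
}
```

Relaxed references, by contrast, would be ordinary accesses that the compiler remains free to reorder within the issuing thread.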
NAME
upc - UPC compiler for parallel computers.
SYNOPSIS
upc [ option | filename ]...
DESCRIPTION
UPC is an extension to the GNU C compiler from the Free Software Foundation. In addition to the options specified here, all of the normal GCC options listed in the man pages for gcc(1) are available. The UPC compiler is integrated with the GCC compiler. UPC processes input files through one or more of four stages: preprocessing, compilation, assembly, and linking.
Suffixes of source file names indicate the language and kind of processing to be done:
| .upc | UPC source; preprocess, compile, assemble |
| .upci | Preprocessed UPC; compile, assemble |
| .h | Preprocessor file; not usually named on command line |
| .c, .i, .s | C, preprocessed C, and assembler source files; they are processed by the C compiler and assembler. |
The resulting object files can be linked with UPC source code.
Files with other suffixes are passed to the linker. Common cases include:
| .o | Object file |
| .a | Archive file |
Linking is always the last stage unless you use one of the -c, -S, or -E options to avoid linking (or unless compilation errors stop the whole process). For the link stage, all .o files corresponding to source files, -l libraries, and unrecognized filenames (including named .o object files and .a archives) are passed to the linker in command-line order.
OPTIONS
All the UPC-specific options are summarized below.
Information Options
-v
Language Options
-fupc-threads-n
-x upc
Debugging Options
-g
Optimization Options
-O1, -O2, -O3
INFORMATION OPTIONS
| -v | Identifies the version of UPC currently in use, with a path name to a specification file that is in the same directory as include directories and other version-specific directories and files. Can be invoked without a source file name. When invoked with files, gives include and library directory paths in the order that they are searched. |
LANGUAGE OPTIONS
All source files ending in .upc or .upci will be compiled by the UPC compiler.
| -x upc | Tells the compiler to process all of the following file names as UPC source code, ignoring the default language typically associated with filename extensions. |
| -fupc-threads-n | Compile for n threads. The special symbol THREADS will be set to n and can be used both in data declarations (as a constant for array dimensions) and in expressions. On each thread, the special symbol MYTHREAD refers to the thread number. |
DEBUGGING OPTIONS
| -g | Produce symbolic debugging information |
OPTIMIZATION OPTIONS
| -O2 | This optimization level is one of GCC's standard options. It is especially important for UPC because it enables instruction scheduling, which increases performance dramatically for distributed data access. Nearly all supported optimizations that do not involve a space-speed tradeoff are performed. Loop unrolling and function inlining are not done, for example. |
EXECUTION (RUNTIME) OPTIONS
The number of THREADS in a UPC application can be specified statically at compile-time, or dynamically at execution time. In the static compilation environment, THREADS is a constant, and can be used freely in contexts where a constant is required by the C language specification (for example in an array declaration). In a dynamic compilation environment, the value of THREADS is given at runtime, and THREADS can be used in array declarations only if the array is qualified as shared and in contexts where one and only one of the shared array's dimensions is specified as an integral multiple of THREADS.
<UPC_program> [-fupc-threads-n] [-fupc-heap-n[K|M|G]] [program-specific-arguments and switches]
If the UPC program was not compiled with the -fupc-threads-n option, then the number of THREADS must be specified (either implicitly via a runtime system default, or explicitly on the command line) when the program is executed. The UPC runtime recognizes the -fupc-threads-n command line switch, and establishes the number of parallel execution threads given by the value `n'. Generally, `n' should not exceed the number of physical central processing units. The implementation-imposed maximum value of `n' is 256.
The size of the heap used by the UPC program is established with the specification of the -fupc-heap-n command line switch. The value of `n' is the size of the heap available to each thread, specified in bytes. A suffix of `K' indicates that the value `n' is expressed in kilobytes (2^10 bytes); a suffix of `M' indicates that `n' is expressed in megabytes (2^20 bytes); and `G' indicates the value is given in gigabytes (2^30 bytes). If the -fupc-heap-n switch is not supplied, then the runtime system will choose a default heap size of 1 megabyte per thread. The UPC runtime will remove all switches that begin with the prefix -fupc- and that immediately follow the UPC program name on the command line, before calling the UPC program's `main()' routine.
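The switch handling described above can be sketched as a small argument pre-processor. This is an illustration rather than the runtime's code: `upc_run_opts`, `parse_heap_size`, and `strip_upc_switches` are hypothetical names, unrecognized `-fupc-` switches are silently dropped, and overflow checking is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long               threads;  /* 0 if not given on the command line */
    unsigned long long heap;     /* per-thread heap size in bytes      */
} upc_run_opts;

/* Parse "<n>[K|M|G]" into bytes; returns 0 on malformed input. */
static unsigned long long parse_heap_size(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);
    if (end == s)
        return 0;                              /* no digits */
    switch (*end) {
    case 'K': n <<= 10; end++; break;          /* kilobytes */
    case 'M': n <<= 20; end++; break;          /* megabytes */
    case 'G': n <<= 30; end++; break;          /* gigabytes */
    }
    return *end == '\0' ? n : 0;
}

/* Consume the -fupc-* switches that immediately follow the program
   name, then compact argv so the program's own arguments follow
   argv[0]. Returns the new argc. */
static int strip_upc_switches(int argc, char **argv, upc_run_opts *o)
{
    assert(argc >= 1 && argv != NULL && o != NULL);
    int i = 1;
    o->threads = 0;
    o->heap = 1ULL << 20;                      /* default: 1 megabyte */
    for (; i < argc && strncmp(argv[i], "-fupc-", 6) == 0; i++) {
        const char *sw = argv[i] + 6;
        if (strncmp(sw, "threads-", 8) == 0)
            o->threads = strtol(sw + 8, NULL, 10);
        else if (strncmp(sw, "heap-", 5) == 0)
            o->heap = parse_heap_size(sw + 5);
    }
    int n = 1;
    for (; i < argc; i++)
        argv[n++] = argv[i];
    argv[n] = NULL;                            /* argv has argc + 1 slots */
    return n;
}
```

Given the command line `prog -fupc-threads-4 -fupc-heap-2M -x`, the sketch records 4 threads and a 2 megabyte heap, and leaves only `prog -x` for the program's `main()` routine.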
FILES
| file.upc | UPC source file |
| file.upci | preprocessed UPC source file |
| file.c | C source file |
| file.h | C header (preprocessor) file |
| file.i | preprocessed C source file |
| file.s | assembly language file |
| file.o | object file |
| a.out | link edited output |
| TMPDIR/cc* | temporary files |
| LIBDIR/cpp | preprocessor |
| LIBDIR/cc1upc | compiler for UPC |
| LIBDIR/cc1 | compiler for C |
| LIBDIR/collect2 | linker front end needed on some machines |
| LIBDIR/libupc.a | UPC runtime library |
| LIBDIR/libgcc.a | GCC subroutine library |
| /lib/crt[01n].o | start-up routine |
| /lib/libc.a | standard C library, see intro(3) |
| /usr/include | standard directory for #include files |
| LIBDIR/include | standard gcc directory for #include files |
LIBDIR should be found by using upc -v
TMPDIR comes from the environment variable TMPDIR (default /usr/tmp if available, else /tmp).
SEE ALSO
gcc(1), cpp(1), as(1), ld(1), gdb(1), adb(1), dbx(1), sdb(1).
Introduction to UPC and Language Specification
(http://projects.seas.gwu.edu/~hpcl/upcdev/upctr.pdf)
William W. Carlson et al., LLNL, CCS-TR-99-157, May 13, 1999
UPC Language Specifications
(http://www.gwu.edu/~upc/doc/upc_specs.pdf)
Tarek A. El-Ghazawi et al., February 25, 2001
The GNU UPC Mailing List (http://www.gwu.edu/~upc/software/gnu-upc-ml.html) is an electronic forum for discussing news announcements, bug reports, planned developments, and other topics of interest to GNU UPC developers and users.
BUGS
Report bugs to gnu-upc@hermes.gwu.edu
AUTHORS
Original Implementation by Jesse M. Draper <jdraper@super.org> and William W. Carlson <wwc@super.org>. Ported to SGI Irix 6.5 and the gcc 2.95.2 baseline by Gary Funck <gary@intrepid.com> and Nenad Vukicevic <nenad@intrepid.com>.
The UPC compiler is distributed as a "patch file" (http://projects.seas.gwu.edu/~hpcl/gnu-upc/upc.1.8-gcc.2.95.2.diff.gz). This patch file lists a set of file differences between the GCC 2.95.2 baseline, and the final contents of the GNU UPC compiler source files. The beginning of the patch file lists instructions for installing the patches and building the compiler.
We recommend that you build the UPC compiler using GCC as the self-hosted C compiler. You should use gcc 2.95.2 built for Irix 6.5. Further, we recommend that you build with GNU make (version 3.79.1 or higher). In addition, bison (version 1.28 or higher) and gperf (version 2.7.2 or higher) are required, and should be installed before attempting to build UPC. Gperf sources can be ftp'd from ftp://prep.ai.mit.edu/gnu/gperf/gperf-2.7.2.tar.gz. GNU make sources can be ftp'd from ftp://prep.ai.mit.edu/gnu/make/make-3.79.1.tar.gz. Bison sources can be ftp'd from ftp://prep.ai.mit.edu/gnu/bison/bison-1.28.tar.gz. You will also need the GNU "patch" utility, version 2.5.4 or higher. The `patch' sources can be ftp'd from ftp://prep.ai.mit.edu/gnu/patch/patch-2.5.4.tar.gz.
The GNU UPC compiler is built from the patch file by the following steps.
1. Untar gcc 2.95.2:
% gunzip -c < gnu/gcc-2.95.2.tar.gz | tar xpf -
2. Rename gcc-2.95.2 to gnu-upc, and cd into gnu-upc:
% mv gcc-2.95.2 gnu-upc
% cd gnu-upc
3. Apply the patches (note the options to patch):
% patch -s -E -N -p1 --set-utc -i ../upc.1.8-gcc.2.95.2.diff
4. Configure:
% configure >& config.log
5. Optionally, if you don't want to configure all the languages, then configure only for upc:
% configure --enable-languages=upc >& config.log
6. Build. We recommend that you use GNU make. We call it `gmake' here. We recommend that you do *not* attempt a "parallel make", using the -j switch (there are some problems in the make dependencies supplied with gcc 2.95.2):
% gmake >& make.log
If you want to verify that gcc isn't broken, and you have some extra time and disk space, you can run a full "bootstrap" instead (but this is typically only of interest to developers):
% gmake bootstrap >& make.log
To run the test programs:
% cd /usr/gnu-upc/upc_test
% gmake
% cd /usr/gnu-upc/upc_test/test
% run_tests
You can run the compiler without installing it, as follows (actual directory locations may vary):
% cd /usr/gnu-upc/upc_test/test
% /usr/gnu-upc/gcc/upc test01.upc -fupc-threads-4 -o test01
% ./test01
Or, you can choose to install the compiler, but keep in mind that you may need to have system administrator privileges, and you should *not* overwrite the gcc compiler currently installed on your system. To specify the installation location as other than the default, use the --prefix switch:
% configure --prefix=/usr/local/upc
You should *not* specify the source directory (/usr/gnu-upc in our example) as the installation --prefix value above. Once the compiler has been built, it can be installed:
% gmake install >& install.log
Note: if you ever need to start over from scratch (including configure), enter the following command from within the gnu-upc directory:
% gmake distclean >& clean.log
The UPC patches modify the following files that are already present in the GCC 2.95.2 baseline:
gcc/version.c Makefile.in configure.in config/mh-irix6 gcc/Makefile.in gcc/c-common.c gcc/c-decl.c gcc/c-lang.c gcc/c-lex.c gcc/c-lex.h gcc/c-parse.gperf
gcc/c-parse.in gcc/c-tree.h gcc/c-typeck.c gcc/cccp.c gcc/cppinit.c gcc/crtstuff.c gcc/emit-rtl.c gcc/explow.c gcc/fold-const.c gcc/gcc.c gcc/optabs.c
gcc/print-tree.c gcc/toplev.c gcc/tree.c gcc/tree.h gcc/varasm.c gcc/config/mips/mips.c gcc/config/mips/mips.h gcc/config/mips/x-iris6 gcc/objc/Make-lang.in gcc/objc/objc-act.c
The UPC language is implemented as a language "dialect". Most of the language specific processing is isolated to the u directory. The u directory contains the following files:
gcc/u/Make-lang.in gcc/u/Makefile.in gcc/u/config-lang.in gcc/u/lang-specs.h gcc/u/upc-act.c
gcc/u/upc-act.h gcc/u/upc-expr.c gcc/u/upc-expr.h gcc/u/upc-layout.c gcc/u/upc-share.c
gcc/u/upc-share.h gcc/u/upc-stmt.c gcc/u/upc-tree.def gcc/u/upc.1 gcc/u/upc.c
As pointed out in the source commentary, gcc/u/upc-expr.c, gcc/u/upc-expr.h, gcc/u/upc-layout.c, and gcc/u/upc-stmt.c are modified versions of the original GCC files named expr.c, expr.h, layout.c, and stmt.c, respectively. The UPC-specific versions of these files had to be placed into the u directory to ensure that the C compiler proper and the other language dialects still build properly.
The UPC runtime library is located in a directory called libupc. The libupc directory contains the following files:
libupc/Makefile.in libupc/configure libupc/configure.in libupc/upc_access.c libupc/upc_addr.c
libupc/upc_alloc.c libupc/upc_barrier.c libupc/upc_config.h libupc/upc_defs.h libupc/upc_lock.c
libupc/upc_main.c libupc/upc_mem.c libupc/include/upc.h libupc/include/upc_relaxed.h libupc/include/upc_strict.h
The include sub-directory below libupc contains the user-visible header files whose contents are described in the UPC language specification. As part of the process of building the compiler, the libupc source files are compiled under a target-specific directory named mips-sgi-irix6.5/libupc, where the UPC library archive named libupc.a is created. This library, which implements all non-compiler-generated UPC runtime support, is then linked with the UPC application program.
A set of test programs is also included with the UPC compiler release. The test programs can be found in subdirectories under the directory named upc_test. The test directory contains approximately ten very simple programs, which can be run as an initial "smoke test" to verify that the UPC compiler is operational. The GMU test suite is generally more thorough and complete. The GMU test suite is described by a document located on the UPC Developers site (http://www.gwu.edu/~upc/implementor.html) hosted at George Washington University (registration may be required in order to access this site).
The SGI UPC compiler, based upon the GCC 2.95.2 source code baseline, generally meets or exceeds the level of language conformance present in the GCC 2.7.1 based UPC 3.1.9 implementation (which targets the Cray platform). Although the SGI UPC compiler passes most of the GMU test suite, the following test failures remain:
| I_case5_I | Test shared array of structures. |
| I_case5_ii | Test blocked array of structures. |
| IV_case2_I | Pointer arithmetic for blocked pointers |
| V_case5_I | Assignment of shared-pointers-to-shared to private-pointers-to-shared with different blocking factors |
In addition to passing a majority of the tests in the GMU test suite, the SGI UPC compiler correctly compiles a set of limited-distribution benchmarks, obtained for the purposes of validating the compiler. These benchmarks are not included as part of the SGI UPC compiler distribution.
As demonstrated by passing the runtime library related GMU tests, the SGI UPC compiler implements all the library functions called out in the UPC specification. Many of these functions were not defined in the language supported by the original UPC 3.1 compiler.
The SGI GNU UPC compiler is based on the earlier work of Jesse Draper (IDA) and Bill Carlson (IDA). Howard ("Flash") Gordon (NSA) and his organization helped sponsor the development work, and provided invaluable technical feedback at each stage of development. Ed Burgess (CSC) served as project manager. Gerhard Harrop (CSC) rigorously reviewed early design proposals and was a source of both technical and editorial guidance throughout the course of the project. Jody Hencin of Silicon Graphics brought the team together, and made sure that the required resources were ready when required. Jason Mader (GWU) provided feedback on installation issues. Dr. Tarek El-Ghazawi (GWU and GMU) was instrumental in the effort to refine the UPC specification, and patiently helped us gain an understanding of the UPC language definition. Brian Wibecan (Compaq), both in his participation on the UPC Developer's List and off, provided valuable insight into the more difficult aspects of the UPC language and its implementation. Ludovic Courtès (GWU) worked tirelessly and with good humor on improvements to both the appearance and the content of the information on the UPC web site. Ludovic also independently validated the installation of the compiler, and ran various conformance tests.