UPC Specifications, v1.2
- Authors: UPC Consortium
- Abstract:
The UPC Language Specifications document, version 1.2, published as a Lawrence Berkeley National Lab Tech Report for the purpose of citation in formal publications. For citations use: UPC Consortium, "UPC Language Specifications, v1.2", Lawrence Berkeley National Lab Tech Report LBNL-59208, 2005.
- Download this document LBNL-59208
Fast Address Translation Techniques for Distributed Shared Memory Compilers
- Authors: Cantonnet, F; El-Ghazawi, T; Lorenz, P; Gaber, J
- Abstract:
The Distributed Shared Memory (DSM) model is
designed to leverage the ease of programming of the
shared memory paradigm, while enabling high performance
by expressing locality as in the message passing
model. Experience, however, has shown that
DSM programming languages, such as UPC, may be
unable to deliver the expected high level of performance.
Initial investigations have shown that among the major
reasons is the overhead of translating from the UPC
memory model to the target architecture virtual addresses
space, which can be very costly. Experimental
measurements have shown this overhead increasing
execution time by up to three orders of magnitude.
Previous work has also shown that some of this overhead
can be avoided by hand-tuning, which on the other hand
can significantly decrease the UPC ease of use. In
addition, such tuning can only improve the performance
of local shared accesses but not remote shared accesses.
Therefore, a new technique that resembles Translation
Lookaside Buffers (TLBs) is proposed here. This
technique, which is called the Memory Model Translation
Buffer (MMTB), has been implemented in the GCC-UPC
compiler using two alternative strategies, full-table (FT)
and reduced-table (RT). It will be shown that the MMTB
strategies can lead to a performance boost of up to 700%,
enabling ease-of-programming while performing at a
similar performance to hand-tuned UPC and MPI codes.
- Download this document IPDPS 2005
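The translation step that the abstract above identifies as costly can be pictured with a small C sketch. It is not the MMTB implementation from the paper: the `shared_array_t` type, the function names, and the assumption that each thread keeps its share of the array in one contiguous local segment are all hypothetical, chosen only to show how a per-thread table of base addresses turns a global index into a local address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor (an assumption for this sketch, not GCC-UPC's):
   elements are distributed block-cyclically, and base[t] is a lookup table
   holding the start of thread t's contiguous local segment. */
typedef struct {
    size_t nthreads;   /* number of threads */
    size_t block;      /* blocking factor, in elements */
    size_t elem_size;  /* element size, in bytes */
    char **base;       /* per-thread base addresses */
} shared_array_t;

/* Thread with affinity to global element i: (i / block) % nthreads. */
size_t owner_thread(const shared_array_t *a, size_t i) {
    assert(a->block > 0 && a->nthreads > 0);
    return (i / a->block) % a->nthreads;
}

/* Position of global element i inside its owner's local segment. */
size_t local_index(const shared_array_t *a, size_t i) {
    assert(a->block > 0 && a->nthreads > 0);
    size_t round = i / (a->block * a->nthreads); /* completed rounds of blocks */
    size_t phase = i % a->block;                 /* offset inside the block */
    return round * a->block + phase;
}

/* Table-based translation: one table lookup plus an offset computation. */
void *translate(const shared_array_t *a, size_t i) {
    return a->base[owner_thread(a, i)] + local_index(a, i) * a->elem_size;
}
```

With 3 threads and a blocking factor of 2, global element 7 belongs to thread 0 and sits at local index 3, so `translate` returns the address 3 elements past thread 0's base.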
An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C
- Authors: Coarfa, C; Dotsenko, Y; Mellor-Crummey, J; Cantonnet, F; El-Ghazawi, T; Mohanty, A; Yao, Y
- Abstract:
Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages
for single-program, multiple-data global address space programming. These
languages boost programmer productivity by providing shared variables for
communication instead of message passing. However, the performance of these
emerging languages still has room for improvement. In this paper, we study
the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several
modern architectures to identify challenges that must be met to deliver top
performance. We compare CAF and UPC variants of these programs with the original
Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance
on clusters only when written to use bulk communication. However, our experiments
uncovered some significant performance bottlenecks of UPC codes on all platforms.
We account for the root causes limiting UPC performance such as synchronization
model, communication efficiency of strided data, and source-to-source translation
issues. We show that they can be remedied with language extensions, new synchronization
constructs, and, finally, adequate optimizations by the back-end C compilers.
- Download this document PPoPP 2005
Developing an Optimized UPC Compiler for Future Architectures
- Authors: El-Ghazawi, T; Cantonnet, F; Yao, Y; Rajamony, R
- Abstract:
UPC, or Unified Parallel C, has been gaining attention as a promising productive
parallel programming language that can lead to shorter time-to-solution. UPC enables
application developers to exploit both parallelism and data locality, while enjoying a
simple and easy to use syntax. This paper examines some of the early compilers that
have been developed for UPC on many different platforms. Using these developments
and previous experiments, the paper abstracts a number of considerations that must be
observed in new UPC compiler developments, such as for the future IBM PERCS
architecture, in order to achieve high performance without affecting the ease-of-use. In
addition to those general considerations, the paper also examines some of the interesting
features of the future architectures that can be exploited for running UPC more
efficiently.
- Download this document IBM 2005
Evaluation of UPC on the Cray X1
- Authors: El-Ghazawi, T; Cantonnet, F; Yao, Y; Vetter, J
- Abstract:
UPC is a parallel programming language which enables programmers to
expose parallelism and data locality in applications with an efficient syntax. Recently,
UPC has been gaining attention from vendors and users as an alternative programming
model for distributed memory applications. Therefore, it is important to understand how
such a potentially powerful language interacts with one of today’s most powerful,
contemporary architectures: the Cray X1. In this paper, we evaluate UPC on the Cray
X1 and examine how the compiler exploits the important features of this architecture
including the use of the vector processors and multi-streaming. Our experimental results
on several benchmarks, such as STREAM, RandomAccess, and selected workloads from
the NAS Parallel Benchmark suite, show that UPC can provide a high-performance,
scalable programming model, and we show users how to leverage the power of X1 for
their applications. However, we have also identified areas where compiler analysis can
be more aggressive and potential performance caveats.
- Download this document CUG 2005
Evaluating Support for Global Address Space Languages on the Cray X1
- Authors: Bell, C; Chen, W; Bonachea, D; Yelick, K
- Abstract:
The Cray X1 was recently introduced as the first in a new line of
parallel systems to combine high-bandwidth vector processing with
an MPP system architecture. Alongside capabilities such as automatic
fine-grained data parallelism through the use of vector instructions,
the X1 offers hardware support for a transparent global-address space
(GAS), which makes it an interesting target for GAS languages. In this
paper, we describe our experience with developing a portable, open-source
and high-performance compiler for Unified Parallel C (UPC),
an SPMD global-address space language extension of ISO C. As part of
our implementation effort, we evaluate the X1’s hardware support for
GAS languages and provide empirical performance characterizations
in the context of leveraging features such as vectorization and global
pointers for the Berkeley UPC compiler. We discuss several difficulties
encountered in the Cray C compiler which are likely to present
challenges for many users, especially implementors of libraries and
source-to-source translators. Finally, we analyze the performance of
our compiler on some benchmark programs and show that, while there
are some limitations of the current compilation approach, the Berkeley
UPC compiler uses the X1 network more effectively than MPI or
SHMEM, and generates serial code whose vectorizability is comparable
to the original C code.
- Download this document ICS 2004
Productivity Analysis of the UPC Language
- Authors: Cantonnet, F; Yao, Y; Zahran, M; El-Ghazawi, T
- Abstract:
Parallel programming paradigms, over the past decade,
have focused on how to harness the computational power
of contemporary parallel machines. Ease of use and code
development productivity have been secondary goals.
Recently, however, there has been a growing interest in
understanding the code development productivity issues
and their implications for the overall time-to-solution.
The performance potential for UPC has been extensively studied
in recent research efforts.
The aim of this study, however, is to examine the impact
of UPC on programmer productivity. This paper proposes several
productivity metrics and considers a wide array of high
performance applications. The results will show that UPC
compares favorably with MPI in programmer productivity.
- Download this document IPDPS 2004 PMEO workshop
The UPC Memory Model: Problems and Prospects
- Authors: Kuchera, W; Wallace, C
- Abstract:
The memory consistency model underlying the Unified Parallel C (UPC) language remains a promising
but underused feature. This paper describes problems in the current language specification; these
findings have inspired an effort in the UPC community to create an alternative memory model definition that
avoids them. The paper also gives experimental results confirming the promise of performance gains afforded
by the memory model’s relaxed constraints on consistency.
- Download this document IPDPS 2004
Performance Analysis of the Berkeley UPC Compiler
- Authors: Chen, W; Bonachea, D; Duell, J; Husbands, P; Iancu, C; Yelick, K
- Abstract:
Unified Parallel C (UPC) is a parallel language that uses a Single Program Multiple Data (SPMD)
model of parallelism within a global address space. The global address space is used to simplify programming, especially
on applications with irregular data structures that lead to fine-grained sharing between threads. Recent results have shown that
the performance of UPC using a commercial compiler is comparable to that of MPI. This paper describes a portable open source compiler
for UPC. The goal is to achieve a similar performance while enabling easy porting of the compiler and runtime, and also provide a
framework that allows for extensive optimizations. Some of the challenges in compiling UPC are identified. A combination of
micro-benchmarks and application kernels shows that this compiler has low overhead for basic operations on shared data and
is competitive with, and sometimes faster than, the commercial HP compiler. This paper also investigates several communication
optimizations, and shows significant benefits from hand-optimizing the generated code.
- Download this document ICS 2003
Performance Monitoring and Evaluation of a UPC Implementation on a NUMA Architecture
- Authors: Cantonnet, F; Yao, Y; Annareddy, S; Mohamed, AS; El-Ghazawi, T
- Abstract:
This paper considers the low-level monitoring and experimental performance evaluation of a new implementation of
the UPC compiler on the SGI Origin family of NUMA architectures. These systems offer many opportunities for the
high-performance implementation of UPC. They also offer, due to their many hardware monitoring counters, the
opportunity for low-level performance measurements to guide compiler implementations. Early UPC compilers have
the challenge of meeting the syntax and semantics requirements of the language. As a result, such compilers tend
to focus on correctness rather than on performance.
In this work, the performance of selected applications and kernels under this new compiler is reported.
The measurements were designed to help shed some light on the next steps that should be taken by UPC compiler
developers to harness the full performance and usability potential of UPC under these architectures.
- Download this document IPDPS 2003
UPC Performance and Potential: A NPB Experimental Study
- Authors: Cantonnet, F; El-Ghazawi, T
- Abstract:
UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC follows a distributed shared memory programming
model aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation
of data locality. UPC incorporates constructs that allow placing data near the threads that manipulate them to
minimize remote accesses.
This paper gives an overview of the concepts and features of UPC and establishes, through extensive performance
measurements of NPB workloads, the viability of the UPC programming language compared to the other popular
paradigms. Further, through performance measurements we identify the challenges, the remaining steps and the
priorities for UPC. It will be shown that with proper hand tuning and optimized collective operations libraries,
UPC performance will be comparable to that of MPI. Furthermore, by incorporating such improvements into
automatic compiler optimizations, UPC will compare quite favorably to message passing in ease of programming.
- Download this document SC 2002
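The effect of placing data near the threads that use it, mentioned in the abstract above, can be illustrated with a small C model. This is only a sketch under stated assumptions, not code from the paper: it applies the UPC rule that element `i` of an array with blocking factor `block` has affinity to thread `(i / block) % threads`, and the function names and the neighbor-sweep workload are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element i of an array declared, in UPC terms,
   as shared [block] T a[n] and run on `threads` threads. */
static size_t affinity_of(size_t i, size_t block, size_t threads) {
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Hypothetical workload: the owner of a[i] reads its neighbor a[i + 1].
   Count the reads whose target element lives on a different thread. */
size_t count_remote_neighbor_reads(size_t n, size_t block, size_t threads) {
    size_t remote = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (affinity_of(i, block, threads) != affinity_of(i + 1, block, threads)) {
            remote++;
        }
    }
    return remote;
}
```

For 12 elements on 3 threads, a blocking factor of 1 makes all 11 neighbor reads remote, while a blocking factor of 4 leaves only the 2 reads that cross a block boundary remote.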
UPC Benchmarking Issues
- Authors: El-Ghazawi, T; Chauvin, S
- Abstract:
UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC is developed
around the distributed shared-memory programming model with constructs that can
allow programmers to exploit memory locality, by placing data close to the
threads that manipulate them in order to minimize remote accesses. Under the
UPC memory sharing model, each thread owns a private memory and has a logical
association with a partition of the shared memory. This paper discusses an early release of UPC_Bench,
a benchmark designed to reveal UPC compiler performance weaknesses in order to uncover
opportunities for compiler optimizations. The experimental results from
UPC_Bench over the Compaq AlphaServer SC will show that UPC_Bench is capable of discovering
such compiler performance problems. Further, it will show that if such
performance pitfalls are avoided through compiler optimizations, distributed
shared memory programming paradigms can result in high performance, while the
ease of programming is enjoyed.
- Download this document ICPP 2001
Introduction to UPC and Language Specification
- Authors: Carlson, W; Draper, J.M; Culler, D.E; Yelick, K; Brooks, E; Warren, K
- Abstract:
UPC is a parallel extension of the C programming language intended for
multiprocessors with a common global address space. A descendant of Split-C,
AC, and PCP, UPC has two primary objectives: 1) to provide efficient access to
the underlying machine, and 2) to establish a common syntax and semantics for
explicitly parallel programming in C. The quest for high performance means in
particular that UPC tries to minimize the overhead involved in communication
among cooperating threads. When the underlying hardware enables a processor to
read and write remote memory without intervention by the remote processor (as
in the SGI/Cray T3D and T3E), UPC provides the programmer with a direct and
easy mapping from the language to low-level machine instructions. At the same
time, UPC’s parallel features can be mapped onto existing message-passing
software or onto physically shared memory to make its programs portable from
one parallel architecture to another. As a consequence, vendors who wish to
implement an explicitly parallel C could use the syntax and semantics of UPC
as a basis for a standard.
- Download this document CCS-TR-99-157