ABSTRACT
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.
UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).
In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.
DESCRIPTION
The tutorial goals are as follows:
The topic is relevant to anyone who implements irregular applications on distributed-memory systems, including both physics-based algorithms such as adaptive mesh refinement and non-physics-based applications such as metagenomics and graph analytics. Irregular applications are challenging because they rely on fine-grained communication. To support such applications, UPC++ provides both one-sided communication (RMA put, get, and atomics) and Remote Procedure Calls (RPC). UPC++ is a C++ template library, so the programmer has access to C++ productivity features, including the type system, the standard library, and lambdas. Both RMA and RPC have low latency because the UPC++ runtime is implemented on top of GASNet-EX.
The tutorial will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through how to implement basic algorithms in UPC++. We will also look at irregular applications and how to take advantage of UPC++ features to optimize communication performance. The tutorial concludes with a brief application performance study. Advanced topics will be discussed only briefly, as the session requires no advanced knowledge of PGAS programming.
The tutorial assumes no prior knowledge of UPC++. However, participants should be comfortable with C++11 (or newer) programming, including use of templates. We also expect attendees to have some prior knowledge or experience in parallel programming.