UPC++ 2022.3.0 Release Announcement

Paul H. Hargrove

Mar 31, 2022, 12:59:33 AM
to UPC++, upcxx-a...@lbl.gov

The Pagoda project at Lawrence Berkeley National Laboratory is proud to announce the release of UPC++ 2022.3.0, now available from upcxx.lbl.gov.  This release introduces numerous enhancements and usability improvements, as detailed in the ChangeLog and copied below.  Notably, this release adds support for the HPE Cray EX platform and systems with Intel Omni-Path networks.


Please use the issue tracker to report any problems or make feature requests.  Alternatively, if you have private feedback or questions not suited to a public venue, you can email: pag...@lbl.gov. We welcome all feedback.


We would like users of NERSC, ALCF and OLCF systems to be aware that we maintain public installs of UPC++ at all three centers, with usage instructions here. The 2022.3.0 release will be installed on the listed systems next week. 


-Paul H. Hargrove, on behalf of the Pagoda project at LBNL



ChangeLog excerpts for this release:


Improvements to GPU memory kinds:

This release features a number of synergistic improvements to the UPC++ memory kinds feature that supports efficient PGAS communication involving GPU memory buffers.

  • NEW: Memory kinds support for AMD GPUs using ROCm/HIP; see INSTALL.md. The new configure --enable-hip flag activates the new upcxx::hip_device class. This includes native offload support for upcxx::copy() using ROCmRDMA on recent InfiniBand network hardware; see the GASNet-EX documentation for details.

  • cuda_device and hip_device are now derived from the new abstract base class gpu_device, and device_allocator<Device> is now derived from the new abstract base class heap_allocator. These base classes enable vendor-agnostic polymorphism when working with memory kinds.

  • New optional interface to GPU memory kinds simplifies startup code, e.g.:
    auto gpu_alloc = make_gpu_allocator(2UL<<20);
    creates a 2MB device segment with a "smart" choice of GPU, and:
    auto gpu_alloc = make_gpu_allocator<hip_device>(2UL<<20, 2);
    creates a device_allocator for a segment on HIP GPU number 2.

  • Several new members have been added to device_allocator for convenience and to support the improvements above. See the specification for details.

  • These improvements are demonstrated in example/gpu_vecadd, a renamed version of the cuda_vecadd kernel example, which now supports either GPU vendor. A condensed usage sketch follows this list.
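
The following is a minimal sketch (not part of the release materials) of the simplified startup path described above. It assumes a build configured with --enable-cuda or --enable-hip, so that calling make_gpu_allocator() with no explicit device type resolves to whichever GPU vendor the install supports; the is_active() and destroy() calls on the allocator are assumed to be among the convenience members mentioned above.

    #include <upcxx/upcxx.hpp>
    #include <cassert>
    #include <vector>

    int main() {
      upcxx::init();

      // Create a 2MB device segment on an automatically chosen GPU.
      // With no explicit Device template argument, make_gpu_allocator()
      // uses the default device type selected at configure time
      // (CUDA or HIP), so the same code runs on either vendor.
      auto gpu_alloc = upcxx::make_gpu_allocator(2UL<<20);
      assert(gpu_alloc.is_active()); // a device segment was actually created

      // Allocate 1024 doubles in the device segment and copy host data in.
      auto dev_ptr = gpu_alloc.allocate<double>(1024);
      std::vector<double> host(1024, 1.0);
      upcxx::copy(host.data(), dev_ptr, 1024).wait();

      gpu_alloc.deallocate(dev_ptr);
      gpu_alloc.destroy();   // release the device segment before finalize
      upcxx::finalize();
    }

Passing an explicit device type and ID, as in make_gpu_allocator<hip_device>(2UL<<20, 2) above, selects a specific GPU instead of the automatic choice.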

General features/enhancements: (see specification and programmer's guide for full details)

  • Experimental support for making RPC calls to functions in executable code segments other than the core UPC++ application, such as those in dynamic libraries. For more information, see docs/ccs-rpc.md.

  • Performance improvements to atomic_domain operations using shared-memory bypass.

  • New query upcxx::local_team_position() provides job topology information.

  • team and atomic_domain<T> are now DefaultConstructible and have a new is_active() query.

  • team, atomic_domain<T>, cuda_device and device_allocator<Device> are now MoveAssignable. A brief sketch of the new default-construction and move-assignment pattern follows this list.
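
As an illustration (not from the release materials), here is a minimal sketch of the DefaultConstructible, is_active(), and MoveAssignable behavior for atomic_domain, assuming the semantics listed above: a default-constructed domain is inactive and can later receive a fully constructed domain by move assignment.

    #include <upcxx/upcxx.hpp>
    #include <cstdint>
    #include <iostream>

    int main() {
      upcxx::init();

      // Default-constructed domain: a valid object, but inactive.
      upcxx::atomic_domain<std::int64_t> ad;
      if (upcxx::rank_me() == 0)
        std::cout << "active? " << ad.is_active() << '\n';   // prints 0

      // Collectively construct an active domain over world() and
      // move-assign it into the previously inactive one.
      ad = upcxx::atomic_domain<std::int64_t>(
             {upcxx::atomic_op::load, upcxx::atomic_op::fetch_add});
      if (upcxx::rank_me() == 0)
        std::cout << "active? " << ad.is_active() << '\n';   // prints 1

      ad.destroy();   // collective teardown, as before
      upcxx::finalize();
    }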

Infrastructure changes:

  • NEW: initial support for the HPE Cray EX platform

    • Complete and correct, but still untuned

    • Supports Slingshot-10 and Slingshot-11 NICs via GASNet-EX's experimental support for the OFI network API (aka "libfabric").

    • Supports PrgEnv-gnu and PrgEnv-cray.

    • See INSTALL.md for instructions to enable the appropriate support for this platform.

  • Memory kinds implementation internals have been factored and restructured, simplifying the addition of new memory kinds in future releases.


Download filenames and their md5 checksums:


    upcxx-2022.3.0.tar.gz        727bb8dec3bb1094a188c272dd653d29

    upcxx-spec-2022.3.0.pdf      ece353daedf79de76c32a2e148b0d302

    upcxx-guide-2022.3.0.pdf     a73738a8f8ad9ebcb899fb8d4bf8da32
