UPC++ 2021.9.0 Release Announcement


Paul H. Hargrove

Sep 30, 2021, 10:08:37 PM
to UPC++, upcxx-a...@lbl.gov

The Pagoda project at Lawrence Berkeley National Laboratory is proud to announce the release of UPC++ 2021.9.0, now available from upcxx.lbl.gov.  This release introduces numerous enhancements and usability improvements, as detailed in the ChangeLog and copied below. 


Please use the issue tracker to report any problems or make feature requests.  Alternatively, if you have private feedback or questions not suited to a public venue, you can email: pag...@lbl.gov. We welcome all feedback.


We would like users of NERSC, ALCF and OLCF systems to be aware that we maintain public installs of UPC++ at all three centers, with usage instructions here. The 2021.9.0 release will be installed on Cori, Theta and Summit next week. 


-Paul H. Hargrove, on behalf of the Pagoda project at LBNL



ChangeLog excerpts for this release:


Improvements to on-node communication:

This release features a number of complementary optimizations that streamline interprocess communication operations satisfied on-node via shared-memory bypass.

  • New as_eager_future(), as_defer_future(), as_eager_promise(), and as_defer_promise() calls for requesting eager or deferred notification of future and promise completions.

  • Existing as_future() and as_promise() calls now default to eager notification for improved performance.

  • New UPCXX_DEFER_COMPLETION macro for controlling whether as_future() and as_promise() request eager or deferred notification (see implementation-defined.md for details).

  • New overloads of fetching atomics that avoid overheads of non-empty futures and promises.

  • Performance improvements to contiguous RMA (rput, rget) using shared-memory bypass.

  • Performance improvements to upcxx::copy(), especially for cases amenable to shared-memory bypass optimizations and/or not involving device memory.

  • Performance improvements to global_ptr localization queries and operations, especially for smp-conduit.

General features/enhancements:

  • upcxx::rpc and upcxx::rpc_ff calls that encounter shared heap exhaustion while allocating internal buffers will now throw an exception instead of crashing. For details, see implementation-defined.md.

  • New team::create factory constructs teams with less communication than team::split when each participant can enumerate the membership of its own new team.

  • The following future operations are now permitted before UPC++ initialization: make_future(), to_future(), when_all(), assignment and copy/move constructors.

  • Added implementation-defined macros UPCXX_ASSERT and UPCXX_ASSERT_ALWAYS.

  • New UPCXX_KIND_CUDA feature macro indicates the presence of CUDA support.

  • Improved error reporting on failure to open a cuda_device.

  • Added debug codemode checking for exceptions thrown out of user callbacks into library code, which is prohibited by the specification.

  • Notable GASNet performance improvements for InfiniBand (ibv) network.

  • The bench/cuda_microbenchmark performance test has been expanded and improved.

  • The "NVIDIA HPC SDK" (or "nvhpc") compiler family is now supported on x86_64 and ppc64le hosts for version 20.9 and newer.

  • Intel oneAPI compilers v2021.1.2+ are now supported on x86_64 hosts.



Download filenames and their md5 checksums:


    upcxx-2021.9.0.tar.gz        8077db274ef231356c23785d0b2ae30a

    upcxx-spec-2021.9.0.pdf      3ef4810f38b402c5e999d44c5d7e3c5a

    upcxx-guide-2021.9.0.pdf     a0812cde2198b8d3219ebc5160a95792

