Merge Release 1.5.0 to develop

Advertise release 1.5.0 and last changes + Add changelog, + Update third party libraries + A small fix to a CMake file See PR: #1195 The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as: - MPI-based multi-node support for all matrix formats and most solvers; - full DPC++/SYCL support, - functionality and interface for GPU-resident sparse direct solvers, - an interface for wrapping solvers with scaling and reordering applied, - a new algebraic Multigrid solver/preconditioner, - improved mixed-precision support, - support for device matrix assembly, and much more. If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions). Supported systems and requirements: + For all platforms, CMake 3.13+ + C++14 compliant compiler + Linux and macOS + GCC: 5.5+ + clang: 3.9+ + Intel compiler: 2018+ + Apple LLVM: 8.0+ + NVHPC: 22.7+ + Cray Compiler: 14.0.1+ + CUDA module: CUDA 9.2+ or NVHPC 22.7+ + HIP module: ROCm 4.0+ + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`. + Windows + MinGW and Cygwin: GCC 5.5+ + Microsoft Visual Studio: VS 2019 + CUDA module: CUDA 9.2+, Microsoft Visual Studio + OpenMP module: MinGW or Cygwin. Algorithm and important feature additions: + Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148)) + Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142)) + Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082)) + Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059)) + Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986)) + Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926)) + Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117)) + Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965)) + Add support for mixed real/complex BLAS operations ([#864](#864)) + Add a FFT LinOp for all but DPC++/SYCL ([#701](#701)) + Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775)) + Add CSR scaling ([#848](#848)) + Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890)) + Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901)) + Add mixed precision SparsityCsr SpMV support ([#970](#970)) + Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964)) + Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942)) Deprecations and important changes: + Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)). + Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101)) + Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052)) + Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020)) + Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)). + Drop official support for old CUDA < 9.2 ([#887](#887)) Improved performance additions: + Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028)) + Add HIP unsafe atomic option for AMD ([#1091](#1091)) + Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)). + Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809)) Fixes: + Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189)) + Fix issues with hwloc-related tests ([#1074](#1074)) + Fix include headers for GCC 12 ([#1071](#1071)) + Fix for simple-solver-logging example ([#1066](#1066)) + Fix for potential memory leak in Logger ([#1056](#1056)) + Fix logging of mixin classes ([#1037](#1037)) + Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753)) + Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978)) + Fix uninitialized data ([#958](#958)) + Fix CUDA version requirement for cusparseSpSM ([#953](#953)) + Fix several issues within bash-script ([#1016](#1016)) + Fixes for `NVHPC` compiler support ([#1194](#1194)) Other additions: + Simplify and properly name GMRES kernels ([#861](#861)) + Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109)) + Improve gdb pretty printer ([#987](#987), [#1114](#1114)) + Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035)) + Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032)) + Better CSR strategy defaults ([#969](#969)) + Add `move_from` to `PolymorphicObject` ([#997](#997)) + Remove unnecessary device_guard usage ([#956](#956)) + Improvements to the generic accessor for mixed-precision ([#727](#727)) + Add a naive lower triangular solver implementation for CUDA ([#764](#764)) + Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897)) + Add a L1 norm implementation ([#900](#900)) + Add reduce_add for arrays ([#831](#831)) + Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)). + Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123)) + Make IDR random initilization deterministic ([#1116](#1116)) + Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088)) + Update CUDA archCoresPerSM ([#1175](#1116)) + Add kernels for Csr sparsity pattern lookup ([#994](#994)) + Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027)) + Add a binary IO format for matrix data ([#984](#984)) + Add a tuple zip_iterator implementation ([#966](#966)) + Simplify kernel stubs and declarations ([#888](#888)) + Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859)) + Simplify copy to device in tests and examples ([#863](#863)) + More verbose output to array assertions ([#858](#858)) + Allow parallel compilation for Jacobi kernels ([#871](#871)) + Change clang-format pointer alignment to left ([#872](#872)) + Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137)) + Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154)) + Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))
ginkgo-project · Nov 12, 2022 · 7c9f530 · 7c9f530 · hartwiganzt · Nov 13, 2022
2 parents 1d12898 + 3055a94
commit 7c9f530
Show file tree

Hide file tree

Showing 5 changed files with 125 additions and 4 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,127 @@ commits. For a comprehensive list, use the following command:
 git log --first-parent
 ```
 
+
+## Version 1.5.0
+
+The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
+- MPI-based multi-node support for all matrix formats and most solvers;
+- full DPC++/SYCL support,
+- functionality and interface for GPU-resident sparse direct solvers,
+- an interface for wrapping solvers with scaling and reordering applied,
+- a new algebraic Multigrid solver/preconditioner,
+- improved mixed-precision support,
+- support for device matrix assembly,
+
+and much more.
+
+If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).
+
+Supported systems and requirements:
++ For all platforms, CMake 3.13+
++ C++14 compliant compiler
++ Linux and macOS
+  + GCC: 5.5+
+  + clang: 3.9+
+  + Intel compiler: 2018+
+  + Apple LLVM: 8.0+
+  + NVHPC: 22.7+
+  + Cray Compiler: 14.0.1+
+  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
+  + HIP module: ROCm 4.0+
+  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
++ Windows
+  + MinGW and Cygwin: GCC 5.5+
+  + Microsoft Visual Studio: VS 2019
+  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
+  + OpenMP module: MinGW or Cygwin.
+
+
+Algorithm and important feature additions:
++ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](https://github.com/ginkgo-project/ginkgo/pull/676), [#908](https://github.com/ginkgo-project/ginkgo/pull/908), [#909](https://github.com/ginkgo-project/ginkgo/pull/909), [#932](https://github.com/ginkgo-project/ginkgo/pull/932), [#951](https://github.com/ginkgo-project/ginkgo/pull/951), [#961](https://github.com/ginkgo-project/ginkgo/pull/961), [#971](https://github.com/ginkgo-project/ginkgo/pull/971), [#976](https://github.com/ginkgo-project/ginkgo/pull/976), [#985](https://github.com/ginkgo-project/ginkgo/pull/985), [#1007](https://github.com/ginkgo-project/ginkgo/pull/1007), [#1030](https://github.com/ginkgo-project/ginkgo/pull/1030), [#1054](https://github.com/ginkgo-project/ginkgo/pull/1054), [#1100](https://github.com/ginkgo-project/ginkgo/pull/1100), [#1148](https://github.com/ginkgo-project/ginkgo/pull/1148))
++ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](https://github.com/ginkgo-project/ginkgo/pull/896), [#924](https://github.com/ginkgo-project/ginkgo/pull/924), [#928](https://github.com/ginkgo-project/ginkgo/pull/928), [#929](https://github.com/ginkgo-project/ginkgo/pull/929), [#933](https://github.com/ginkgo-project/ginkgo/pull/933), [#943](https://github.com/ginkgo-project/ginkgo/pull/943), [#960](https://github.com/ginkgo-project/ginkgo/pull/960), [#1057](https://github.com/ginkgo-project/ginkgo/pull/1057), [#1110](https://github.com/ginkgo-project/ginkgo/pull/1110),  [#1142](https://github.com/ginkgo-project/ginkgo/pull/1142))
++ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](https://github.com/ginkgo-project/ginkgo/pull/957), [#1058](https://github.com/ginkgo-project/ginkgo/pull/1058), [#1072](https://github.com/ginkgo-project/ginkgo/pull/1072), [#1082](https://github.com/ginkgo-project/ginkgo/pull/1082))
++ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](https://github.com/ginkgo-project/ginkgo/pull/1059))
++ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](https://github.com/ginkgo-project/ginkgo/pull/542), [#913](https://github.com/ginkgo-project/ginkgo/pull/913), [#980](https://github.com/ginkgo-project/ginkgo/pull/980), [#982](https://github.com/ginkgo-project/ginkgo/pull/982),  [#986](https://github.com/ginkgo-project/ginkgo/pull/986))
++ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](https://github.com/ginkgo-project/ginkgo/pull/833), [#910](https://github.com/ginkgo-project/ginkgo/pull/910), [#926](https://github.com/ginkgo-project/ginkgo/pull/926))
++ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](https://github.com/ginkgo-project/ginkgo/pull/904), [#973](https://github.com/ginkgo-project/ginkgo/pull/973), [#1044](https://github.com/ginkgo-project/ginkgo/pull/1044), [#1117](https://github.com/ginkgo-project/ginkgo/pull/1117))
++ Add a device_matrix_data type for device-side matrix assembly ([#886](https://github.com/ginkgo-project/ginkgo/pull/886), [#963](https://github.com/ginkgo-project/ginkgo/pull/963), [#965](https://github.com/ginkgo-project/ginkgo/pull/965))
++ Add support for mixed real/complex BLAS operations ([#864](https://github.com/ginkgo-project/ginkgo/pull/864))
++ Add a FFT LinOp for all but DPC++/SYCL ([#701](https://github.com/ginkgo-project/ginkgo/pull/701))
++ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](https://github.com/ginkgo-project/ginkgo/pull/775))
++ Add CSR scaling ([#848](https://github.com/ginkgo-project/ginkgo/pull/848))
++ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](https://github.com/ginkgo-project/ginkgo/pull/890))
++ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](https://github.com/ginkgo-project/ginkgo/pull/901))
++ Add mixed precision SparsityCsr SpMV support ([#970](https://github.com/ginkgo-project/ginkgo/pull/970))
++ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](https://github.com/ginkgo-project/ginkgo/pull/885), [#964](https://github.com/ginkgo-project/ginkgo/pull/964))
++ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](https://github.com/ginkgo-project/ginkgo/pull/942))
+
+
+Deprecations and important changes:
++ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](https://github.com/ginkgo-project/ginkgo/pull/1149)).
++ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](https://github.com/ginkgo-project/ginkgo/pull/1101))
++ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](https://github.com/ginkgo-project/ginkgo/pull/1031), [#1052](https://github.com/ginkgo-project/ginkgo/pull/1052))
++ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](https://github.com/ginkgo-project/ginkgo/pull/1020))
++ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](https://github.com/ginkgo-project/ginkgo/pull/1184)).
++ Drop official support for old CUDA < 9.2 ([#887](https://github.com/ginkgo-project/ginkgo/pull/887))
+
+
+Improved performance additions:
++ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](https://github.com/ginkgo-project/ginkgo/pull/1013), [#1028](https://github.com/ginkgo-project/ginkgo/pull/1028))
++ Add HIP unsafe atomic option for AMD ([#1091](https://github.com/ginkgo-project/ginkgo/pull/1091))
++ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](https://github.com/ginkgo-project/ginkgo/pull/967)).
++ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](https://github.com/ginkgo-project/ginkgo/pull/809))
+
+
+Fixes:
++ Fix various compilation warnings ([#1076](https://github.com/ginkgo-project/ginkgo/pull/1076), [#1183](https://github.com/ginkgo-project/ginkgo/pull/1183), [#1189](https://github.com/ginkgo-project/ginkgo/pull/1189))
++ Fix issues with hwloc-related tests ([#1074](https://github.com/ginkgo-project/ginkgo/pull/1074))
++ Fix include headers for GCC 12 ([#1071](https://github.com/ginkgo-project/ginkgo/pull/1071))
++ Fix for simple-solver-logging example ([#1066](https://github.com/ginkgo-project/ginkgo/pull/1066))
++ Fix for potential memory leak in Logger ([#1056](https://github.com/ginkgo-project/ginkgo/pull/1056))
++ Fix logging of mixin classes ([#1037](https://github.com/ginkgo-project/ginkgo/pull/1037))
++ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](https://github.com/ginkgo-project/ginkgo/pull/753))
++ Fix some matrix SpMV and conversion corner cases ([#905](https://github.com/ginkgo-project/ginkgo/pull/905), [#978](https://github.com/ginkgo-project/ginkgo/pull/978))
++ Fix uninitialized data ([#958](https://github.com/ginkgo-project/ginkgo/pull/958))
++ Fix CUDA version requirement for cusparseSpSM ([#953](https://github.com/ginkgo-project/ginkgo/pull/953))
++ Fix several issues within bash-script ([#1016](https://github.com/ginkgo-project/ginkgo/pull/1016))
++ Fixes for `NVHPC` compiler support ([#1194](https://github.com/ginkgo-project/ginkgo/pull/1194))
+
+
+Other additions:
++ Simplify and properly name GMRES kernels ([#861](https://github.com/ginkgo-project/ginkgo/pull/861))
++ Improve pkg-config support for non-CMake libraries ([#923](https://github.com/ginkgo-project/ginkgo/pull/923), [#1109](https://github.com/ginkgo-project/ginkgo/pull/1109))
++ Improve gdb pretty printer ([#987](https://github.com/ginkgo-project/ginkgo/pull/987), [#1114](https://github.com/ginkgo-project/ginkgo/pull/1114))
++ Add a logger highlighting inefficient allocation and copy patterns ([#1035](https://github.com/ginkgo-project/ginkgo/pull/1035))
++ Improved and optimized test random matrix generation ([#954](https://github.com/ginkgo-project/ginkgo/pull/954), [#1032](https://github.com/ginkgo-project/ginkgo/pull/1032))
++ Better CSR strategy defaults ([#969](https://github.com/ginkgo-project/ginkgo/pull/969))
++ Add `move_from` to `PolymorphicObject` ([#997](https://github.com/ginkgo-project/ginkgo/pull/997))
++ Remove unnecessary device_guard usage ([#956](https://github.com/ginkgo-project/ginkgo/pull/956))
++ Improvements to the generic accessor for mixed-precision ([#727](https://github.com/ginkgo-project/ginkgo/pull/727))
++ Add a naive lower triangular solver implementation for CUDA ([#764](https://github.com/ginkgo-project/ginkgo/pull/764))
++ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](https://github.com/ginkgo-project/ginkgo/pull/897))
++ Add a L1 norm implementation ([#900](https://github.com/ginkgo-project/ginkgo/pull/900))
++ Add reduce_add for arrays ([#831](https://github.com/ginkgo-project/ginkgo/pull/831))
++ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](https://github.com/ginkgo-project/ginkgo/pull/1136)).
++ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](https://github.com/ginkgo-project/ginkgo/pull/1123))
++ Make IDR random initilization deterministic ([#1116](https://github.com/ginkgo-project/ginkgo/pull/1116))
++ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](https://github.com/ginkgo-project/ginkgo/pull/1088))
++ Update CUDA archCoresPerSM ([#1175](https://github.com/ginkgo-project/ginkgo/pull/1116))
++ Add kernels for Csr sparsity pattern lookup ([#994](https://github.com/ginkgo-project/ginkgo/pull/994))
++ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](https://github.com/ginkgo-project/ginkgo/pull/1027))
++ Add a binary IO format for matrix data ([#984](https://github.com/ginkgo-project/ginkgo/pull/984))
++ Add a tuple zip_iterator implementation ([#966](https://github.com/ginkgo-project/ginkgo/pull/966))
++ Simplify kernel stubs and declarations ([#888](https://github.com/ginkgo-project/ginkgo/pull/888))
++ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](https://github.com/ginkgo-project/ginkgo/pull/859))
++ Simplify copy to device in tests and examples ([#863](https://github.com/ginkgo-project/ginkgo/pull/863))
++ More verbose output to array assertions ([#858](https://github.com/ginkgo-project/ginkgo/pull/858))
++ Allow parallel compilation for Jacobi kernels ([#871](https://github.com/ginkgo-project/ginkgo/pull/871))
++ Change clang-format pointer alignment to left ([#872](https://github.com/ginkgo-project/ginkgo/pull/872))
++ Various improvements and fixes to the benchmarking framework ([#750](https://github.com/ginkgo-project/ginkgo/pull/750), [#759](https://github.com/ginkgo-project/ginkgo/pull/759), [#870](https://github.com/ginkgo-project/ginkgo/pull/870), [#911](https://github.com/ginkgo-project/ginkgo/pull/911), [#1033](https://github.com/ginkgo-project/ginkgo/pull/1033), [#1137](https://github.com/ginkgo-project/ginkgo/pull/1137))
++ Various documentation improvements ([#892](https://github.com/ginkgo-project/ginkgo/pull/892), [#921](https://github.com/ginkgo-project/ginkgo/pull/921), [#950](https://github.com/ginkgo-project/ginkgo/pull/950), [#977](https://github.com/ginkgo-project/ginkgo/pull/977), [#1021](https://github.com/ginkgo-project/ginkgo/pull/1021), [#1068](https://github.com/ginkgo-project/ginkgo/pull/1068), [#1069](https://github.com/ginkgo-project/ginkgo/pull/1069), [#1080](https://github.com/ginkgo-project/ginkgo/pull/1080), [#1081](https://github.com/ginkgo-project/ginkgo/pull/1081), [#1108](https://github.com/ginkgo-project/ginkgo/pull/1108), [#1153](https://github.com/ginkgo-project/ginkgo/pull/1153), [#1154](https://github.com/ginkgo-project/ginkgo/pull/1154))
++ Various CI improvements ([#868](https://github.com/ginkgo-project/ginkgo/pull/868), [#874](https://github.com/ginkgo-project/ginkgo/pull/874), [#884](https://github.com/ginkgo-project/ginkgo/pull/884), [#889](https://github.com/ginkgo-project/ginkgo/pull/889), [#899](https://github.com/ginkgo-project/ginkgo/pull/899), [#903](https://github.com/ginkgo-project/ginkgo/pull/903),  [#922](https://github.com/ginkgo-project/ginkgo/pull/922), [#925](https://github.com/ginkgo-project/ginkgo/pull/925), [#930](https://github.com/ginkgo-project/ginkgo/pull/930), [#936](https://github.com/ginkgo-project/ginkgo/pull/936), [#937](https://github.com/ginkgo-project/ginkgo/pull/937), [#958](https://github.com/ginkgo-project/ginkgo/pull/958), [#882](https://github.com/ginkgo-project/ginkgo/pull/882), [#1011](https://github.com/ginkgo-project/ginkgo/pull/1011), [#1015](https://github.com/ginkgo-project/ginkgo/pull/1015), [#989](https://github.com/ginkgo-project/ginkgo/pull/989), [#1039](https://github.com/ginkgo-project/ginkgo/pull/1039), [#1042](https://github.com/ginkgo-project/ginkgo/pull/1042), [#1067](https://github.com/ginkgo-project/ginkgo/pull/1067), [#1073](https://github.com/ginkgo-project/ginkgo/pull/1073), [#1075](https://github.com/ginkgo-project/ginkgo/pull/1075), [#1083](https://github.com/ginkgo-project/ginkgo/pull/1083), [#1084](https://github.com/ginkgo-project/ginkgo/pull/1084), [#1085](https://github.com/ginkgo-project/ginkgo/pull/1085), [#1139](https://github.com/ginkgo-project/ginkgo/pull/1139), [#1178](https://github.com/ginkgo-project/ginkgo/pull/1178), [#1187](https://github.com/ginkgo-project/ginkgo/pull/1187))
+
+
 ## Version 1.4.0
 
 The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This

diff --git a/cmake/cuda.cmake b/cmake/cuda.cmake
@@ -21,7 +21,7 @@ cas_variable_cuda_architectures(GINKGO_CUDA_ARCH_FLAGS
     ARCHITECTURES ${GINKGO_CUDA_ARCHITECTURES}
     UNSUPPORTED "20" "21")
 
-if (${CMAKE_CXX_COMPILER_ID} MATCHES "PGI|NVHPC")
+if (CMAKE_CXX_COMPILER_ID MATCHES "PGI|NVHPC")
     find_package(NVHPC REQUIRED
         HINTS
         $ENV{NVIDIA_PATH}

diff --git a/third_party/gflags/CMakeLists.txt b/third_party/gflags/CMakeLists.txt
@@ -3,7 +3,7 @@ include(FetchContent)
 FetchContent_Declare(
     gflags
     GIT_REPOSITORY https://github.com/gflags/gflags.git
-    GIT_TAG        827c769e5fc98e0f2a34c47cef953cc6328abced
+    GIT_TAG        a738fdf9338412f83ab3f26f31ac11ed3f3ec4bd
 )
 # need to set the variables in CACHE due to CMP0077
 set(GFLAGS_BUILD_TESTING OFF CACHE INTERNAL "")

diff --git a/third_party/gtest/CMakeLists.txt b/third_party/gtest/CMakeLists.txt
@@ -3,7 +3,7 @@ include(FetchContent)
 FetchContent_Declare(
     googletest
     GIT_REPOSITORY https://github.com/google/googletest.git
-    GIT_TAG        release-1.11.0
+    GIT_TAG        release-1.12.1
 )
 # need to set the variables in CACHE due to CMP0077
 set(gtest_disable_pthreads ON CACHE INTERNAL "")

diff --git a/third_party/rapidjson/CMakeLists.txt b/third_party/rapidjson/CMakeLists.txt
@@ -3,7 +3,7 @@ include(FetchContent)
 FetchContent_Declare(
     rapidjson
     GIT_REPOSITORY https://github.com/Tencent/rapidjson.git
-    GIT_TAG        48fbd8cd202ca54031fe799db2ad44ffa8e77c13
+    GIT_TAG        27c3a8dc0e2c9218fe94986d249a12b5ed838f1d
 )
 FetchContent_GetProperties(rapidjson)
 if(NOT rapidjson_POPULATED)