Trilinos Master Merge PR Generator: Auto PR created to promote from master_merge_20240913_175813 branch to master #13451

Closed
33 commits
1b7df0f
Pass value_in to tpl set by universal ref
maartenarnst Aug 29, 2024
45aee72
Add unit test of ParameterList::set move semantics
maartenarnst Aug 29, 2024
ae7c617
Add rval ref version to const ref version of tpl:set rather than repl…
maartenarnst Aug 30, 2024
a134ec3
Revert to universal ref version. Add fallback
maartenarnst Sep 2, 2024
2c05ce0
use createIntegratorAdjointSensitivity
kliegeois Sep 5, 2024
7c81aae
hotfix for DoozyX/clang-format
jmlapre Sep 6, 2024
9d05eb0
Update the redirection of Tempus
kliegeois Sep 6, 2024
49ed6a0
Merge Pull Request #13409 from uliegecsm/Trilinos/tpl_set_universal_ref
trilinos-autotester Sep 7, 2024
44ab14d
Merge Pull Request #13431 from kliegeois/Trilinos/piro_finalObjective
trilinos-autotester Sep 7, 2024
7bcf457
Merge Pull Request #13430 from jmlapre/Trilinos/clang_format_hotfix
trilinos-autotester Sep 7, 2024
e3dadea
Refactor create_mirror for View of MP Vector
maartenarnst Sep 9, 2024
3b416e8
Phalanx: fix for gcc 15
rppawlo Sep 9, 2024
14f8059
Tpetra MatrixMatrix: sort for cuSparse
cwpearson Sep 4, 2024
3316e49
tpetra: check if local matrix is sorted before spgemm
cwpearson Sep 9, 2024
10d9685
Merge Pull Request #13434 from rppawlo/Trilinos/phalanx-fix-gcc-15
trilinos-autotester Sep 9, 2024
2ae26f6
IOSS: Fix shadow variable in region
gdsjaar Sep 9, 2024
dbf8aaa
Merge Pull Request #13438 from gsjaardema/Trilinos/SEACAS-fix-shadow-…
trilinos-autotester Sep 10, 2024
59bf308
Fix for templated set method of Teuchos ParameterList when value is s…
maartenarnst Sep 10, 2024
7d68e7f
Merge pull request #13424 from cwpearson/fix/13339
cwpearson Sep 10, 2024
8e5c2f4
Add config for CXX20 in a GCC container
sebrowne Aug 28, 2024
3fa9597
Merge pull request #13440 from uliegecsm/tpl_set_universal_ref-fix
bartlettroscoe Sep 11, 2024
fb9a1c3
Merge Pull Request #13442 from sebrowne/Trilinos/cxx20-gcc-container-…
trilinos-autotester Sep 11, 2024
1679404
Merge Pull Request #13433 from uliegecsm/Trilinos/stokhos_create_mirror
trilinos-autotester Sep 11, 2024
9456b9e
Panzer: fix basis values for hip unified memory
Sep 11, 2024
c0ab196
Merge Pull Request #13443 from rppawlo/Trilinos/panzer-fix-basis-valu…
trilinos-autotester Sep 11, 2024
7d9bcc6
Snapshot of kokkos.git from commit 5cb2fa30a39a73664b7508d0a514e8f8da…
ndellingwood Sep 12, 2024
d6d9323
Snapshot of kokkos-kernels.git from commit 8193f0b86adda9a19a2f11488a…
ndellingwood Sep 12, 2024
3c7be10
tpetra: update Tpetra_SUPPORTED_KOKKOS_VERSION to 4.4.1
ndellingwood Sep 12, 2024
244ca95
Merge pull request #13446 from ndellingwood/kokkos-promotion-patch-4.…
ndellingwood Sep 13, 2024
50278c0
Take and forward exec in deep copy of View of MP Vector
maartenarnst Sep 12, 2024
d5f0ec2
Trilinos: fix project actions
jhux2 Sep 13, 2024
b5af63c
Merge Pull Request #13449 from uliegecsm/Trilinos/stokhos-deepcopy-exec
trilinos-autotester Sep 13, 2024
0dee7c6
Merge pull request #13450 from trilinos/fix-project-action
jhux2 Sep 13, 2024
2 changes: 1 addition & 1 deletion .github/workflows/clang_format.yml
@@ -12,7 +12,7 @@ jobs:

  steps:
  - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- - uses: DoozyX/clang-format-lint-action@d7f6a5bada32b7ea520b5918416e92997678e3fd # v0.18
+ - uses: DoozyX/clang-format-lint-action@c71d0bf4e21876ebec3e5647491186f8797fde31 # v0.18.2
  with:
  source: './packages/muelu ./packages/tempus ./packages/teko ./packages/xpetra'
  exclude: './packages/tempus/examples'
4 changes: 2 additions & 2 deletions .github/workflows/tpetra_muelu_label_to_project.yml
@@ -44,11 +44,11 @@ jobs:
  uses: srggrs/assign-one-project-github-action@65a8ddab497df42ef268001e67bbf976f8fd39e1 # 1.3.1
  if: contains(github.event.label.name, 'MueLu') || contains(github.event.issue.title, 'MueLu')
  with:
- project: 'https://github.com/trilinos/Trilinos/projects/5'
+ project: 'https://github.com/orgs/trilinos/projects/8'
  column_name: 'Backlog'
  - name: Add to Tpetra Project
  uses: srggrs/assign-one-project-github-action@65a8ddab497df42ef268001e67bbf976f8fd39e1 # 1.3.1
  if: contains(github.event.label.name, 'Tpetra') || contains(github.event.issue.title, 'Tpetra')
  with:
- project: 'https://github.com/trilinos/Trilinos/projects/2'
+ project: 'https://github.com/orgs/trilinos/projects/9'
  column_name: 'Needs Triage'
7 changes: 7 additions & 0 deletions packages/framework/ini-files/config-specs.ini
@@ -2039,6 +2039,13 @@ use PACKAGE-ENABLES|ALL
opt-set-cmake-var Trilinos_ENABLE_TrilinosFrameworkTests BOOL FORCE : OFF
opt-set-cmake-var Trilinos_ENABLE_TrilinosBuildStats BOOL FORCE : OFF

[rhel8_cxx-20-gcc-openmpi_debug_shared_no-kokkos-arch_no-asan_complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_all]
use rhel8_gcc-openmpi_debug_shared_no-kokkos-arch_no-asan_complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
use PACKAGE-ENABLES|ALL
opt-set-cmake-var Trilinos_ENABLE_TrilinosFrameworkTests BOOL FORCE : OFF
opt-set-cmake-var Trilinos_ENABLE_TrilinosBuildStats BOOL FORCE : OFF
opt-set-cmake-var CMAKE_CXX_STANDARD STRING FORCE : 20

[rhel8_gcc-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_all]
use rhel8_gcc-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
use PACKAGE-ENABLES|ALL
2 changes: 2 additions & 0 deletions packages/framework/ini-files/environment-specs.ini
@@ -162,6 +162,8 @@ envvar-set OMPI_CXX: ${TRILINOS_DIR}/packages/kokkos/bin/nvcc_wrapper

[rhel8_gcc-openmpi]

[rhel8_cxx-20-gcc-openmpi]

[rhel8_gcc-serial]

[rhel8_aue-gcc-openmpi]
1 change: 1 addition & 0 deletions packages/framework/ini-files/supported-envs.ini
@@ -187,6 +187,7 @@ sems-gnu-8.5.0-openmpi-4.1.6-serial
cxx-20-sems-gnu-8.5.0-openmpi-4.1.6-serial
sems-gnu-8.5.0-openmpi-4.1.6-openmp
sems-intel-2021.3-sems-openmpi-4.1.6
cxx-20-gcc-openmpi

[ats2]
cuda-11.2.152-gnu-8.3.1-spmpi-rolling
14 changes: 14 additions & 0 deletions packages/kokkos-kernels/CHANGELOG.md
@@ -1,5 +1,19 @@
# Change Log

## [4.4.01](https://github.com/kokkos/kokkos-kernels/tree/4.4.01)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.4.00...4.4.01)

### Build System:
- Restore size_t as default offset, in Tribits builds [\#2313](https://github.com/kokkos/kokkos-kernels/pull/2313)

### Enhancements:
- Improve crs/bsr sorting performance [\#2293](https://github.com/kokkos/kokkos-kernels/pull/2293)

### Bug Fixes:
- SpAdd handle: delete sort_option getter/setter [\#2296](https://github.com/kokkos/kokkos-kernels/pull/2296)
- Improve GH action to produce release artifacts [\#2312](https://github.com/kokkos/kokkos-kernels/pull/2312)
- coo2csr: add parens to function calls [\#2318](https://github.com/kokkos/kokkos-kernels/pull/2318)

## [4.4.00](https://github.com/kokkos/kokkos-kernels/tree/4.4.00)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.3.01...4.4.00)

2 changes: 1 addition & 1 deletion packages/kokkos-kernels/CMakeLists.txt
@@ -11,7 +11,7 @@ SET(KOKKOSKERNELS_TOP_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})

SET(KokkosKernels_VERSION_MAJOR 4)
SET(KokkosKernels_VERSION_MINOR 4)
- SET(KokkosKernels_VERSION_PATCH 0)
+ SET(KokkosKernels_VERSION_PATCH 1)
SET(KokkosKernels_VERSION "${KokkosKernels_VERSION_MAJOR}.${KokkosKernels_VERSION_MINOR}.${KokkosKernels_VERSION_PATCH}")

#Set variables for config file
1 change: 1 addition & 0 deletions packages/kokkos-kernels/master_history.txt
@@ -27,3 +27,4 @@ tag: 4.2.01 date: 01/30/2024 master: f429f6ec release: bcf9854b
tag: 4.3.00 date: 04/03/2024 master: afd65f03 release: ebbf4b78
tag: 4.3.01 date: 05/07/2024 master: 1b0a15f5 release: 58785c1b
tag: 4.4.00 date: 08/08/2024 master: d1a91b8a release: 1145f529
tag: 4.4.01 date: 09/12/2024 master: 0608a337 release: 6b340287
@@ -79,7 +79,7 @@ auto coo2crs(DimType m, DimType n, RowViewType row, ColViewType col, DataViewTyp
// clang-format on
template <typename ScalarType, typename OrdinalType, class DeviceType, class MemoryTraitsType, typename SizeType>
auto coo2crs(KokkosSparse::CooMatrix<ScalarType, OrdinalType, DeviceType, MemoryTraitsType, SizeType> &cooMatrix) {
- return coo2crs(cooMatrix.numRows(), cooMatrix.numCols(), cooMatrix.row, cooMatrix.col, cooMatrix.data);
+ return coo2crs(cooMatrix.numRows(), cooMatrix.numCols(), cooMatrix.row(), cooMatrix.col(), cooMatrix.data());
}
} // namespace KokkosSparse
#endif // _KOKKOSSPARSE_COO2CRS_HPP
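
The one-line change above adds call parentheses to `cooMatrix.row`, `cooMatrix.col`, and `cooMatrix.data`, which suggests these are accessor member functions rather than public data members. A minimal sketch of that pattern follows; `CooLike`, `count_in_row`, and `demo` are hypothetical names for illustration and are not part of Kokkos Kernels.

```cpp
// Hypothetical stand-in for a COO container that exposes its arrays through
// accessor member functions (NOT the real KokkosSparse::CooMatrix).
class CooLike {
 public:
  constexpr CooLike(int nnz, const int* row, const int* col, const double* data)
      : nnz_(nnz), row_(row), col_(col), data_(data) {}
  constexpr int nnz() const { return nnz_; }
  constexpr const int* row() const { return row_; }
  constexpr const int* col() const { return col_; }
  constexpr const double* data() const { return data_; }

 private:
  int nnz_;
  const int* row_;
  const int* col_;
  const double* data_;
};

// Overload working on the unpacked arrays: counts entries in `target_row`.
constexpr int count_in_row(int nnz, const int* row, int target_row) {
  int count = 0;
  for (int i = 0; i < nnz; ++i) {
    if (row[i] == target_row) ++count;
  }
  return count;
}

// Convenience overload taking the container. Writing `m.row` without
// parentheses would name the member function instead of calling it, so the
// forwarding call below would not compile.
constexpr int count_in_row(const CooLike& m, int target_row) {
  return count_in_row(m.nnz(), m.row(), target_row);
}

// Three entries, two of them in row 0.
constexpr int demo() {
  const int rows[] = {0, 0, 1};
  const int cols[] = {0, 2, 1};
  const double vals[] = {1.0, 2.0, 3.0};
  return count_in_row(CooLike(3, rows, cols, vals), 0);
}
static_assert(demo() == 2, "two entries live in row 0");
```

The container overload only forwards, which mirrors the shape of the `coo2crs` wrapper in the hunk: every accessor has to be called before its result can be passed on.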
1 change: 1 addition & 0 deletions packages/kokkos/.jenkins
@@ -461,6 +461,7 @@ pipeline {
-DKokkos_ENABLE_CUDA=ON \
-DKokkos_ENABLE_CUDA_LAMBDA=ON \
-DKokkos_ENABLE_LIBDL=OFF \
-DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ENABLE_IMPL_MDSPAN=OFF \
-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF \
.. && \
15 changes: 15 additions & 0 deletions packages/kokkos/CHANGELOG.md
@@ -1,5 +1,20 @@
# CHANGELOG

## [4.4.01](https://github.com/kokkos/kokkos/tree/4.4.01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.0.00...4.4.01)

### Features:
* Introduce new SequentialHostInit view allocation property [\#7229](https://github.com/kokkos/kokkos/pull/7229)

### Backend and Architecture Enhancements:

#### CUDA:
* Experimental support for unified memory mode (intended for Grace-Hopper etc.) [\#6823](https://github.com/kokkos/kokkos/pull/6823)

### Bug Fixes
* OpenMP: Fix issue related to the visibility of an internal symbol with shared libraries that affected `ScatterView` in particular [\#7284](https://github.com/kokkos/kokkos/pull/7284)
* Fix implicit copy assignment operators in few AVX2 masks being deleted [#7296](https://github.com/kokkos/kokkos/pull/7296)

## [4.4.00](https://github.com/kokkos/kokkos/tree/4.4.00)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.3.01...4.4.00)

2 changes: 1 addition & 1 deletion packages/kokkos/CMakeLists.txt
@@ -151,7 +151,7 @@ ENDIF()

set(Kokkos_VERSION_MAJOR 4)
set(Kokkos_VERSION_MINOR 4)
- set(Kokkos_VERSION_PATCH 0)
+ set(Kokkos_VERSION_PATCH 1)
set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
message(STATUS "Kokkos version: ${Kokkos_VERSION}")
math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
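
The `math(EXPR ...)` line above packs the three version components into a single integer, so moving the patch level from 0 to 1 changes the encoded value from 40400 to 40401. A minimal sketch of the same arithmetic follows; `encode_version` and `at_least` are hypothetical helper names chosen for illustration, not Kokkos API.

```cpp
// Hypothetical helper mirroring the expression
// major * 10000 + minor * 100 + patch from the CMake line above.
constexpr int encode_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Hypothetical guard: true when an encoded version is at least major.minor.patch.
constexpr bool at_least(int encoded, int major, int minor, int patch) {
  return encoded >= encode_version(major, minor, patch);
}

static_assert(encode_version(4, 4, 0) == 40400, "4.4.0 encodes to 40400");
static_assert(encode_version(4, 4, 1) == 40401, "4.4.1 encodes to 40401");
static_assert(at_least(encode_version(4, 4, 1), 4, 4, 0), "4.4.1 >= 4.4.0");
static_assert(!at_least(encode_version(4, 4, 0), 4, 4, 1), "4.4.0 < 4.4.1");
```

Because each of minor and patch gets two decimal digits, the encoding stays ordered as long as both remain below 100.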
2 changes: 1 addition & 1 deletion packages/kokkos/Makefile.kokkos
@@ -2,7 +2,7 @@

KOKKOS_VERSION_MAJOR = 4
KOKKOS_VERSION_MINOR = 4
- KOKKOS_VERSION_PATCH = 0
+ KOKKOS_VERSION_PATCH = 1
KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)

# Options: Cuda,HIP,SYCL,OpenMPTarget,OpenMP,Threads,Serial
1 change: 1 addition & 0 deletions packages/kokkos/cmake/KokkosCore_config.h.in
@@ -37,6 +37,7 @@
#cmakedefine KOKKOS_ENABLE_CUDA_LAMBDA // deprecated
#cmakedefine KOKKOS_ENABLE_CUDA_CONSTEXPR
#cmakedefine KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC
#cmakedefine KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY
#cmakedefine KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE
#cmakedefine KOKKOS_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS
#cmakedefine KOKKOS_ENABLE_IMPL_HIP_UNIFIED_MEMORY
4 changes: 3 additions & 1 deletion packages/kokkos/cmake/kokkos_enable_options.cmake
@@ -48,6 +48,8 @@ KOKKOS_ENABLE_OPTION(CUDA_LAMBDA ${CUDA_LAMBDA_DEFAULT} "Whether to allow lambda
# resolved but we keep the option around a bit longer to be safe.
KOKKOS_ENABLE_OPTION(IMPL_CUDA_MALLOC_ASYNC ON "Whether to enable CudaMallocAsync (requires CUDA Toolkit 11.2)")
KOKKOS_ENABLE_OPTION(IMPL_NVHPC_AS_DEVICE_COMPILER OFF "Whether to allow nvc++ as Cuda device compiler")
+ KOKKOS_ENABLE_OPTION(IMPL_CUDA_UNIFIED_MEMORY OFF "Whether to leverage unified memory architectures for CUDA")

KOKKOS_ENABLE_OPTION(DEPRECATED_CODE_4 ON "Whether code deprecated in major release 4 is available" )
KOKKOS_ENABLE_OPTION(DEPRECATION_WARNINGS ON "Whether to emit deprecation warnings" )
KOKKOS_ENABLE_OPTION(HIP_RELOCATABLE_DEVICE_CODE OFF "Whether to enable relocatable device code (RDC) for HIP")
@@ -135,7 +137,7 @@ FUNCTION(check_device_specific_options)
ENDIF()
ENDFUNCTION()

- CHECK_DEVICE_SPECIFIC_OPTIONS(DEVICE CUDA OPTIONS CUDA_UVM CUDA_RELOCATABLE_DEVICE_CODE CUDA_LAMBDA CUDA_CONSTEXPR CUDA_LDG_INTRINSIC)
+ CHECK_DEVICE_SPECIFIC_OPTIONS(DEVICE CUDA OPTIONS CUDA_UVM CUDA_RELOCATABLE_DEVICE_CODE CUDA_LAMBDA CUDA_CONSTEXPR CUDA_LDG_INTRINSIC IMPL_CUDA_UNIFIED_MEMORY)
CHECK_DEVICE_SPECIFIC_OPTIONS(DEVICE HIP OPTIONS HIP_RELOCATABLE_DEVICE_CODE)
CHECK_DEVICE_SPECIFIC_OPTIONS(DEVICE HPX OPTIONS IMPL_HPX_ASYNC_DISPATCH)

@@ -37,6 +37,17 @@
#endif
///@}

/// Some tests are skipped for unified memory space
#if defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
#define GTEST_SKIP_IF_UNIFIED_MEMORY_SPACE \
if constexpr (std::is_same_v<typename TEST_EXECSPACE::memory_space, \
Kokkos::CudaSpace>) \
GTEST_SKIP() << "skipping since unified memory requires additional " \
"fences";
#else
#define GTEST_SKIP_IF_UNIFIED_MEMORY_SPACE
#endif
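
The macro above combines a build-time switch with a compile-time type check: it expands to nothing unless the unified-memory option is on, and even then it skips only when the test's memory space is `Kokkos::CudaSpace`. A minimal sketch of the same shape without GoogleTest follows, assuming the skip amounts to an early return from the test body; `HYPOTHETICAL_UNIFIED_MEMORY`, `SKIP_IF_UNIFIED`, the two space structs, and `test_body_runs` are all illustrative names.

```cpp
#include <type_traits>

// Hypothetical memory-space tags standing in for device and host spaces.
struct HypotheticalDeviceSpace {};
struct HypotheticalHostSpace {};

// With the (hypothetical) feature macro defined, the skip macro returns early
// for the device space only; otherwise it expands to nothing.
#if defined(HYPOTHETICAL_UNIFIED_MEMORY)
#define SKIP_IF_UNIFIED(Space)                                   \
  if constexpr (std::is_same_v<Space, HypotheticalDeviceSpace>) \
    return false;
#else
#define SKIP_IF_UNIFIED(Space)
#endif

// Returns true when the body after the skip macro is reached.
template <typename Space>
constexpr bool test_body_runs() {
  SKIP_IF_UNIFIED(Space)
  return true;
}

// The host space is never skipped, whichever way the macro is configured.
static_assert(test_body_runs<HypotheticalHostSpace>(),
              "host-space tests always run");
```

Using `if constexpr` keeps the check free at run time, and the empty fallback definition lets call sites use the macro unconditionally.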

TEST(TEST_CATEGORY, resize_realloc_no_init_dualview) {
using namespace Kokkos::Test::Tools;
listen_tool_events(Config::DisableAll(), Config::EnableKernels());
@@ -657,6 +668,7 @@ TEST(TEST_CATEGORY, create_mirror_no_init_dynamicview) {

TEST(TEST_CATEGORY, create_mirror_view_and_copy_dynamicview) {
GTEST_SKIP_IF_CUDAUVM_MEMORY_SPACE
GTEST_SKIP_IF_UNIFIED_MEMORY_SPACE

using namespace Kokkos::Test::Tools;
listen_tool_events(Config::DisableAll(), Config::EnableKernels(),
39 changes: 36 additions & 3 deletions packages/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
@@ -31,7 +31,6 @@
#include <algorithm>
#include <atomic>

//#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp>
#include <impl/Kokkos_Error.hpp>

#include <impl/Kokkos_Tools.hpp>
@@ -178,6 +177,29 @@ void *impl_allocate_common(const int device_id,
cudaError_t error_code = cudaSuccess;
#ifndef CUDART_VERSION
#error CUDART_VERSION undefined!
#elif defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
// This is intended for Grace-Hopper (and future unified memory architectures)
// The idea is to use host allocator and then advise to keep it in HBM on the
// device, but that requires CUDA 12.2
static_assert(CUDART_VERSION >= 12020,
"CUDA runtime version >=12.2 required when "
"Kokkos_ENABLE_IMPL_CUDA_UNIFIED_MEMORY is set. "
"Please update your CUDA runtime version or "
"reconfigure with "
"-D Kokkos_ENABLE_IMPL_CUDA_UNIFIED_MEMORY=OFF");
if (arg_alloc_size) { // cudaMemAdvise_v2 does not work with nullptr
error_code = cudaMallocManaged(&ptr, arg_alloc_size, cudaMemAttachGlobal);
if (error_code == cudaSuccess) {
// One would think cudaMemLocation{device_id,
// cudaMemLocationTypeDevice} would work but it doesn't. I.e. the order of
// members doesn't seem to be defined.
cudaMemLocation loc;
loc.id = device_id;
loc.type = cudaMemLocationTypeDevice;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaMemAdvise_v2(
ptr, arg_alloc_size, cudaMemAdviseSetPreferredLocation, loc));
}
}
#elif (defined(KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC) && CUDART_VERSION >= 11020)
if (arg_alloc_size >= memory_threshold_g) {
error_code = cudaMallocAsync(&ptr, arg_alloc_size, stream);
@@ -190,9 +212,13 @@ void *impl_allocate_common(const int device_id,
"Kokkos::Cuda: backend fence after async malloc");
}
}
- } else
+ } else {
+ error_code = cudaMalloc(&ptr, arg_alloc_size);
+ }
+ #else
+ error_code = cudaMalloc(&ptr, arg_alloc_size);
  #endif
- { error_code = cudaMalloc(&ptr, arg_alloc_size); }

if (error_code != cudaSuccess) { // TODO tag as unlikely branch
// This is the only way to clear the last error, which
// we should do here since we're turning it into an
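
In the hunk above, the `else` whose body used to follow `#endif` appears to be replaced by branches that are complete inside each preprocessor arm, which is what leaves room for the new unified-memory `#elif`. A minimal sketch of that layout follows; `HYPOTHETICAL_ASYNC_ALLOC`, `Path`, and `choose_path` are illustrative names, not Kokkos identifiers.

```cpp
// Which allocation route a request would take in this sketch.
enum class Path { Async, Plain };

// Every preprocessor arm holds a complete statement, so adding or removing an
// arm cannot leave a dangling `else` behind.
constexpr Path choose_path(unsigned long size, unsigned long threshold) {
#if defined(HYPOTHETICAL_ASYNC_ALLOC)
  if (size >= threshold) {
    return Path::Async;
  } else {
    return Path::Plain;
  }
#else
  (void)size;       // unused when the async route is compiled out
  (void)threshold;  // unused when the async route is compiled out
  return Path::Plain;
#endif
}

// Small requests take the plain route in either configuration.
static_assert(choose_path(1024, 40000) == Path::Plain,
              "below the threshold the plain allocator is used");
```

The cost is one duplicated fallback statement, in exchange for arms that can be read and edited independently.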
@@ -326,6 +352,9 @@ void CudaSpace::impl_deallocate(
}
#ifndef CUDART_VERSION
#error CUDART_VERSION undefined!
#elif defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
#elif (defined(KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC) && CUDART_VERSION >= 11020)
if (arg_alloc_size >= memory_threshold_g) {
Impl::cuda_device_synchronize(
@@ -436,8 +465,12 @@ void cuda_prefetch_pointer(const Cuda &space, const void *ptr, size_t bytes,

#include <impl/Kokkos_SharedAlloc_timpl.hpp>

#if !defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
KOKKOS_IMPL_HOST_INACCESSIBLE_SHARED_ALLOCATION_RECORD_EXPLICIT_INSTANTIATION(
Kokkos::CudaSpace);
#else
KOKKOS_IMPL_SHARED_ALLOCATION_RECORD_EXPLICIT_INSTANTIATION(Kokkos::CudaSpace);
#endif
KOKKOS_IMPL_SHARED_ALLOCATION_RECORD_EXPLICIT_INSTANTIATION(
Kokkos::CudaUVMSpace);
KOKKOS_IMPL_SHARED_ALLOCATION_RECORD_EXPLICIT_INSTANTIATION(
23 changes: 22 additions & 1 deletion packages/kokkos/core/src/Cuda/Kokkos_CudaSpace.hpp
@@ -88,6 +88,19 @@ class CudaSpace {
void* allocate(const char* arg_label, const size_t arg_alloc_size,
const size_t arg_logical_size = 0) const;

#if defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
template <typename ExecutionSpace>
void* allocate(const ExecutionSpace&, const size_t arg_alloc_size) const {
return allocate(arg_alloc_size);
}
template <typename ExecutionSpace>
void* allocate(const ExecutionSpace&, const char* arg_label,
const size_t arg_alloc_size,
const size_t arg_logical_size = 0) const {
return allocate(arg_label, arg_alloc_size, arg_logical_size);
}
#endif

/**\brief Deallocate untracked memory in the cuda space */
void deallocate(void* const arg_alloc_ptr, const size_t arg_alloc_size) const;
void deallocate(const char* arg_label, void* const arg_alloc_ptr,
@@ -337,7 +350,11 @@ static_assert(
template <>
struct MemorySpaceAccess<Kokkos::HostSpace, Kokkos::CudaSpace> {
enum : bool { assignable = false };
- enum : bool { accessible = false };
+ #if !defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
+ enum : bool{accessible = false};
+ #else
+ enum : bool { accessible = true };
+ #endif
enum : bool { deepcopy = true };
};
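
The specialization above flips `accessible` for the host-to-`CudaSpace` pair when the unified-memory option is enabled, while `deepcopy` stays `true` in both configurations. A minimal sketch of how such a trait can drive compile-time decisions follows; `Access`, `needs_staging_copy`, the two space tags, and `HYPOTHETICAL_UNIFIED_MEMORY` are illustrative names, not the Kokkos definitions.

```cpp
#include <type_traits>

// Hypothetical memory-space tags.
struct HypotheticalHost {};
struct HypotheticalDevice {};

// Primary template: a space can always reach its own memory, nothing else.
template <typename From, typename To>
struct Access {
  static constexpr bool accessible = std::is_same_v<From, To>;
};

// Host reaching device memory: the answer depends on a build-time switch,
// while deep copies remain possible either way.
template <>
struct Access<HypotheticalHost, HypotheticalDevice> {
#if !defined(HYPOTHETICAL_UNIFIED_MEMORY)
  static constexpr bool accessible = false;
#else
  static constexpr bool accessible = true;
#endif
  static constexpr bool deepcopy = true;
};

// Code that must touch `To` memory from `From` needs a staging copy exactly
// when direct access is not available.
template <typename From, typename To>
constexpr bool needs_staging_copy() {
  return !Access<From, To>::accessible;
}

static_assert(!needs_staging_copy<HypotheticalHost, HypotheticalHost>(),
              "same-space access never needs a copy");
static_assert(Access<HypotheticalHost, HypotheticalDevice>::deepcopy,
              "deep copy is allowed in both configurations");
```

Callers that branch on the trait pick up the new behavior at compile time without any change at the call site.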

@@ -558,8 +575,12 @@ struct DeepCopy<HostSpace, MemSpace, ExecutionSpace,
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------

#if !defined(KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY)
KOKKOS_IMPL_HOST_INACCESSIBLE_SHARED_ALLOCATION_SPECIALIZATION(
Kokkos::CudaSpace);
#else
KOKKOS_IMPL_SHARED_ALLOCATION_SPECIALIZATION(Kokkos::CudaSpace);
#endif
KOKKOS_IMPL_SHARED_ALLOCATION_SPECIALIZATION(Kokkos::CudaUVMSpace);
KOKKOS_IMPL_SHARED_ALLOCATION_SPECIALIZATION(Kokkos::CudaHostPinnedSpace);

20 changes: 20 additions & 0 deletions packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
@@ -607,6 +607,22 @@ Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default

//----------------------------------

#ifdef KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY
// Check if unified memory is available
int cuda_result;
cudaDeviceGetAttribute(&cuda_result, cudaDevAttrConcurrentManagedAccess,
cuda_device_id);
if (cuda_result == 0) {
Kokkos::abort(
"Kokkos::Cuda::initialize ERROR: Unified memory is not available on "
"this device\n"
"Please recompile Kokkos with "
"-DKokkos_ENABLE_IMPL_CUDA_UNIFIED_MEMORY=OFF\n");
}
#endif

//----------------------------------

cudaStream_t singleton_stream;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(cuda_device_id));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&singleton_stream));
@@ -705,6 +721,10 @@ void Cuda::print_configuration(std::ostream &os, bool /*verbose*/) const {
#else
os << "no\n";
#endif
#ifdef KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY
os << " KOKKOS_ENABLE_IMPL_CUDA_UNIFIED_MEMORY: ";
os << "yes\n";
#endif

os << "\nCuda Runtime Configuration:\n";

2 changes: 2 additions & 0 deletions packages/kokkos/core/src/Kokkos_View.hpp
@@ -571,6 +571,8 @@ inline constexpr Kokkos::ALL_t ALL{};
#pragma omp end declare target
#endif

inline constexpr Kokkos::Impl::SequentialHostInit_t SequentialHostInit{};

inline constexpr Kokkos::Impl::WithoutInitializing_t WithoutInitializing{};

inline constexpr Kokkos::Impl::AllowPadding_t AllowPadding{};
2 changes: 1 addition & 1 deletion packages/kokkos/core/src/OpenMP/Kokkos_OpenMP.cpp
@@ -113,7 +113,7 @@ int OpenMP::impl_thread_pool_size() const noexcept {
}

int OpenMP::impl_max_hardware_threads() noexcept {
- return Impl::g_openmp_hardware_max_threads;
+ return Impl::OpenMPInternal::max_hardware_threads();
}

namespace Impl {