Add cudilu #5002

Merged

atgeirr merged 1 commit into OPM:master from multitalentloes:add_cudilu

Jan 25, 2024

multitalentloes commented Nov 17, 2023

Draft version of a GPU DILU preconditioner.
This DILU implementation uses graph coloring to parallelize the apply and update steps of the preconditioner. It supports block sizes up to 3 and also provides a float version for benchmarking.
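One common way to get this kind of parallelism is to color the rows of the sparsity pattern so that no two rows with the same color are coupled, and then to process the colors (level sets) one after another. Below is a minimal sequential sketch of that idea, assuming a structurally symmetric pattern and a simple greedy coloring. `colorRowsGreedily` is a hypothetical name, and this is not necessarily the coloring algorithm used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: greedy graph coloring of a structurally symmetric
// sparsity pattern. adjacency[i] lists the column indices of the nonzeros in row i.
// Rows that receive the same color share no off-diagonal nonzero, so in a
// color-by-color ordering they can be processed concurrently.
std::vector<std::vector<std::size_t>>
colorRowsGreedily(const std::vector<std::vector<std::size_t>>& adjacency)
{
    const std::size_t n = adjacency.size();
    const std::size_t uncolored = n; // sentinel: no valid color is >= n
    std::vector<std::size_t> color(n, uncolored);
    std::size_t numColors = 0;

    for (std::size_t row = 0; row < n; ++row) {
        // Mark the colors already taken by neighbors of this row.
        std::vector<bool> used(numColors + 1, false);
        for (const std::size_t col : adjacency[row]) {
            if (col != row && color[col] != uncolored) {
                used[color[col]] = true;
            }
        }
        // Pick the smallest color not used by any neighbor.
        std::size_t c = 0;
        while (used[c]) {
            ++c;
        }
        color[row] = c;
        if (c == numColors) {
            ++numColors;
        }
    }

    // Group rows by color: each group is one level set.
    std::vector<std::vector<std::size_t>> levelSets(numColors);
    for (std::size_t row = 0; row < n; ++row) {
        levelSets[color[row]].push_back(row);
    }
    return levelSets;
}
```

For a 1D tridiagonal pattern with four rows, this yields two level sets, {0, 2} and {1, 3}, and each set can be handled by one parallel kernel launch.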

multitalentloes force-pushed the add_cudilu branch from b3c2288 to 99e0146

November 22, 2023 13:31

multitalentloes force-pushed the add_cudilu branch 2 times, most recently from 5e8c8d8 to 4e9f09e

December 20, 2023 10:09

kjetilly reviewed

Contributor

kjetilly left a comment

Generally well done; there are just some small cosmetic fixes.

In general, I'd recommend splitting the cusparse_matrix_operations file so that the DILU-specific code lives in its own file.

opm/simulators/linalg/PreconditionerFactory_impl.hpp Outdated

                           using CUJac = typename Opm::cuistl::CuJac<M, Opm::cuistl::CuVector<field_type>, Opm::cuistl::CuVector<field_type>>;
                           return std::make_shared<Opm::cuistl::PreconditionerAdapter<V, V, CUJac>>(std::make_shared<CUJac>(op.getmat(), w));
                       });
+                      F::addCreator("CUDILU", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
+                          DUNE_UNUSED_PARAMETER(prm);

Contributor

kjetilly Nov 23, 2023

Can you use [[maybe_unused]] here instead? DUNE_UNUSED_PARAMETER is not really needed anymore.
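A minimal sketch of the suggested change, assuming C++17: the standard attribute is written `[[maybe_unused]]` and can be placed directly on the lambda parameter, so no macro call is needed in the body. The `Creator` alias and `makeDoubled` lambda are hypothetical stand-ins for the factory's creator signature, not OPM code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a factory creator: the second parameter plays the
// role of `prm` and is deliberately ignored.
using Creator = std::function<int(int, const std::string&)>;

// The C++17 attribute suppresses "unused parameter" warnings without a macro
// such as DUNE_UNUSED_PARAMETER.
const Creator makeDoubled = [](int op, [[maybe_unused]] const std::string& prm) {
    return 2 * op;
};
```

Leaving the parameter unnamed (for example `const P& /*prm*/`) is another common way to silence the same warning.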

opm/simulators/linalg/PreconditionerFactory_impl.hpp Outdated

+                      });
+                      F::addCreator("CUDILUFloat", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
+                          DUNE_UNUSED_PARAMETER(prm);

Contributor

kjetilly Nov 23, 2023

Can you use [[maybe_unused]] here instead? DUNE_UNUSED_PARAMETER is not really needed anymore.

opm/simulators/linalg/cuistl/CuDILU.cpp Outdated

+                  int globCnt = 0;
+                  for (int i = 0; i < levelSets.size(); i++) {
+                      for (size_t j = 0; j < levelSets[i].size(); j++) {
+                          res[globCnt++] = (int)levelSets[i][j];

Contributor

kjetilly Nov 23, 2023

My understanding is that this function is probably not runtime critical? If so, I'd add a sanity check here for globCnt < res.size(), something like

OPM_ERROR_IF(globCnt >= res.size(), fmt::format("Internal error. globCnt = {}, res.size() = {}", globCnt, res.size()));

Contributor

kjetilly Jan 3, 2024

Also, this should probably use a C++-style cast (static_cast<int>(...)).

Author

multitalentloes Jan 16, 2024

What would make globCnt be larger than the vector when we have already allocated space for every item in the sparse table?

Contributor

kjetilly Jan 17, 2024

This was more to guard against incorrect use from outside this function, i.e., to enforce the invariant you have established. It is not strictly needed, but if someone changes this class further down the line, they might not see the connection to this function.
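To make the suggested invariant check concrete, here is a minimal standalone sketch of the reordered-to-natural mapping with the bounds check and a `static_cast`. It assumes `std::vector<std::vector<std::size_t>>` as a stand-in for `Opm::SparseTable<size_t>` and throws `std::runtime_error` in place of `OPM_ERROR_IF`; it is an illustration, not the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: levelSets[i] lists the natural row indices belonging to level i
// (std::vector<std::vector<std::size_t>> stands in for Opm::SparseTable<size_t>).
std::vector<int>
createReorderedToNatural(const std::vector<std::vector<std::size_t>>& levelSets)
{
    std::size_t totalRows = 0;
    for (const auto& level : levelSets) {
        totalRows += level.size();
    }
    std::vector<int> res(totalRows);

    std::size_t globCnt = 0;
    for (const auto& level : levelSets) {
        for (const std::size_t naturalRow : level) {
            // Enforce the invariant explicitly instead of relying on how res was sized,
            // so a later change to the sizing logic cannot silently write out of bounds.
            if (globCnt >= res.size()) {
                throw std::runtime_error("Internal error. globCnt = " + std::to_string(globCnt)
                                         + ", res.size() = " + std::to_string(res.size()));
            }
            res[globCnt++] = static_cast<int>(naturalRow);
        }
    }
    return res;
}
```

With level sets {0, 2} and {1, 3}, the reordered position k holds natural row res[k], giving {0, 2, 1, 3}.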

opm/simulators/linalg/cuistl/CuDILU.cpp Outdated

+                  int globCnt = 0;
+                  for (int i = 0; i < levelSets.size(); i++) {
+                      for (size_t j = 0; j < levelSets[i].size(); j++) {
+                          res[levelSets[i][j]] = globCnt++;

Contributor

kjetilly Nov 23, 2023

Again, I'd add a sanity check here if the function is not runtime critical (I assume it is not, since a vector is being allocated here?).
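In the natural-to-reordered direction the write index comes from the level-set data itself, so the check is arguably even more useful there. A minimal sketch under the same assumptions as above (a nested `std::vector` standing in for `Opm::SparseTable<size_t>`, exceptions in place of `OPM_ERROR_IF`); the duplicate-row check is an extra illustration, not something requested in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: levelSets[i] lists the natural row indices belonging to level i.
std::vector<int>
createNaturalToReordered(const std::vector<std::vector<std::size_t>>& levelSets)
{
    std::size_t totalRows = 0;
    for (const auto& level : levelSets) {
        totalRows += level.size();
    }
    // -1 marks natural rows that have not been assigned a reordered index yet.
    std::vector<int> res(totalRows, -1);

    int globCnt = 0;
    for (const auto& level : levelSets) {
        for (const std::size_t naturalRow : level) {
            // The index comes from the input, so reject out-of-range or repeated rows.
            if (naturalRow >= res.size() || res[naturalRow] != -1) {
                throw std::runtime_error("Internal error. Invalid or duplicate natural row "
                                         + std::to_string(naturalRow));
            }
            res[naturalRow] = globCnt++;
        }
    }
    return res;
}
```

For level sets {1, 3} and {0, 2}, natural rows 0, 1, 2, 3 map to reordered positions 2, 0, 3, 1.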

opm/simulators/linalg/cuistl/CuDILU.cpp Outdated

+              std::vector<int>
+              createReorderedToNatural(Opm::SparseTable<size_t> levelSets)
+              {
+                  auto res = std::vector<int>(levelSets.dataSize());

Contributor

kjetilly Jan 3, 2024

dataSize() returns int, but std::vector expects a size_t, so this should use an explicit conversion to avoid warnings (use the to_size_t functions in detail).
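The idea behind a checked conversion can be sketched as follows. `toSizeT` and `allocateForDataSize` are hypothetical names used only for illustration; they are not the OPM `to_size_t` helpers the reviewer refers to, whose exact signatures are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical checked int -> std::size_t conversion (not the OPM helper):
// rejects negative values instead of silently wrapping around.
std::size_t toSizeT(int value)
{
    if (value < 0) {
        throw std::invalid_argument("Cannot convert negative value " + std::to_string(value)
                                    + " to std::size_t");
    }
    return static_cast<std::size_t>(value);
}

// Allocating from an int-valued size (such as a dataSize() result) without an
// implicit sign conversion.
std::vector<int> allocateForDataSize(int dataSize)
{
    return std::vector<int>(toSizeT(dataSize));
}
```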

opm/simulators/linalg/cuistl/detail/cusparse_matrix_operations.cu Outdated

+                  // PLS:   c += A*b
+                  // MINUS: c -= A*b
+                  template <class T, int blocksize, MVType OP>
+                  __device__ __forceinline__ void mv(T* A, T* b, T* c)

Contributor

kjetilly Jan 16, 2024

I'm a bit dubious about this, but in principle it is OK. However, could you rename it to something slightly more meaningful (say, matrixVectorProductWithAction) and then add wrapper functions that just call this function with the correct template arguments, e.g.:

template<class T, int blocksize>
__device__ __forceinline__ void umv(T* A, T* b, T* c) {
    matrixVectorProductWithAction<T, blocksize, MVType::PLUS>(A, b, c);
}

Here, I would use either the Dune naming convention, i.e.

mv ==> c = Ab
umv ==> c += Ab
mmv ==> c -= Ab

or something more verbose (e.g., matrixVectorMultiply, addMatrixVector, subtractMatrixVector).
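A host-side sketch of the suggested structure: one generic block kernel parameterized by the action, plus thin Dune-style wrappers. The PR's function is a `__device__` function; this version is plain C++ so it can be compiled and tested on the CPU. `MVAction` and the row-major block layout are assumptions for illustration.

```cpp
#include <cassert>

// Assumed action enum (the PR uses its own MVType).
enum class MVAction { Assign, Add, Subtract };

// c (op)= A * b for one dense blocksize x blocksize block stored row-major.
// Assumes c does not alias b.
template <class T, int blocksize, MVAction action>
inline void matrixVectorProductWithAction(const T* A, const T* b, T* c)
{
    for (int row = 0; row < blocksize; ++row) {
        T sum = T(0);
        for (int col = 0; col < blocksize; ++col) {
            sum += A[row * blocksize + col] * b[col];
        }
        if constexpr (action == MVAction::Assign) {
            c[row] = sum;
        } else if constexpr (action == MVAction::Add) {
            c[row] += sum;
        } else {
            c[row] -= sum;
        }
    }
}

// Dune-style wrappers: mv (c = Ab), umv (c += Ab), mmv (c -= Ab).
template <class T, int blocksize>
inline void mv(const T* A, const T* b, T* c)
{
    matrixVectorProductWithAction<T, blocksize, MVAction::Assign>(A, b, c);
}

template <class T, int blocksize>
inline void umv(const T* A, const T* b, T* c)
{
    matrixVectorProductWithAction<T, blocksize, MVAction::Add>(A, b, c);
}

template <class T, int blocksize>
inline void mmv(const T* A, const T* b, T* c)
{
    matrixVectorProductWithAction<T, blocksize, MVAction::Subtract>(A, b, c);
}
```

Because the action is a template parameter, `if constexpr` resolves the branch at compile time, so the wrappers should cost nothing over hand-written versions.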

opm/simulators/linalg/cuistl/detail/cusparse_matrix_operations.cu Outdated

+                      const auto reorderedRowIdx = startIdx + blockDim.x * blockIdx.x + threadIdx.x;
+                      if (reorderedRowIdx < rowsInLevelSet + startIdx) {
+                          int naturalRowIdx = reorderedToNatural[reorderedRowIdx];
+                          size_t nnzIdx = rowIndices[reorderedRowIdx];

Contributor

kjetilly Jan 16, 2024

Both of these can be const?

opm/simulators/linalg/cuistl/detail/cusparse_matrix_operations.cu Outdated

+                          size_t nnzIdx = rowIndices[reorderedRowIdx];
+                          int diagIdx = nnzIdx;
+                          while (colIndices[diagIdx] != naturalRowIdx)

Contributor

kjetilly Jan 16, 2024

braces
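For illustration, here is a host-side sketch of the per-thread work in the level-set kernel with the review suggestions applied (const locals, braces on the loop). It assumes, based on the diff, that rows are stored in reordered order while column indices keep the natural numbering. Unlike the diff, it bounds the diagonal search by the row end. The function names are hypothetical, and the thread loop only emulates the CUDA launch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Finds the storage index of the diagonal entry of one reordered row.
std::size_t findDiagonalIndex(const std::vector<std::size_t>& rowIndices,
                              const std::vector<int>& colIndices,
                              const std::vector<int>& reorderedToNatural,
                              std::size_t reorderedRowIdx)
{
    const int naturalRowIdx = reorderedToNatural[reorderedRowIdx];
    const std::size_t nnzIdxLim = rowIndices[reorderedRowIdx + 1];
    std::size_t diagIdx = rowIndices[reorderedRowIdx];
    while (diagIdx < nnzIdxLim && colIndices[diagIdx] != naturalRowIdx) {
        ++diagIdx;
    }
    if (diagIdx == nnzIdxLim) {
        throw std::runtime_error("Row has no stored diagonal entry");
    }
    return diagIdx;
}

// Emulates the launch: "thread" t handles reordered row startIdx + t
// (blockDim.x * blockIdx.x + threadIdx.x in the kernel); surplus threads do nothing.
std::vector<std::size_t> diagonalIndicesForLevel(const std::vector<std::size_t>& rowIndices,
                                                 const std::vector<int>& colIndices,
                                                 const std::vector<int>& reorderedToNatural,
                                                 std::size_t startIdx,
                                                 std::size_t rowsInLevelSet,
                                                 std::size_t numThreads)
{
    std::vector<std::size_t> result;
    for (std::size_t thread = 0; thread < numThreads; ++thread) {
        const std::size_t reorderedRowIdx = startIdx + thread;
        if (reorderedRowIdx < rowsInLevelSet + startIdx) {
            result.push_back(findDiagonalIndex(rowIndices, colIndices, reorderedToNatural,
                                               reorderedRowIdx));
        }
    }
    return result;
}
```

As an example, a 3x3 tridiagonal matrix reordered as natural rows (0, 2, 1) has two level sets, {0, 2} starting at reordered row 0 and {1} starting at reordered row 2.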

opm/simulators/linalg/cuistl/detail/cusparse_matrix_operations.cu Outdated

+                                  }
+                              }
+                              int symOpposite = mid;

Contributor

kjetilly Jan 16, 2024

const
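The `mid` in the diff suggests a binary search that locates the "symmetric opposite" entry, which is presumably A(j, i) for a given A(i, j), as needed in the DILU update. A minimal sketch under that assumption, for a CSR pattern in natural ordering with sorted column indices per row; `findSymmetricOpposite` is a hypothetical name, and values that never change are declared `const` as suggested.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch: for an entry A(row, col), find the storage index of
// A(col, row) by binary search over the sorted column indices of row `col`.
// Returns std::nullopt if the pattern is not structurally symmetric there.
std::optional<std::size_t> findSymmetricOpposite(const std::vector<std::size_t>& rowPtr,
                                                 const std::vector<int>& colIndices,
                                                 int row,
                                                 int col)
{
    const std::size_t searchRow = static_cast<std::size_t>(col);
    std::size_t low = rowPtr[searchRow];
    std::size_t high = rowPtr[searchRow + 1];
    while (low < high) {
        const std::size_t mid = low + (high - low) / 2;
        if (colIndices[mid] == row) {
            return mid;
        }
        if (colIndices[mid] < row) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return std::nullopt;
}
```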

opm/simulators/linalg/cuistl/detail/cusparse_matrix_operations.cu Outdated

-                          size_t nnzIdx = rowIndices[row];
-                          size_t nnzIdxLim = rowIndices[row + 1];
+                          size_t nnzIdx = rowIndices[reorderedRowIdx];
+                          int naturalRowIdx = indexConversion[reorderedRowIdx];

Contributor

kjetilly Jan 16, 2024

const?

multitalentloes marked this pull request as ready for review

January 19, 2024 13:28

Author

multitalentloes commented Jan 19, 2024

I have now gone through the feedback and converted the draft PR to a regular PR. If this gets the green light, I will squash everything into one commit so that it can hopefully be merged. As of now, the only thing missing is to clean up the use of C-style arrays vs. std::array for local variables in some of the kernels.

Contributor

kjetilly commented Jan 24, 2024

Jenkins build this please


Add CUDA implementation of the DILU preconditioner (commit 4b0dd54)

Uses graph coloring to exploit parallelism in the upper and lower triangular solves when computing a diagonal approximate inverse of a sparse matrix. Supports block sizes up to 3.
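For readers unfamiliar with DILU, the following is a minimal sequential sketch of the scalar case in natural ordering, based on the standard formulation: the setup computes D_i = A_ii - sum over j < i of A_ij D_j^{-1} A_ji, and the apply solves (D + L_A) D^{-1} (D + U_A) x = b, where L_A and U_A are the strict lower and upper parts of A. It is not the PR's GPU code. With rows ordered color by color, the j < i terms only touch earlier colors, which is what lets each level set be processed in parallel. `CsrMatrix` and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal scalar CSR matrix (assumed layout for this sketch).
struct CsrMatrix {
    std::vector<std::size_t> rowPtr;
    std::vector<std::size_t> cols;
    std::vector<double> values;
};

// Value of A(i, j), or 0.0 if not stored (linear scan; fine for a sketch).
double entry(const CsrMatrix& A, std::size_t i, std::size_t j)
{
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
        if (A.cols[k] == j) {
            return A.values[k];
        }
    }
    return 0.0;
}

// Setup: D_i = A_ii - sum_{j < i} A_ij * D_j^{-1} * A_ji.
std::vector<double> diluDiagonal(const CsrMatrix& A)
{
    const std::size_t n = A.rowPtr.size() - 1;
    std::vector<double> D(n);
    for (std::size_t i = 0; i < n; ++i) {
        double d = entry(A, i, i);
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            const std::size_t j = A.cols[k];
            if (j < i) {
                d -= A.values[k] / D[j] * entry(A, j, i);
            }
        }
        D[i] = d;
    }
    return D;
}

// Apply: solve (D + L_A) D^{-1} (D + U_A) x = b with a forward and a backward sweep.
std::vector<double> diluApply(const CsrMatrix& A, const std::vector<double>& D,
                              const std::vector<double>& b)
{
    const std::size_t n = D.size();
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i) { // (D + L_A) y = b
        double s = b[i];
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            if (A.cols[k] < i) {
                s -= A.values[k] * y[A.cols[k]];
            }
        }
        y[i] = s / D[i];
    }
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) { // x = y - D^{-1} U_A x
        double s = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            if (A.cols[k] > i) {
                s += A.values[k] * x[A.cols[k]];
            }
        }
        x[i] = y[i] - s / D[i];
    }
    return x;
}
```

For a tridiagonal matrix, the product (D + L_A) D^{-1} (D + U_A) reproduces A exactly, so, for example, A = [[4, 1], [1, 3]] with b = (6, 7) gives x = (1, 2).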

multitalentloes force-pushed the add_cudilu branch from c157d9e to 4b0dd54

January 25, 2024 13:39

Author

multitalentloes commented Jan 25, 2024

Rebased to simplify history.

Member

atgeirr commented Jan 25, 2024

jenkins build this please

Member

atgeirr commented Jan 25, 2024

All green, merging. Thanks for the effort!

atgeirr merged commit 2626fbb into OPM:master

1 check passed

blattms mentioned this pull request

Master cannot be compiled with DUNE 2.7 and g++-11 #5595

Open
