Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeding algorithm #407

Merged
merged 0 commits into from
Mar 6, 2024
Merged

Seeding algorithm #407

merged 0 commits into from
Mar 6, 2024

Conversation

Yhatoh
Copy link
Contributor

@Yhatoh Yhatoh commented May 23, 2023

Hi!

Again, I'm sorry for the long time that take us for doing this PR.

This PR consist in all the code necesarry to execute the seeding algorithm in Kokkos for traccc. @storrealbac and me did pretty much the same style of coding as in spacepoint binning. We separe the code in seeding_algorithm in multiple blocks of code that calls Kokkos parallel_for with the respective device functions.

The codes appears to works, but we find a bug in our code that we couldn't solve.

The main problem is that when the seeding finding calls make_prefix_sum_buff, for some reason when we do this:

const device::prefix_sum_buffer_t make_sum_result =
        device::make_prefix_sum_buffer(sizes, copy, mr);

The result of make_prefix_sum_buffer give us a result with total size 0, so we don't know we are using the memory resource in a wrong way or is something else.

For this PR, we only using host memory, as we talked about it with @krasznaa while ago.

To give details of what give us this result is with this execution:

./bin/traccc_seeding_example_kokkos --detector_file=tml_detector/trackml-detector.csv --input_directory=tml_full/ttbar_mu40/ --events=10

Cheers,
Gabriel

@guilhermeAlmeida1 guilhermeAlmeida1 self-requested a review May 23, 2023 07:31
@guilhermeAlmeida1
Copy link
Collaborator

guilhermeAlmeida1 commented May 23, 2023

The result of make_prefix_sum_buffer give us a result with total size 0, so we don't know we are using the memory resource in a wrong way or is something else.

The problem you're facing doesn't come from there but from further upstream. The spacepoint binning isn't working at the moment, as the given kernel launch parameters are not correct. Something about this:

Kokkos::parallel_for(
        "count_grid_capacities", team_policy(num_blocks, Kokkos::AUTO),
        KOKKOS_LAMBDA(const member_type& team_member) {
            Kokkos::parallel_for(
                Kokkos::TeamThreadRange(team_member, num_threads),
                ...

isn't correct. If you try, for example, printing out the globalIndex of each thread in the kernel you'll notice it gives incorrect values. They should range from 0 to the number of spacepoints instead.

PS: Additionally, I suspect you might need to call Kokkos::fence() after this kernel and before it's results are copied back to the host, but I'm not familiar with KOKKOS and wether or not that's needed.

@Yhatoh
Copy link
Contributor Author

Yhatoh commented May 23, 2023

That helps a lot, thanks @guilhermeAlmeida1

@stephenswat stephenswat added feature New feature or request kokkos Changes related to Kokkos labels May 28, 2023
@beomki-yeo
Copy link
Contributor

@Yhatoh Would you rebase or refresh the PR? Looks like that CI is clogged

@Yhatoh
Copy link
Contributor Author

Yhatoh commented Sep 7, 2023

@Yhatoh Would you rebase or refresh the PR? Looks like that CI is clogged

Sorry, I didn't saw this comment, yeah I will rebase, so I think I will close this PR for open a new one eventually. Or what do you think will be the best?

@krasznaa
Copy link
Member

So, time for some real talk...

I spent some quality time with your code today Gabriel, but this is only as far as I got. 😦 By now it does compile the seeding example successfully both with CUDA enabled and not enabled for Kokkos. But the test fails to run in both cases.

  • With CUDA support disabled in Kokkos I get:
 > ./bin/traccc_seeding_example_kokkos --input_directory=tml_full/ttbar_mu200/ --detector_file=tml_detector/trackml-detector.csv --run_cpu=1 --events=5
Running ./bin/traccc_seeding_example_kokkos tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 5

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f8d534c9c3a in __GI___wait4 (pid=603360, stat_loc=stat_loc
entry=0x7fffb3a446e8, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x00007f8d534c9bfb in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7fffb3a446e8, options=options
entry=0) at waitpid.c:38
#2  0x00007f8d53438f67 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f8d5210e6ce in TUnixSystem::StackTrace() () from /software/root/6.28.04/x86_64-ubuntu2004-gcc9-opt/lib/libCore.so
#4  0x00007f8d5210b575 in TUnixSystem::DispatchSignals(ESignals) () from /software/root/6.28.04/x86_64-ubuntu2004-gcc9-opt/lib/libCore.so
#5  <signal handler called>
#6  0x00007f8d53d3f0aa in traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, detray::dvector, detray::djagged_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const () from /home/krasznaa/ATLAS/projects/traccc/build-nvidia/lib/libtraccc_kokkos.so.0
#7  0x00007f8d53d478c6 in traccc::kokkos::seeding_algorithm::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&) const () from /home/krasznaa/ATLAS/projects/traccc/build-nvidia/lib/libtraccc_kokkos.so.0
#8  0x00000000004072e1 in seq_run(traccc::seeding_input_config const&, traccc::common_options const&, bool) ()
#9  0x000000000040853d in main ()
===========================================================


The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f8d53d3f0aa in traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, detray::dvector, detray::djagged_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const () from /home/krasznaa/ATLAS/projects/traccc/build-nvidia/lib/libtraccc_kokkos.so.0
#7  0x00007f8d53d478c6 in traccc::kokkos::seeding_algorithm::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&) const () from /home/krasznaa/ATLAS/projects/traccc/build-nvidia/lib/libtraccc_kokkos.so.0
#8  0x00000000004072e1 in seq_run(traccc::seeding_input_config const&, traccc::common_options const&, bool) ()
#9  0x000000000040853d in main ()
===========================================================
  • With CUDA support enabled, this becomes:
 > ./bin/traccc_seeding_example_kokkos --input_directory=tml_full/ttbar_mu200/ --detector_file=tml_detector/trackml-detector.csv --run_cpu=1 --events=5
Running ./bin/traccc_seeding_example_kokkos tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 5
cudaFuncGetAttributes(&attr, func) error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/krasznaa/ATLAS/projects/traccc/build-cuda/_deps/kokkos-src/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:139
Backtrace:
[0x7f759a6946e9] Kokkos::Impl::save_stacktrace()
[0x7f759a68c3ce] Kokkos::Impl::traceback_callstack(std::ostream&)
[0x7f759a68c3ff] Kokkos::Impl::host_abort(char const*)
[0x7f759a69bc6b] Kokkos::Impl::cuda_internal_error_abort(cudaError, char const*, char const*, int)
[0x7f759a7f0f88] cudaFuncAttributes const& Kokkos::Impl::get_cuda_kernel_func_attributes<Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, false, __nv_dl_tag<vecmem::data::vector_buffer<traccc::seed> (traccc::kokkos::seed_finding::*)(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const, &(traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const), 1u>, void (unsigned long), traccc::device::seeding_global_counter*>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0u, 0u>, void (*)(Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, false, __nv_dl_tag<vecmem::data::vector_buffer<traccc::seed> (traccc::kokkos::seed_finding::*)(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const, &(traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const), 1u>, void (unsigned long), traccc::device::seeding_global_counter*>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>)>(void (* const&)(Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, false, __nv_dl_tag<vecmem::data::vector_buffer<traccc::seed> (traccc::kokkos::seed_finding::*)(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const, &(traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const), 1u>, void (unsigned long), traccc::device::seeding_global_counter*>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>))
[0x7f759a7ea3ee] traccc::kokkos::seed_finding::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&, detray::const_grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::vector, vecmem::jagged_vector, std::array, detray::tuple, traccc::internal_spacepoint<traccc::spacepoint>, false, 1u> > const&) const
[0x7f759a7f88db] traccc::kokkos::seeding_algorithm::operator()(vecmem::data::vector_view<traccc::spacepoint const> const&) const
[0x5586e44f1796] 
[0x5586e44f06a5] 
[0x7f759a126083] __libc_start_main
[0x5586e44f082e] 
Aborted (core dumped)

I'd really prefer if you could figure out what's going wrong, as I'm getting fed up with Kokkos by now. 😦

With CUDA support in Kokkos enabled, the earliest unit test fails to compile.

[100%] Building CXX object tests/kokkos/CMakeFiles/traccc_test_kokkos.dir/kokkos_basic.cpp.o
/home/krasznaa/ATLAS/projects/traccc/traccc/tests/kokkos/kokkos_basic.cpp(31): error: The enclosing parent function ("TestBody") for an extended __host__ __device__ lambda cannot have private or protected access within its class
                      (int i) { test_vec_ptr[i] = 1; });
                      ^

It seems that Kokkos lambda kernels can't be put inside of GoogleTest functions. 🤔 But I don't really understand this error to be honest.

Notice that I re-wrote the history of your branch! So probably the easiest is if you re-clone your repository to avoid conflicts with your local version of the branch.

@krasznaa krasznaa merged commit 5f0fa34 into acts-project:main Mar 6, 2024
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request kokkos Changes related to Kokkos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants