Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid default-init overhead for lookup map in assemble_pre #122

Merged
merged 1 commit into from
Jul 17, 2023

Conversation

mjacobse
Copy link
Collaborator

With e3b3545, std::vector started to take care of the memory for the lookup map in assemble_pre to avoid having to call allocate and deallocate manually. But std::vector also default initializes all values on construction, meaning that this change introduced an overhead of always zeroing the map values initially.

Indeed it does not need to be zero'd, as all used values are set upon building it. As it needs to be of size n to be able to map any row index, zeroing it can actually be somewhat costly, especially since it is repeated for each assemble_pre. This change replaces the std::vector with std::unique_ptr to avoid the repeated unnecessary zeroing but still keep the implementation RAII.

Here and in the plots below are some benchmarks with the same setup as in #119, except that repetitions for a matrix are run at different times and on random cores to try to highlight the baseline noise. For most test matrices I see no measureable change, but for a few larger matrices (especially the DIMACS10 group) there are small but significant performance improvements.

Note that e3b3545 only changed assemple_pre, so this only deals with that function too. assemble_post does not use an RAII solution yet. If desired it should be possible to introduce std::unique_ptr for it as well, supposedly without any influence on performance.

benchmark_comparison_all
benchmark_comparison_DIMACS10

With e3b3545 std::vector started to take care of the memory for the
lookup map in assemble_pre to avoid having to call allocate and
deallocate manually. Unfortunately, std::vector also default initializes
all values on construction, meaning that this change introduced an
overhead of always zeroing the map values initially.

Indeed it does not need to be zero'd, as all used values are set upon
building it. As it needs to be of size n to be able to map any row
index, zeroing it can actually be quite costly, especially since it is
repeated for each assemble_pre. Very noticeable for example on
DIMACS10/tx2010 from SuiteSparse matrix collection.

This replaces the std::vector with std::unique_ptr to avoid the repeated
unnecessary zeroing but still keep the implementation RAII.
@mjacobse
Copy link
Collaborator Author

mjacobse commented Jul 14, 2023

Hm, I am unable to reproduce the pipeline failure for gcc-9, any way to get more information? Edit: Oh, it's the unrelated ssmfe_ciface_test test that fails. Is this common to happen from time to time?

@jfowkes
Copy link
Contributor

jfowkes commented Jul 14, 2023

@mjacobse I’ve tried a re-run, sometimes the VM hardware is a bit flaky… (and SSMFE is not the most reliable).

@jfowkes jfowkes self-requested a review July 17, 2023 13:07
Copy link
Contributor

@jfowkes jfowkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @mjacobse

@jfowkes jfowkes merged commit 130b2cb into ralna:master Jul 17, 2023
4 checks passed
@mjacobse mjacobse deleted the assemble_map_nozero branch July 17, 2023 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants