Avoid default-init overhead for lookup map in assemble_pre #122
With e3b3545, `std::vector` started to take care of the memory for the lookup map in `assemble_pre` to avoid having to call `allocate` and `deallocate` manually. But `std::vector` also value-initializes all elements on construction, so that change introduced the overhead of always zeroing the map values up front.

The map does not need to be zeroed, as every value that is used is set while building it. Because it must have size `n` to be able to map any row index, zeroing it can be somewhat costly, especially since it is repeated for each `assemble_pre` call. This change replaces the `std::vector` with `std::unique_ptr` to avoid the repeated unnecessary zeroing while keeping the implementation RAII.

Here and in the plots below are some benchmarks with the same setup as in #119, except that repetitions for a matrix are run at different times and on random cores to try to highlight the baseline noise. For most test matrices I see no measurable change, but for a few larger matrices (especially the DIMACS10 group) there are small but significant performance improvements.
Note that e3b3545 only changed `assemble_pre`, so this change only deals with that function too. `assemble_post` does not use an RAII solution yet. If desired, it should be possible to introduce `std::unique_ptr` for it as well, presumably without any influence on performance.
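
For illustration, here is a minimal sketch of the difference between the two storage choices. It is not the actual `assemble_pre` code: the function names, the `rows` parameter, and the use of `std::size_t` as the element type are all assumptions made for this example. Both variants only read entries that were written first, which is the property that makes the zeroing unnecessary.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the lookup map: maps each row index in `rows`
// to its position in `rows`, then sums the looked-up positions.

// Variant 1: std::vector value-initializes, so all n entries are zeroed
// on construction even though only rows.size() of them are ever used.
std::size_t lookup_sum_vector(std::size_t n, const std::vector<std::size_t>& rows) {
    std::vector<std::size_t> map(n);
    for (std::size_t i = 0; i < rows.size(); ++i) map[rows[i]] = i;
    std::size_t sum = 0;
    for (std::size_t r : rows) sum += map[r];
    return sum;
}

// Variant 2: `new T[n]` default-initializes, which for a built-in integer
// type leaves the storage uninitialized, so no zeroing pass is performed.
// std::unique_ptr still releases the array automatically (RAII).
// Caveat: std::make_unique<T[]>(n) would value-initialize (zero) the array
// again; C++20 offers std::make_unique_for_overwrite<T[]>(n) instead.
std::size_t lookup_sum_unique_ptr(std::size_t n, const std::vector<std::size_t>& rows) {
    std::unique_ptr<std::size_t[]> map(new std::size_t[n]);
    for (std::size_t i = 0; i < rows.size(); ++i) map[rows[i]] = i;
    std::size_t sum = 0;
    for (std::size_t r : rows) sum += map[r];  // only reads entries set above
    return sum;
}
```

Both functions return the same result for the same input. The second is only safe because every entry that is read was assigned beforehand; reading an entry that was never written would be undefined behavior.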