-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding the ability to enable memory overlap check in assignment to avoid unneeded temporary memory allocation #2768
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this improvement, it's definitely a must have! I left a few comments about code organization mostly. You can fix the linter by running pre-commit run --all-files
locally.
@@ -199,6 +199,7 @@ target_link_libraries(xtensor INTERFACE xtl) | |||
|
|||
OPTION(XTENSOR_ENABLE_ASSERT "xtensor bound check" OFF) | |||
OPTION(XTENSOR_CHECK_DIMENSION "xtensor dimension check" OFF) | |||
OPTION(XTENSOR_FORCE_TEMPORARY_MEMORY_IN_ASSIGNMENTS "xtensor force the use of temporary memory when assigning instead of an automatic overlap check" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of having different assign behaviors depending on an option, when it's not for managing dependency or debugging. We should probably have some pattern to dynamically decide if we should check for memory overlap or not. This can be done in a dedicated PR, as it may not be obvious to get it done correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should we keep the option for now?
1f8c179
to
4507f14
Compare
Thanks for your feedback, I made the changes. Also, can you come up with use cases when this check would not work correctly? i.e. some elements in the container aren't located between the first and last element in memory? |
Hey, @JohanMabille, just reminding you that this exists)) |
Thanks for doing it, I had totally forgotten it Oo |
Thanks! |
Currently the default behaviour of
xsemantic_base<D>::operator=(const xexpression<E>& e)
is to create a temporary copy ofe
to avoid any problems with overlapping memory. However, if the user doesn't use some obscure strides for the containers, a check if the pointers frombegin
andend
elements don't overlap is sufficient. Based on our benchmarks, depending on the size of the containers brings huge speedups for some use cases (up to90%
in case of views). This is also achievable withnoalias
, but it's hard to enforce its usage and adding a compilation flagXTENSOR_FORCE_TEMPORARY_MEMORY_IN_ASSIGNMENTS
that allows changing this behaviour is highly beneficial.I'm also attaching the benchmark results we obtained:
xtensor_assign_benchmark_results.ods
Explanation behind benchmark names.
Benchmark was written as single templated function, so based on the template parameters in the benchmark name, the benchmarked operation looked as following, where both containers are 2d, with last number parameter being the size of single dimension: