Correctly use dtype when computing shared memory requirements of separable convolution #409
Conversation
The calling code was accidentally passing image_dtype to the unused anchor argument, which caused the default cp.float32 dtype to always be used in the shared memory calculation. This could result in ptxas errors about too much shared data when the itemsize of the dtype was greater than that of float32.
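A minimal sketch of the mix-up described above (the helper name and signature here are assumptions for illustration, not cucim's actual API): when the dtype is passed positionally, it silently lands in the unused `anchor` slot and the float32 default is used instead.

```python
import cupy as cp

def _shared_memory_bytes(kernel_size, block_dim, anchor=None,
                         image_dtype=cp.float32):
    # Dynamic shared memory per block: the tile plus its halo, times the
    # itemsize of the image dtype.
    halo = kernel_size - 1
    return (block_dim + halo) * cp.dtype(image_dtype).itemsize

# Buggy call: the dtype lands in the unused `anchor` slot, so the default
# float32 itemsize (4 bytes) is always used. The estimate is then too
# small for 8-byte dtypes, the limit check passes, and ptxas can later
# fail with "too much shared data".
print(_shared_memory_bytes(31, 128, cp.float64))              # 632 (wrong)

# Fixed call: pass the dtype by keyword so float64's 8-byte itemsize is
# actually used in the calculation.
print(_shared_memory_bytes(31, 128, image_dtype=cp.float64))  # 1264
```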
…eption occurred. The ResourceLimitError can be checked very quickly, so we want to keep it as well.
Only warn on shared memory limit if algorithm='shared_memory' was explicitly requested.
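A sketch of the fallback-and-warn logic these two commits describe (all names here are assumptions, not cucim's actual API): the cheap ResourceLimitError check runs first, a broader except clause catches any other compilation failure, and the warning is only emitted when the caller explicitly requested algorithm='shared_memory'.

```python
import warnings
import cupy as cp


class ResourceLimitError(RuntimeError):
    """Assumed error type: raised when the shared-memory estimate exceeds
    the device limit, before any compilation is attempted."""


def _convolve_shared_memory(image, kernel):
    raise ResourceLimitError("too much shared data")  # stand-in for the real kernel


def _convolve_elementwise(image, kernel):
    return image  # stand-in for the slower, always-available fallback


def convolve_separable(image, kernel, algorithm=None):
    if algorithm in (None, 'shared_memory'):
        try:
            return _convolve_shared_memory(image, kernel)
        except (ResourceLimitError, cp.cuda.compiler.CompileException):
            # Warn only if shared memory was explicitly requested; fall
            # back silently when the algorithm was auto-selected.
            if algorithm == 'shared_memory':
                warnings.warn("shared memory limit exceeded; falling back")
    return _convolve_elementwise(image, kernel)
```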
rerun tests
Thanks Greg! 🙏 Happy with merging. Only question would be whether we can include a test for the error. Could happen in this PR or after.
Looks good to me. Thank you, Greg!
Prior to the fix, test failures occurred on my system for float64 here for the sigma values 16, 21, 26, and 31.
A test case has now been added (values aren't checked; it just verifies that no compilation error occurs). I verified that without the fix in this PR, the dtype=float64 case fails for the parameterized sigmas 16, 21, 26, and 31 on my system.
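A hedged sketch of the kind of regression test described (the exact entry point is an assumption; cucim.skimage.filters.gaussian is one consumer of the separable convolution). Output values are not checked; the test only verifies that no compilation error is raised.

```python
import cupy as cp
import pytest
from cucim.skimage.filters import gaussian


@pytest.mark.parametrize('sigma', [16, 21, 26, 31])
@pytest.mark.parametrize('dtype', [cp.float32, cp.float64])
def test_large_sigma_compiles(sigma, dtype):
    image = cp.zeros((128, 128), dtype=dtype)
    # Before the fix, the float64 cases failed here with a ptxas error
    # about exceeding the shared-memory limit.
    gaussian(image, sigma=sigma)
```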
@gpucibot merge
closes #408
The first commit here fixes an error in the keyword arguments passed to the shared memory calculation. The second commit adds a further fallback in case of any other anticipated compilation failure.