Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Euclidean distance transform #406

Merged
merged 4 commits into from
Nov 23, 2022

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Sep 6, 2022

This PR gives a further 2-4x speedup to the 2D and 3D distance transform functions by creating dedicated elementwise CUDA kernels for the pre- and post-processing steps. These include, for example, packing and upacking the 3D coordinates into a single integer. This resolves various "TODO" comments that were in the source, but I don't think we have an issue open for this.

Aside from the performance benefit there is also a substantial reduction in memory usage for the default case of return_distance=True, return_indices=False as in this case the explicit arrays of indices never need to be created.

Benchmarks

Implementation from release 22.08.01

shape return dist. return ind. SciPy ITK cuCIM accel. (CuPy vs SciPy) accel. (CuPy vs ITK)
256, 256 False True 0.00379 ---- 0.00081 4.66 ----
256, 256 True False 0.00428 0.00383 0.00102 4.21 3.76
256, 256 True True 0.00429 ---- 0.00107 4.02 ----
512, 512 False True 0.01724 ---- 0.00091 18.97 ----
512, 512 True False 0.01866 0.00788 0.00104 17.94 7.58
512, 512 True True 0.01876 ---- 0.00107 17.48 ----
2048, 2048 False True 0.35053 ---- 0.00256 137.03 ----
2048, 2048 True False 0.39468 0.08220 0.00288 137.02 28.54
2048, 2048 True True 0.39356 ---- 0.00296 133.04 ----
4096, 4096 False True 1.57430 ---- 0.00736 213.85 ----
4096, 4096 True False 1.79513 0.54968 0.00858 209.34 64.10
4096, 4096 True True 1.81570 ---- 0.00889 204.25 ----
32, 32, 32 False True 0.00319 ---- 0.00069 4.62 ----
32, 32, 32 True False 0.00356 0.00481 0.00096 3.71 5.01
32, 32, 32 True True 0.00360 ---- 0.00103 3.49 ----
128, 128, 128 False True 0.27901 ---- 0.00131 213.20 ----
128, 128, 128 True False 0.31013 0.04504 0.00156 199.32 28.94
128, 128, 128 True True 0.31289 ---- 0.00163 192.07 ----
256, 256, 256 False True 3.02435 ---- 0.00696 434.25 ----
256, 256, 256 True False 3.34077 0.33528 0.00855 390.57 39.20
256, 256, 256 True True 3.32920 ---- 0.00915 363.89 ----
384, 384, 384 False True 10.68973 ---- 0.02284 468.07 ----
384, 384, 384 True False 11.73340 1.16172 0.02809 417.65 41.35
384, 384, 384 True True 11.68002 ---- 0.03007 388.40 ----

Proposed Implementation

shape return dist. return ind. SciPy ITK cuCIM accel. (CuPy vs SciPy) accel. (CuPy vs ITK)
256, 256 False True 0.00385 N/A 0.00021 18.49 N/A
256, 256 True False 0.00433 0.00370 0.00021 20.84 17.83
256, 256 True True 0.00436 N/A 0.00022 19.55 N/A
512, 512 False True 0.01724 N/A 0.00030 58.44 N/A
512, 512 True False 0.01893 0.00790 0.00029 66.42 27.73
512, 512 True True 0.01867 N/A 0.00030 62.65 N/A
2048, 2048 False True 0.33893 N/A 0.00154 220.25 N/A
2048, 2048 True False 0.40912 0.08219 0.00154 266.11 53.46
2048, 2048 True True 0.39056 N/A 0.00162 241.76 N/A
4096, 4096 False True 1.58069 N/A 0.00418 377.92 N/A
4096, 4096 True False 1.81860 0.55152 0.00418 435.36 132.03
4096, 4096 True True 1.80671 N/A 0.00448 402.85 N/A
32, 32, 32 False True 0.00319 N/A 0.00017 19.05 N/A
32, 32, 32 True False 0.00359 0.00514 0.00011 33.25 47.68
32, 32, 32 True True 0.00355 N/A 0.00019 18.37 N/A
128, 128, 128 False True 0.27217 N/A 0.00059 458.70 N/A
128, 128, 128 True False 0.31409 0.03384 0.00050 633.12 68.21
128, 128, 128 True True 0.31207 N/A 0.00066 474.26 N/A
256, 256, 256 False True 3.01718 N/A 0.00304 993.26 N/A
256, 256, 256 True False 3.32660 0.28979 0.00235 1418.07 123.53
256, 256, 256 True True 3.31874 N/A 0.00352 942.49 N/A
384, 384, 384 False True 10.67328 N/A 0.01000 1067.34 N/A
384, 384, 384 True False 11.69219 1.11274 0.00771 1517.46 144.42
384, 384, 384 True True 11.76073 N/A 0.01161 1013.03 N/A

@grlee77 grlee77 added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Sep 6, 2022
@grlee77 grlee77 added this to the v22.10.00 milestone Sep 6, 2022
@grlee77 grlee77 requested a review from a team as a code owner September 6, 2022 15:20
Copy link
Contributor

@gigony gigony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @grlee77 , it looks good to me!

@grlee77
Copy link
Contributor Author

grlee77 commented Nov 23, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8fa943c into rapidsai:branch-22.12 Nov 23, 2022
rapids-bot bot pushed a commit that referenced this pull request Nov 28, 2022
…el sizes into account) (#407)

Please review/merge #406 before this one as those changes are also present here.

This PR adds support for anisotropic pixel/voxel shape to `distance_transform_edt` via the `sampling` kwarg. The changes involve just making variants of the kernels involving distance computations that do floating-point instead of integer-valued computations so that scaling by the sample spacings can be taken into account.

Authors:
  - Gregory Lee (https://github.com/grlee77)

Approvers:
  - Gigon Bae (https://github.com/gigony)

URL: #407
rapids-bot bot pushed a commit that referenced this pull request Nov 29, 2022
related to #419

These are based on the distance transform and are a faster way of computing binary morphological operations for large diameter disk or ball footprints. Unlike the sequence footprint decomposition methods, the footprint is exactly circular (spherical).

One test case `test_isotropic_erosion_spacing` will require #407 to be merged first. 

#406 will also improve the performance of this implementation

Authors:
  - Gregory Lee (https://github.com/grlee77)

Approvers:
  - Gigon Bae (https://github.com/gigony)

URL: #421
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants