Adapt to pending Enzymecore changes #519

Merged
merged 7 commits into from
Sep 19, 2024

wsmoses
Collaborator

@wsmoses wsmoses commented Sep 16, 2024

No description provided.

@@ -253,7 +255,7 @@ function gpu_rev(
 end

 function EnzymeRules.augmented_primal(
-    config::Config,
+    config::RevConfig,
     func::Const{<:Kernel},
Member

Do we need those kinds of changes with an abstract Config type?

Collaborator Author

Forward-mode rules changed fairly fundamentally, so we'll definitely still need these changes.

Contributor

github-actions bot commented Sep 18, 2024

Benchmark Results

main e4d702f... main/e4d702f77a8608...
saxpy/default/Float16/1024 2.79 ± 0.19 μs 2.77 ± 0.2 μs 1.01
saxpy/default/Float16/1048576 2.09 ± 0.0076 ms 2.08 ± 0.0076 ms 1
saxpy/default/Float16/16384 0.0328 ± 0.00014 ms 0.0328 ± 0.00015 ms 0.999
saxpy/default/Float16/2048 5.23 ± 0.045 μs 5.21 ± 0.052 μs 1
saxpy/default/Float16/256 0.967 ± 0.12 μs 0.993 ± 0.12 μs 0.974
saxpy/default/Float16/262144 0.525 ± 0.0095 ms 0.524 ± 0.0095 ms 1
saxpy/default/Float16/32768 0.065 ± 0.00019 ms 0.065 ± 0.00018 ms 1
saxpy/default/Float16/4096 10.1 ± 0.051 μs 10.1 ± 0.06 μs 0.998
saxpy/default/Float16/512 1.57 ± 0.14 μs 1.57 ± 0.16 μs 0.999
saxpy/default/Float16/64 0.614 ± 0.016 μs 0.619 ± 0.016 μs 0.992
saxpy/default/Float16/65536 0.129 ± 0.00038 ms 0.129 ± 0.00041 ms 1
saxpy/default/Float32/1024 1.03 ± 0.013 μs 1.02 ± 0.012 μs 1.01
saxpy/default/Float32/1048576 0.967 ± 0.0065 ms 0.969 ± 0.0067 ms 0.998
saxpy/default/Float32/16384 15.5 ± 0.14 μs 15.5 ± 0.14 μs 1
saxpy/default/Float32/2048 1.72 ± 0.026 μs 1.72 ± 0.025 μs 1
saxpy/default/Float32/256 0.531 ± 0.12 μs 0.529 ± 0.13 μs 1
saxpy/default/Float32/262144 0.238 ± 0.0095 ms 0.238 ± 0.0096 ms 1
saxpy/default/Float32/32768 30.4 ± 0.21 μs 30.5 ± 0.22 μs 0.999
saxpy/default/Float32/4096 3.01 ± 0.027 μs 3.03 ± 0.03 μs 0.996
saxpy/default/Float32/512 0.699 ± 0.12 μs 0.694 ± 0.11 μs 1.01
saxpy/default/Float32/64 0.418 ± 0.006 μs 0.41 ± 0.0056 μs 1.02
saxpy/default/Float32/65536 0.0601 ± 0.0003 ms 0.0602 ± 0.00029 ms 1
saxpy/default/Float64/1024 1.06 ± 0.021 μs 1.06 ± 0.019 μs 1
saxpy/default/Float64/1048576 1.05 ± 0.028 ms 1.04 ± 0.033 ms 1.01
saxpy/default/Float64/16384 16.4 ± 0.49 μs 16 ± 0.54 μs 1.03
saxpy/default/Float64/2048 1.75 ± 0.032 μs 1.74 ± 0.022 μs 1.01
saxpy/default/Float64/256 0.528 ± 0.011 μs 0.525 ± 0.0094 μs 1.01
saxpy/default/Float64/262144 0.252 ± 0.013 ms 0.244 ± 0.0094 ms 1.03
saxpy/default/Float64/32768 0.0324 ± 0.0011 ms 31.3 ± 0.86 μs 1.04
saxpy/default/Float64/4096 3.05 ± 0.04 μs 3.05 ± 0.04 μs 1
saxpy/default/Float64/512 0.697 ± 0.11 μs 0.706 ± 0.11 μs 0.987
saxpy/default/Float64/64 0.4 ± 0.0078 μs 0.399 ± 0.0049 μs 1
saxpy/default/Float64/65536 0.0638 ± 0.0025 ms 0.0614 ± 0.00085 ms 1.04
saxpy/static workgroup=(1024,)/Float16/1024 2.11 ± 0.21 μs 2.09 ± 0.22 μs 1.01
saxpy/static workgroup=(1024,)/Float16/1048576 0.18 ± 0.013 ms 0.179 ± 0.021 ms 1
saxpy/static workgroup=(1024,)/Float16/16384 4.34 ± 0.22 μs 4.37 ± 0.2 μs 0.993
saxpy/static workgroup=(1024,)/Float16/2048 2.14 ± 0.23 μs 2.14 ± 0.22 μs 1
saxpy/static workgroup=(1024,)/Float16/256 2.65 ± 0.042 μs 2.65 ± 0.042 μs 0.999
saxpy/static workgroup=(1024,)/Float16/262144 0.0457 ± 0.0029 ms 0.0439 ± 0.0027 ms 1.04
saxpy/static workgroup=(1024,)/Float16/32768 6.63 ± 0.34 μs 6.9 ± 0.31 μs 0.962
saxpy/static workgroup=(1024,)/Float16/4096 2.42 ± 0.032 μs 2.43 ± 0.039 μs 0.995
saxpy/static workgroup=(1024,)/Float16/512 3.16 ± 0.097 μs 3.17 ± 0.08 μs 0.997
saxpy/static workgroup=(1024,)/Float16/64 2.27 ± 0.025 μs 2.28 ± 0.028 μs 0.998
saxpy/static workgroup=(1024,)/Float16/65536 13 ± 0.73 μs 12.9 ± 0.68 μs 1
saxpy/static workgroup=(1024,)/Float32/1024 1.97 ± 0.027 μs 1.98 ± 0.03 μs 0.995
saxpy/static workgroup=(1024,)/Float32/1048576 0.281 ± 0.025 ms 0.259 ± 0.034 ms 1.08
saxpy/static workgroup=(1024,)/Float32/16384 4.85 ± 0.87 μs 4.8 ± 0.73 μs 1.01
saxpy/static workgroup=(1024,)/Float32/2048 2.29 ± 0.23 μs 2.3 ± 0.24 μs 0.997
saxpy/static workgroup=(1024,)/Float32/256 2.81 ± 0.95 μs 2.82 ± 1.6 μs 0.998
saxpy/static workgroup=(1024,)/Float32/262144 0.0669 ± 0.0058 ms 0.0592 ± 0.0083 ms 1.13
saxpy/static workgroup=(1024,)/Float32/32768 8.42 ± 1.4 μs 7.62 ± 1.1 μs 1.1
saxpy/static workgroup=(1024,)/Float32/4096 2.57 ± 0.2 μs 2.56 ± 0.22 μs 1.01
saxpy/static workgroup=(1024,)/Float32/512 2.49 ± 0.23 μs 2.5 ± 0.23 μs 0.998
saxpy/static workgroup=(1024,)/Float32/64 2.46 ± 0.057 μs 2.44 ± 0.053 μs 1.01
saxpy/static workgroup=(1024,)/Float32/65536 17.5 ± 1.6 μs 16.6 ± 1.7 μs 1.06
saxpy/static workgroup=(1024,)/Float64/1024 2.04 ± 0.028 μs 2.05 ± 0.031 μs 0.995
saxpy/static workgroup=(1024,)/Float64/1048576 0.574 ± 0.055 ms 0.649 ± 0.098 ms 0.885
saxpy/static workgroup=(1024,)/Float64/16384 8.14 ± 1.5 μs 7.72 ± 1.3 μs 1.05
saxpy/static workgroup=(1024,)/Float64/2048 2.53 ± 0.33 μs 2.54 ± 0.25 μs 0.996
saxpy/static workgroup=(1024,)/Float64/256 2.4 ± 0.05 μs 2.4 ± 0.047 μs 1
saxpy/static workgroup=(1024,)/Float64/262144 0.109 ± 0.014 ms 0.111 ± 0.013 ms 0.983
saxpy/static workgroup=(1024,)/Float64/32768 16.2 ± 2 μs 16.1 ± 2.2 μs 1.01
saxpy/static workgroup=(1024,)/Float64/4096 3.11 ± 0.34 μs 3.09 ± 0.3 μs 1.01
saxpy/static workgroup=(1024,)/Float64/512 2.39 ± 0.042 μs 2.39 ± 0.041 μs 1
saxpy/static workgroup=(1024,)/Float64/64 2.38 ± 0.086 μs 2.37 ± 0.079 μs 1
saxpy/static workgroup=(1024,)/Float64/65536 31 ± 3.4 μs 30.5 ± 3.9 μs 1.02
time_to_load 0.315 ± 0.0013 s 0.319 ± 0.0027 s 0.987
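Reading the table: judging from the numbers themselves, the last column appears to be the main time divided by the PR time. For example, 0.574 ms / 0.649 ms gives about 0.885 in the static Float64/1048576 row. Under that reading, values above 1 favor the PR and values below 1 favor main. The Julia sketch below recomputes two rows under that assumption. The helper names `UNIT_SCALE`, `to_seconds`, and `bench_ratio` are hypothetical and are not part of KernelAbstractions.jl or its benchmark tooling. Units are normalized first because some rows, such as Float64/32768, mix ms and μs.

```julia
# Hypothetical helpers (not part of KernelAbstractions.jl or its benchmark
# tooling) for checking how the ratio column appears to be computed.

# Scale factors that convert each time unit used in the table to seconds.
const UNIT_SCALE = Dict("s" => 1.0, "ms" => 1e-3, "μs" => 1e-6)

# Convert a (value, unit) pair such as (31.3, "μs") to seconds.
to_seconds(value::Real, unit::AbstractString) = value * UNIT_SCALE[unit]

# Assumed definition of the ratio column: main time divided by PR time.
# Above 1 means the PR run was faster; below 1 means it was slower.
function bench_ratio(main::Real, main_unit::AbstractString,
                     pr::Real, pr_unit::AbstractString)
    return to_seconds(main, main_unit) / to_seconds(pr, pr_unit)
end

# saxpy/default/Float64/32768: 0.0324 ms (main) vs 31.3 μs (PR); units differ.
r1 = bench_ratio(0.0324, "ms", 31.3, "μs")

# saxpy/static workgroup=(1024,)/Float64/1048576: 0.574 ms vs 0.649 ms.
r2 = bench_ratio(0.574, "ms", 0.649, "ms")

println("Float64/32768:   ", round(r1; digits = 3))
println("Float64/1048576: ", round(r2; digits = 3))
```

This yields roughly 1.035 and 0.884, compared with 1.04 and 0.885 in the table. The small differences are presumably due to the displayed times being rounded, so the sketch only approximates the reported ratios.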

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@avik-pal
Contributor

Can we re-trigger CI now that Enzyme 0.13 has been released?

@avik-pal
Contributor

needs a bump

@wsmoses
Collaborator Author

wsmoses commented Sep 19, 2024

@maleadt okay, I've confirmed this passes locally. This is probably the better place to bump first, if you're willing to give it the thumbs up.

Member

@maleadt maleadt left a comment

Superficially LGTM.

@wsmoses
Collaborator Author

wsmoses commented Sep 19, 2024

Going to merge and then check that CUDA.jl passes; if everything is green, we can cut the various releases.

@wsmoses wsmoses merged commit bc89f91 into main Sep 19, 2024
11 of 32 checks passed
@wsmoses wsmoses deleted the ecore branch September 19, 2024 20:16

codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (d9062a3) to head (e4d702f).
Report is 8 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #519   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files          7       7           
  Lines        528     528           
=====================================
  Misses       528     528           


4 participants