This repository has been archived by the owner on Sep 28, 2024. It is now read-only.

Fix DeepONet for CUDA #52

Merged

merged 1 commit into from Mar 9, 2022

Conversation

yuehhua
Collaborator

@yuehhua yuehhua commented Mar 8, 2022

Resolves #49. @Abhishek-1Bhatt, could you take a look?

@yuehhua yuehhua requested a review from foldfelis March 8, 2022 15:43
@codecov

codecov bot commented Mar 8, 2022

Codecov Report

Merging #52 (65e0749) into master (cf8f4bd) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #52   +/-   ##
=======================================
  Coverage   93.33%   93.33%           
=======================================
  Files           6        6           
  Lines          90       90           
=======================================
  Hits           84       84           
  Misses          6        6           
Impacted Files Coverage Δ
src/DeepONet.jl 60.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf8f4bd...65e0749. Read the comment docs.

@@ -116,7 +116,7 @@ function (a::DeepONet)(x::AbstractArray, y::AbstractVecOrMat)
However, we perform the transformations by the NNs always in the first dim
so we need to adjust (i.e. transpose) one of the inputs,
which we do on the branch input here =#
-    return Array(branch(x)') * trunk(y)
+    return branch(x)' * trunk(y)
Contributor


This change would lead to allocation (and hence can affect the speed of the forward pass), as typeof(x') will be LinearAlgebra.Adjoint{Float64, Matrix{Float64}}, which isn't a concrete type; by wrapping Array() around it we make it a concrete matrix type. It was introduced recently in #45.

Contributor


julia> using LinearAlgebra

julia> isconcretetype(LinearAlgebra.Adjoint{Float64, Matrix{Float64}})
true

Contributor


What you did in #45 was to bring parametric data types to DeepONet, which is helpful. But here, the function function (a::DeepONet)(x::AbstractArray, y::AbstractVecOrMat) is dispatched on the types of x and y, which are concrete. And the return type also depends on the dispatched types, so there is no type instability problem 😃

Collaborator Author


The use of Array blocks training on CUDA, and this is the root cause of why this example could not run on CUDA.

This change would lead to allocation (and hence, can affect the speed of forward pass)

What you said is not true; on the contrary, Array allocates new memory and slows down the forward pass in this model. Taking off Array should benefit this model.
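For illustration, a minimal CPU-only sketch (with plain matrices standing in for the branch and trunk outputs; the shapes follow the example below, and the variable names are hypothetical): the lazy adjoint multiply gives the same result as the Array-wrapped version while skipping the extra copy, and on the GPU, Array would additionally pull a CuArray back to host memory.

```julia
using LinearAlgebra

# Stand-ins for branch(x) and trunk(y) outputs:
# 30 latent features, batch of 2, 16 evaluation points.
B = rand(Float32, 30, 2)    # stand-in for branch(x)
T = rand(Float32, 30, 16)   # stand-in for trunk(y)

# B' is a lazy Adjoint wrapper: no data is copied, and its type is concrete.
@assert isconcretetype(typeof(B'))

out_lazy  = B' * T          # the new forward pass: multiply through the wrapper
out_eager = Array(B') * T   # the old version: materializes a copy of B' first

@assert out_lazy ≈ out_eager  # identical results, one fewer allocation
```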

Contributor


Oh, that might be because I was looking at the CPU performance. Anyway, there wasn't a huge difference either way, and if it is causing the example to fail on GPU, then it should surely be removed 😊

Comment on lines -1 to +6
-function train_don()
-    # if has_cuda()
-    #     @info "CUDA is on"
-    #     device = gpu
-    #     CUDA.allowscalar(false)
-    # else
+function train_don(; n=300, cuda=true, learning_rate=0.001, epochs=400)
+    if cuda && has_cuda()
+        @info "Training on GPU"
+        device = gpu
+    else
+        @info "Training on CPU"
Contributor


Follow the same style?

Collaborator Author


I keep the example style close to the Flux style, or the style used in model-zoo. It should be consistent.

Contributor


OK, then I'll make other examples follow the same style after this PR is merged.

@foldfelis
Contributor

Looks good to me

@foldfelis foldfelis merged commit 93cfd2d into SciML:master Mar 9, 2022
@yuehhua
Collaborator Author

yuehhua commented Mar 9, 2022

The forward pass is now type stable on CUDA:

julia> using NeuralOperators

julia> using Flux

julia> using CUDA

julia> batch_size = 2
2

julia> a = [0.83541104, 0.83479851, 0.83404712, 0.83315711, 0.83212979, 0.83096755,
                    0.82967374, 0.82825263, 0.82670928, 0.82504949, 0.82327962, 0.82140651,
                    0.81943734, 0.81737952, 0.8152405, 0.81302771];

julia> a = repeat(a, outer=(1, batch_size)) |> gpu;

julia> sensors = collect(range(0, 1, length=16)');

julia> sensors = repeat(sensors, outer=(batch_size, 1)) |> gpu;

julia> model = DeepONet((16, 22, 30), (2, 16, 24, 30), σ, tanh;
                   init_branch=Flux.glorot_normal, bias_trunk=false) |> gpu
DeepONet with
branch net: (Chain(Dense(16, 22, σ), Dense(22, 30, σ)))
Trunk net: (Chain(Dense(2, 16, tanh; bias=false), Dense(16, 24, tanh; bias=false), Dense(24, 30, tanh; bias=false)))

julia> y = model(a, sensors);

julia> @code_warntype model(a, sensors)
MethodInstance for (::DeepONet{Chain{Tuple{Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}}}})(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
  from (a::DeepONet)(x::AbstractArray, y::AbstractVecOrMat) in NeuralOperators at /media/yuehhua/Workbench/workspace/NeuralOperators.jl/src/DeepONet.jl:111
Arguments
  a::DeepONet{Chain{Tuple{Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}}}}
  x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
  y::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
Locals
  trunk::Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}}}
  branch::Chain{Tuple{Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}
Body::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
1 ─ %1 = Base.getproperty(a, :branch_net)::Chain{Tuple{Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(σ), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}
│   %2 = Base.getproperty(a, :trunk_net)::Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}, Dense{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Flux.Zeros}}}
│        (branch = %1)
│        (trunk = %2)
│   %5 = (branch)(x)::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
│   %6 = NeuralOperators.:var"'"(%5)::LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}
│   %7 = (trunk)(y)::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
│   %8 = (%6 * %7)::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
└──      return %8

@yuehhua yuehhua deleted the deeponet branch March 17, 2022 03:34
Successfully merging this pull request may close these issues.

Burgers example for DeepONet is not working
3 participants