
Too much memory consumption right before trying to solve a problem. #390

Closed
david-vicente opened this issue Jun 5, 2020 · 7 comments

@david-vicente

david-vicente commented Jun 5, 2020

Following the suggestion by @ericphanson on this Discourse thread, I'm posting the code that reproduces the issue here.

using ECOS
using Convex

# Generate fake data matrix
function gen_data(m, n, k)
    return 10 * rand(m, k) * 2 * rand(k, n)
end

function gen_masks(A, holdout)
    training_mask = rand(size(A)...) .> 1 - holdout
    validation_mask = .!training_mask
    return training_mask, validation_mask
end

function alternating_minimization(A, M, Y_init, k)
    m, n = size(A)

    X = Variable(m, k)
    Y = Variable(k, n)

    objective = (norm(vec(M .* A - M .* (X*Y)), 2)
                 + γ1 * norm(vec(X), 2)
                 + γ2 * norm(vec(Y), 1))

    constraints =  [X*Y >= ϵ]

    problem = minimize(objective, constraints)

    Y.value = Y_init
    for i in 1:MAX_ITERS
        fix!(Y)
        solve!(problem, ECOS.Optimizer)
        free!(Y)

        fix!(X)
        solve!(problem, ECOS.Optimizer)
        free!(X)
    end

    return problem, X.value, Y.value
end

const γ1 = 1.0
const γ2 = 1.0
const ϵ = 0.0001
const MAX_ITERS = 2

m, n, k = 500, 500, 5
holdout = 0.80

A = gen_data(m, n, k)
Mt, Mv = gen_masks(A, holdout)

Y_init = rand(k, n)
_, X, Y = alternating_minimization(A, Mt, Y_init, k)

This code works for, e.g., m, n, k = 10, 10, 5, but for values like m, n, k = 500, 500, 5 it consumes all available RAM before throwing a SIGINT.
Equivalent code in CVXPY has negligible RAM usage.
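A minimal sketch (not part of the original report) that quantifies how a single dense m×n intermediate scales with problem size, using only Base's `@allocated`; `dense_product_bytes` is a hypothetical helper:

```julia
# Hypothetical helper: bytes allocated by materializing one dense m×n
# product X*Y, the kind of intermediate a reformulation can create.
function dense_product_bytes(m, n, k)
    X = rand(m, k)
    Y = rand(k, n)
    return @allocated X * Y   # a full m×n Float64 matrix: ≈ 8*m*n bytes
end

dense_product_bytes(10, 10, 5)             # warm-up so compilation isn't counted
small = dense_product_bytes(10, 10, 5)     # ≈ 800 bytes
large = dense_product_bytes(500, 500, 5)   # ≈ 2 MB for a single intermediate
```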

@ericphanson
Collaborator

Thanks for the reproducible report.

For a long time, I've thought that Convex.jl's lack of type stability (e.g. in ConicObj) and possible copying in its internals cause these problems, and that they are quite hard to fix. I think that's still probably true, but I'm not 100% sure; the issue could also be algorithmic, in how the reformulations are done. I've started a few different attempts at fixing it over the months: one is at the branch hybrid (also discussed in #388 (comment)), and today I tried a different approach in the branch eph/moi2, which you can try out via

using Pkg; Pkg.add(PackageSpec(name="Convex", rev="eph/moi2"))

and restarting the Julia session to pick up the new version. My new approach creates a completely new code path that avoids all of the old machinery and lowers problems more directly to MathOptInterface (call conic_form_problem_solve(problem, solver()) instead of solve!(problem, solver) to use the new path). I implemented just enough functionality to solve your problem (and basically nothing else), but unfortunately it did not improve performance and in fact slightly regressed it. I also reimplemented the problem in JuMP (perhaps not optimally, since I'm not very familiar with JuMP) for comparison (the code is here).

With (m, n, k) = (75, 75, 5),

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (75, 75, 5)
  2.126611 seconds (423.23 k allocations: 1.428 GiB, 20.96% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (75, 75, 5)
  2.267253 seconds (108.45 k allocations: 1.833 GiB, 21.84% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (75, 75, 5)
  1.509720 seconds (4.97 M allocations: 328.116 MiB, 4.21% gc time)

With (m, n, k) = (150, 150, 5),

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 14.768585 seconds (1.40 M allocations: 10.077 GiB, 10.20% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 15.068220 seconds (122.11 k allocations: 12.821 GiB, 7.96% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)

So that's disappointing, but I hope the new code path is easier to optimize (it's also easier to add functionality to, ref #388). There is definitely still a lot of copying, since I don't modify the MOI.VectorAffineFunctions etc. in place, and I think there is still some type instability. I'm not sure those are really the bottleneck, though; I need to do more profiling to say more. For this problem, with fixed small k, surely the key is to never instantiate a full n-by-m matrix, and maybe I'm accidentally doing that here.
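To make the last point concrete (a back-of-the-envelope illustration added here, not from the original comment): with k fixed and small, the factors X and Y are tiny compared to their product, so any representation that materializes X*Y dominates memory.

```julia
# Sizes for the instance reported in this issue.
m, n, k = 500, 500, 5
factor_bytes  = (m * k + k * n) * sizeof(Float64)  # X (m×k) and Y (k×n): 40 KB
product_bytes = m * n * sizeof(Float64)            # X*Y (m×n): 2 MB
ratio = product_bytes / factor_bytes               # = m*n / (k*(m+n)) = 50
```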

@ericphanson
Collaborator

As an update, @blegat and @odow suggested restructuring how MOI represents VectorAffineFunctions, and with a prototype of that in my branch, I get performance matching JuMP:

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 42.958454 seconds (77.45 M allocations: 13.555 GiB, 6.44% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 13.723807 seconds (19.08 M allocations: 2.313 GiB, 1.29% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)
 14.912025 seconds (41.56 M allocations: 2.344 GiB, 2.98% gc time)

I am really starting to think this is the way to go!

@odow
Member

odow commented Jul 31, 2020

Nice! Probably easiest to make a PR if you want a review.

@ericphanson ericphanson mentioned this issue Jul 31, 2020
@ericphanson
Collaborator

Ok, will do @odow!

@ericphanson
Collaborator

PR is up here: #393

@odow
Member

odow commented Feb 23, 2022

Closing as duplicate of #254

@odow odow closed this as completed Feb 23, 2022
@ericphanson
Collaborator

I think #254 and this issue both had some common causes, which were solved in #504. Once we got past those initial problems, #254 specifically had issues with scalar indexing (#614), while this issue had to do with Convex.jl internally building a giant dense matrix instead of a sparse one (#631). Either way, v0.16.0 should solve this once it is released.
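To illustrate the dense-versus-sparse distinction (a generic sketch with made-up data, not Convex.jl's actual internals): a mostly-zero coefficient matrix stored densely costs 8·m·n bytes no matter how sparse it is, while a SparseMatrixCSC costs roughly O(nnz).

```julia
using SparseArrays

m, n = 500, 500
rows = rand(1:m, 1000)
cols = rand(1:n, 1000)
vals = rand(1000)
S = sparse(rows, cols, vals, m, n)  # sparse: stores only nonzeros + indices
D = Matrix(S)                       # dense: stores all m*n entries

sparse_bytes = Base.summarysize(S)
dense_bytes  = Base.summarysize(D)  # ≈ 2 MB regardless of sparsity
```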
