
Too much memory consumption right before trying to solve a problem. #390

Closed
david-vicente opened this issue Jun 5, 2020 · 7 comments

@david-vicente

david-vicente commented Jun 5, 2020

Following the suggestion by @ericphanson on this Discourse thread, I'm posting the code that reproduces the issue here.

using ECOS
using Convex

# Generate fake data matrix
function gen_data(m, n, k)
    return 10 * rand(m, k) * 2 * rand(k, n)
end

function gen_masks(A, holdout)
    training_mask = rand(size(A)...) .> 1 - holdout
    validation_mask = .!training_mask
    return training_mask, validation_mask
end

function alternating_minimization(A, M, Y_init, k)
    m, n = size(A)

    X = Variable(m, k)
    Y = Variable(k, n)

    objective = (norm(vec(M .* A - M .* (X*Y)), 2)
                 + γ1 * norm(vec(X), 2)
                 + γ2 * norm(vec(Y), 1))

    constraints =  [X*Y >= ϵ]

    problem = minimize(objective, constraints)

    Y.value = Y_init
    for i in 1:MAX_ITERS
        fix!(Y)
        solve!(problem, ECOS.Optimizer)
        free!(Y)

        fix!(X)
        solve!(problem, ECOS.Optimizer)
        free!(X)
    end

    return problem, X.value, Y.value
end

const γ1 = 1.0
const γ2 = 1.0
const ϵ = 0.0001
const MAX_ITERS = 2

m, n, k = 500, 500, 5
holdout = 0.80

A = gen_data(m, n, k)
Mt, Mv = gen_masks(A, holdout)

Y_init = rand(k, n)
_, X, Y = alternating_minimization(A, Mt, Y_init, k)

This code works for, e.g., m, n, k = 10, 10, 5, but for values like m, n, k = 500, 500, 5 it consumes all available RAM before throwing a SIGINT.
Equivalent code in CVXPY has negligible RAM usage.
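A minimal sketch (not part of the original report) that quantifies how a single dense m×n intermediate scales with problem size, using only Base's `@allocated`; `dense_product_bytes` is a hypothetical helper:

```julia
# Hypothetical helper: bytes allocated by materializing one dense m×n
# product X*Y, the kind of intermediate a reformulation can create.
function dense_product_bytes(m, n, k)
    X = rand(m, k)
    Y = rand(k, n)
    return @allocated X * Y   # a full m×n Float64 matrix: ≈ 8*m*n bytes
end

dense_product_bytes(10, 10, 5)             # warm-up so compilation isn't counted
small = dense_product_bytes(10, 10, 5)     # ≈ 800 bytes
large = dense_product_bytes(500, 500, 5)   # ≈ 2 MB for a single intermediate
```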

@ericphanson
Collaborator

Thanks for the reproducible report.

For a long time, I've thought that Convex.jl's lack of type stability (e.g. in ConicObj) and possible copying in its internals cause these problems, and that they are quite hard to fix. I think that's still probably true, but I'm not 100% sure; the issue could also be algorithmic, in how the reformulations are done. I've started a few different attempts at fixing it over the months: one is at the branch hybrid (also discussed in #388 (comment)), and today I tried a different approach in the branch eph/moi2, which you can try out via

using Pkg; Pkg.add(PackageSpec(name="Convex", rev="eph/moi2"))

and restarting the Julia session to pick up the new version. My new approach creates a completely new code path that avoids all of the old machinery and lowers problems more directly to MathOptInterface (call conic_form_problem_solve(problem, solver()) instead of solve!(problem, solver) to use the new path). I implemented just enough functionality to solve your problem (and basically nothing else), but unfortunately it did not improve performance and in fact slightly regressed it. I also reimplemented the problem in JuMP (perhaps not optimally, since I'm not very familiar with JuMP) for comparison (the code is here).

With (m, n, k) = (75, 75, 5),

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (75, 75, 5)
  2.126611 seconds (423.23 k allocations: 1.428 GiB, 20.96% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (75, 75, 5)
  2.267253 seconds (108.45 k allocations: 1.833 GiB, 21.84% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (75, 75, 5)
  1.509720 seconds (4.97 M allocations: 328.116 MiB, 4.21% gc time)

With (m, n, k) = (150, 150, 5),

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 14.768585 seconds (1.40 M allocations: 10.077 GiB, 10.20% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 15.068220 seconds (122.11 k allocations: 12.821 GiB, 7.96% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)

So that's disappointing, but I hope the new code path is easier to optimize (it's also easier to add functionality to, ref #388). There is definitely still a lot of copying, since I don't modify the MOI.VectorAffineFunctions etc. in place, and I think there is still some type instability. I'm not sure those are really the bottleneck, though; I need to do more profiling to say more. For this problem, with fixed small k, surely the key is to never instantiate a full n-by-m matrix, and maybe I'm accidentally doing that here.
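To make the last point concrete (a back-of-the-envelope illustration added here, not from the original comment): with k fixed and small, the factors X and Y are tiny compared to their product, so any representation that materializes X*Y dominates memory.

```julia
# Sizes for the instance reported in this issue.
m, n, k = 500, 500, 5
factor_bytes  = (m * k + k * n) * sizeof(Float64)  # X (m×k) and Y (k×n): 40 KB
product_bytes = m * n * sizeof(Float64)            # X*Y (m×n): 2 MB
ratio = product_bytes / factor_bytes               # = m*n / (k*(m+n)) = 50
```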

@ericphanson
Collaborator

As an update, @blegat and @odow suggested restructuring how MOI represents VectorAffineFunctions, and with a prototype of that in my branch, I get performance matching JuMP:

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 42.958454 seconds (77.45 M allocations: 13.555 GiB, 6.44% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 13.723807 seconds (19.08 M allocations: 2.313 GiB, 1.29% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)
 14.912025 seconds (41.56 M allocations: 2.344 GiB, 2.98% gc time)

I am really starting to think this is the way to go!

@odow
Member

odow commented Jul 31, 2020

Nice! Probably easiest to make a PR if you want a review.

@ericphanson ericphanson mentioned this issue Jul 31, 2020
@ericphanson
Collaborator

Ok, will do @odow!

@ericphanson
Collaborator

PR is up here: #393

@odow
Member

odow commented Feb 23, 2022

Closing as duplicate of #254

@odow odow closed this as completed Feb 23, 2022
@ericphanson
Collaborator

I think #254 and this issue both had some common causes, which were solved in #504. Once we got past those initial problems, #254 specifically had issues with scalar indexing (#614), while this issue had to do with Convex.jl internally building a giant dense matrix instead of a sparse one (#631). Either way, v0.16.0 should solve this once it is released.
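To illustrate the dense-versus-sparse distinction (a generic sketch with made-up data, not Convex.jl's actual internals): a mostly-zero coefficient matrix stored densely costs 8·m·n bytes no matter how sparse it is, while a SparseMatrixCSC costs roughly O(nnz).

```julia
using SparseArrays

m, n = 500, 500
rows = rand(1:m, 1000)
cols = rand(1:n, 1000)
vals = rand(1000)
S = sparse(rows, cols, vals, m, n)  # sparse: stores only nonzeros + indices
D = Matrix(S)                       # dense: stores all m*n entries

sparse_bytes = Base.summarysize(S)
dense_bytes  = Base.summarysize(D)  # ≈ 2 MB regardless of sparsity
```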
