Add examples of Stochastic Optimal Control #429

odow · 2021-06-17T20:01:24Z

For the INFORMS talk

odow · 2021-08-15T01:35:02Z

Ideally, one of @pulsipher's examples from InfiniteOpt: https://github.com/pulsipher/InfiniteOpt.jl/tree/master/docs/src/examples

odow · 2021-08-15T02:07:04Z

If anyone reading this has nice examples of an optimal control type problem they want to share, please reach out.

pulsipher · 2021-08-16T15:52:21Z

The stochastic optimal economic control problem I recently setup with @azev77 in infiniteopt/InfiniteOpt.jl#131 (comment) (look at the conversation starting from this comment) might be a good fit. Naturally, the difference here is that you would want to implement its discrete version.

odow · 2021-08-17T08:26:48Z

Thanks for the pointer!

Here's the SDDP.jl equivalent

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    ρ = 0.020,  # discount rate
    k = 90.0,   # utility bliss point
    T = 10.0,   # life horizon
    r = 0.05,   # interest rate
    dt = 0.1,   # time step size
    B0 = 100.0, # endowment
    μ = 0.0,    # Brownian motion
    σ = 0.1,
)
    env = Gurobi.Env()
    model = SDDP.LinearPolicyGraph(
        stages = round(Int, T / dt),
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t
        set_silent(sp)
        @variable(sp, B >= 0, SDDP.State, initial_value = B0)
        @variable(sp, y, SDDP.State, initial_value = 10)
        @variable(sp, 0 <= c <= 99)
        @constraint(sp, B.out == B.in + dt * (r * B.in + y.out - c))
        @constraint(sp, con, y.out == y.in + dt * μ * y.in + σ * y.in)
        SDDP.parameterize(sp, rand(Distributions.Normal(0, 1), 10)) do ω
            set_normalized_coefficient(con, y.in, -(1 + dt * μ + σ * ω))
        end
        @stageobjective(sp, dt * (exp(-ρ * t * dt) * -(c - k)^2))
    end
    SDDP.train(
        model;
        iteration_limit = 100,
    )
    sims = SDDP.simulate(model, 100, [:B, :c, :y]);
    function plot_simulations(sims)
        plot = Plots.plot()
        plot_simulation!(
            sims[1];
            labels = ["B: wealth balance" "c: consumption" "y"],
        )
        for s in sims[2:end]
            plot_simulation!(s, labels = false)
        end
        return plot
    end
    function plot_simulation!(sim; kwargs...)
        Plots.plot!(
            dt:dt:T,
            [
                map(data -> data[:B].out, sim),
                map(data -> data[:c], sim),
                map(data -> data[:y].out, sim),
            ],
            color = [:blue :red :green];
            kwargs...
        )
    end
    return model, plot_simulations(sims)
end

model, plt = main()

azev77 · 2021-08-17T10:57:37Z

@odow thanks!

I currently don't have Gurobi so can't replicate
In my discrete time analog to the continuous time model: c[0], c[1], ..., c[T] & B[0], B[1], ..., B[T], B[T+1]
I don't have B >= 0 (as you do)
The basic theory is that households try to smooth consumption by borrowing B[t]<0 & saving B[t]>0. Hence we expect consumption to be smoother than income/wealth (in an unconstrained world), but still stochastic.
The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

Here is my discrete time (non-stochastic, y_t=0) code:

using JuMP, Ipopt, Plots    # InfiniteOpt, 
ρ  = 0.020  # discount rate
k  = 100.0  # utility bliss point
T  = 10.0   # life horizon
r  = 0.05   # interest rate
B0 = 100.0  # endowment

time_c = 0.0:1.0:T
time_B = 0.0:1.0:T+1

u(c; k=k) = -(c - k)^2.0  # utility function
d(t; ρ=ρ) = exp(-ρ*t)     # discount function

m = Model(Ipopt.Optimizer)
register(m, :u, 1, u; autodiff = true)
register(m, :d, 1, d; autodiff = true)
@variable(m, c[time_c]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
@variable(m, B[time_B])
fix(B[0.0], B0)
fix(B[T+1], 0.0)
@NLobjective(m, Max, sum(d(t)*u(c[t]) for t in time_c) )
#@NLobjective(m, Max, sum(exp(-ρ*t)*(-1.0)*(c[t]-k)^2.0 for t in 0.0:1.0:10.0) )

for i in 2:length(time_B)
    t_1, t = time_B[i - 1], time_B[i]
    @constraint(m, B[t] == B[t_1] + 1.0 * (r * B[t_1] - c[t_1]))
end
optimize!(m)
objective_value(m)
c_opt, B_opt = value.(c), value.(B)
scatter(time_B,   B_opt.data, lab = "B: wealth balance");
scatter!(time_c,  c_opt.data, lab = "c: consumption")

odow · 2021-08-17T19:43:34Z

The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

But in this example we are not credit constrained.

To clarify my question in the other issue: in this solution, does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

I'm guessing by the solution and that the answer is "we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

SDDP.jl does better than this by taking into account that the agent at time t cannot know the future of times t+1 and so on.

azev77 · 2021-08-17T20:24:37Z

Can you resolve the problem w/o B>=0?

I think this problem actually has a closed form solution. I will take a look when I get home

odow · 2021-08-17T20:28:31Z

Can you resolve the problem w/o B>=0?

It will be the same answer because the wealth constraint is never binding.

I think this problem actually has a closed form solution

This makes me much more sure that

"we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

is true. I think we're talking about different problems here with a different set of assumptions.

azev77 · 2021-08-17T20:30:49Z

Can you re-write using Ipopt.jl so I can solve on my computer?

odow · 2021-08-17T20:32:06Z

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.

odow · 2021-08-17T20:32:51Z

I don't think we need to solve problems though, we need to answer the conceptual question:

does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

azev77 · 2021-08-17T20:40:19Z

At t=0,
observe B0, y0
Choose c0

next y1 is realized
At t=1,
observe B1, y1
Choose c1

Next y2 is realized

you choose ct after observing yt, before y_t+1

odow · 2021-08-17T20:57:21Z

you choose ct after observing yt, before y_t+1

Okay, we on the same page then. But that's not what your discrete-time Ipopt code above does?

azev77 · 2021-08-17T21:03:28Z

My discrete time Ipopt code is non-stochastic, y_t=0, for all time periods, so it’s perfect foresight

odow · 2021-08-17T21:29:39Z

My discrete time Ipopt code is non-stochastic

Yeah haha I'm getting confused with the code across the two issues. I meant the code that produced the figures with the straight lines.

azev77 · 2021-08-18T20:09:58Z

Below is my matrix the the 4 types of intertemporal optimization problems we solve in Econ.
I'm gonna write a blog post about this at some point.

The only sol so far I don't understand is @pulsipher's stochastic continuous-time.
It sounds like you think it solves a different model.

Take a look here: JuliaPOMDP/POMDPs.jl#351
Can SDDP.jl solve an infinite-horizon version of the above problem?
(Btw, where is the Bellman equation? where are the the iteration based solvers VFI/PFI etc?)

odow · 2021-08-18T20:45:47Z

SDDP.jl can solve the discrete-time deterministic and stochastic cases, for both the finite horizon and the infinite horizon cases.

For the infinite case, you just need to replace the LinearPolicyGraph with (this will make more sense once you read the theory below):

    graph = SDDP.LinearGraph(1)
    SDDP.add_edge(graph, 1 => 1, 1 - ρ)
    model = SDDP.PolicyGraph(
        graph,
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t

Here's a tutorial on the theory, which outlines the bellman recursion etc: https://odow.github.io/SDDP.jl/stable/tutorial/theory/21_theory_intro/

It's not clear to me that your notation for the stochastic cases is correct.

Shouldn't c(t) be indexed by the history of the stochastic process up to time t?
Can c(t) see y_t or only y_{t-1}?
The imprecision in this notation is one reason why I will strongly advocate for the policy graph.

It's another one of those "MDP solvers are all alike in different ways". It uses linear programming duality to construct the value function. You could say that it is value function approximation ala Approximate Dynamic Programming. You could also say that it is Q-learning in the POMDP world, but we converge to a provably optimal solution and it takes into account more than one-step into the future.

There are plotting tools to visualize the resulting value function: https://odow.github.io/SDDP.jl/stable/tutorial/basic/05_plotting/#Plotting-the-value-function

The best thing about SDDP is that the subproblems are JuMP models, so you're free to add as many variables and constraints as you like. We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

azev77 · 2021-08-18T20:52:59Z

Shouldn't c(t) be indexed by the history of the stochastic process up to time t?

Yes. Me & economists are often a bit sloppy & omit these things to make the notation more digestible.
Here is an example from my research (where s^t is the history of the shock...):

azev77 · 2021-08-18T20:58:30Z

We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

If what you're saying is true (& works for problems w/ non-linear reward functions like log) then packages like yours can be a game-changer for economists. Economists typically avoid problems w/ more than 5-state variables.

Check out VFIToolkit.m.

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them.
I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

odow · 2021-08-18T21:11:47Z

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them. I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

This is what @bernardokp has been telling me for some time... #428

There are some issues with nonlinear functions: everything must be convex. So you could have log rewards (if maximizing) but not log costs (if maximizing).

There are some crappy finance/inventory problems, but not enough:
https://github.com/odow/SDDP.jl/blob/master/examples/inventory_management.jl
https://github.com/odow/SDDP.jl/blob/master/docs/src/examples/asset_management_stagewise.jl

@lorenzoreus has some finance examples.

Happy to accept PRs with examples and/or documentation. But I don't have much interest (or domain expertise) to write them myself.

odow · 2021-08-18T21:15:36Z

Here's a model that @jd-lara had been running

We can also find risk-averse policies, so swap out that E operator for your choice of convex risk measure.
https://odow.github.io/SDDP.jl/stable/guides/add_a_risk_measure/
https://odow.github.io/SDDP.jl/stable/tutorial/theory/22_risk/

We also have the ability to solve problems where the randomness is a hidden Markov model (e.g., bear and bull markets with random returns, where the returns are observed but the bear/bull state is hidden).

I should tidy the examples/tutorials to clarify all of this.

azev77 · 2021-08-18T21:23:45Z

Is it possible to show me how to solve the example you did above (discrete time version of my model) w/ an open-source solver?
Except an infinite horizon version (so I can compare w/ closed form solutions more easily)

PS: wouldn't you agree the world would be a much better place if more people from different fields cooperated & submitted bug-reports & PRs to the same packages? (I sure do...)

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    ρ = 0.020,  # discount rate
    k = 90.0,   # utility bliss point
    T = 10.0,   # life horizon
    r = 0.05,   # interest rate
    dt = 0.1,   # time step size
    B0 = 100.0, # endowment
    μ = 0.0,    # Brownian motion
    σ = 0.1,
)
    env = Gurobi.Env()
    model = SDDP.LinearPolicyGraph(
        stages = round(Int, T / dt),
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t
        set_silent(sp)
        @variable(sp, B >= 0, SDDP.State, initial_value = B0)
        @variable(sp, y, SDDP.State, initial_value = 10)
        @variable(sp, 0 <= c <= 99)
        @constraint(sp, B.out == B.in + dt * (r * B.in + y.out - c))
        @constraint(sp, con, y.out == y.in + dt * μ * y.in + σ * y.in)
        SDDP.parameterize(sp, rand(Distributions.Normal(0, 1), 10)) do ω
            set_normalized_coefficient(con, y.in, -(1 + dt * μ + σ * ω))
        end
        @stageobjective(sp, dt * (exp(-ρ * t * dt) * -(c - k)^2))
    end
    SDDP.train(
        model;
        iteration_limit = 100,
    )
    sims = SDDP.simulate(model, 100, [:B, :c, :y]);
    function plot_simulations(sims)
        plot = Plots.plot()
        plot_simulation!(
            sims[1];
            labels = ["B: wealth balance" "c: consumption" "y"],
        )
        for s in sims[2:end]
            plot_simulation!(s, labels = false)
        end
        return plot
    end
    function plot_simulation!(sim; kwargs...)
        Plots.plot!(
            dt:dt:T,
            [
                map(data -> data[:B].out, sim),
                map(data -> data[:c], sim),
                map(data -> data[:y].out, sim),
            ],
            color = [:blue :red :green];
            kwargs...
        )
    end
    return model, plot_simulations(sims)
end

model, plt = main()

odow · 2021-08-18T21:37:53Z

Is it possible to show me how to solve the example you did above

#429 (comment)

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.

Even though this problem is small, it still requires on the order of 10^4 QP solves. So having a good solver is critical. Get Gurobi. They have free licenses for academics.

wouldn't you agree the world would be a much better place if more people from different fields

Yes.

I'll say this publicly: I don't believe the stochastic programming community has done a good job of publicizing the technology we have.

There were three main issues:

a massive theoretical barrier to entry with overly complicated notation and far too involved math
no one open-sourced an implementation until recently
(almost) everyone works on a single problem: hydro thermal scheduling

Here's some good primers on the industrial problem:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8442754&casa_token=tGCBXzSNn_AAAAAA:LiZBvAxDyPAr0F3NPpycCWqJioDKTNZ8-3g20w77OlhF0F8s5sC79Dhdrr47D0QHjEN6zfUXMw&tag=1
https://www.psr-inc.com/softwares-en/
https://blog.sintef.com/sintefenergy/hydropower/learning-from-hydrothermal-scheduling-models-in-brazil-and-norway/

azev77 · 2021-08-18T22:52:30Z

Got it to work w/ Ipopt.
It runs, but no good...

odow · 2021-08-18T22:56:27Z

Yeah it's probably very slow and might crash/warn about numerical instability?

azev77 · 2021-08-19T20:54:51Z

It stops at the iteration_limit = 100 but is not optimal by then & I don't wanna wait forever.

I just worked out the closed form solution for the problem above (infinite horizon version w/ AR(1) income).
This would be an awesome example bc it's so simple & we can compare w/ closed form.
PS: I just realized the finite-horizon version of this problem is even on wikipedia...

If I had Gurobi, I'd do it on my own.
@odow Would it be possible for you to solve this infinite-horizon problem w/ SDDP.jl & compare w/ the closed form solution (like I did here) w/ @pulsipher's help?

Also, can SDDP.jl solve for the policy function so we can compare it w/ the closed form?

@pulsipher I messed up in explaining the stochastic problem to you before. c[t] is lazy/sloppy notation for optimal consumption, which is in fact a function of the entire history of the state variable up to time t...

PS: It would be interesting to see how your package solves this model where income follows an AR(1) process.
Economists usually approx the AR(1) w/ a Markov chain & then do VFI or PFI...

PS: I'd be happy to submit a PR w/ an example (like I did w/ InfiniteOpt) if you provide the code/output
In the example I like to plot:

the numerical value function vs closed form (see my discussion in POMDP)
the numerical policy function vs closed form
a simulation of consumption/income/wealth numerical vs closed form

odow · 2021-08-19T21:49:25Z

This notation looks better. Nice to see that c_t isn't linear, and is a function of the history of the random walk.

I'll see if I have time later to implement it. It would make a very nice tutorial.

azev77 · 2021-08-20T17:27:07Z

There are a few typos in my notes above, please hold-off until I fix them...

azev77 · 2021-08-20T19:46:37Z

Here are the corrected solutions (recursive & time-series):

I coded them here:

using Distributions, Plots

# Define parameters 
β=0.98; # discount factor
R=1/β;  # gross interest rate
r=R-1;  # net interest rate
b = 0.01  # utility parameter. c ∈ (0,1/b)   
dt = 1    # infinite horizon. time step. 
B0 = 100.0 # endowment
u(c; b=b) = c - (b/2)*c^2       # utility function
discount(t;β=β) = β^t # discount function

λ=0.9; # income persistence
ȳ=0.0; # long-run mean income 
σ_ε = 0.5; # income shock vol. 
y0 = 0.0; # starting income 
θ = 1/(1+r-λ)
dist_ε = Normal(0.0,σ_ε)

μ_B(B,y,c; r=r) = B +r*B +y -c 
μ_y(y,ε; λ=λ,ȳ=ȳ) = (1-λ)*ȳ +λ*y +ε

# Closed Form 
c_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = r*B +r*θ*y +(1-λ)*θ*ȳ
B_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = B +(1-λ)*θ*y - (1-λ)*θ*ȳ
val_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ,σ_ε=σ_ε) = ((1+r)/r) * (c_pol_cf(B,y)) +
    -(b/2)*((1+r)/r) * (c_pol_cf(B,y))^2 +
    -(b/2)*(1+r)*((θ*σ_ε)^2); 
#
c0 = c_pol_cf(B0,y0)

# Simulation 
T_sim = 200
ε_sim = rand(dist_ε, T_sim)
#
y_sim, B_sim, c_sim = [], [], []
push!(y_sim, y0)
push!(B_sim, B0)

# Recursive solutions
for tt in 1:T_sim
    y_new = μ_y(y_sim[tt],ε_sim[tt])        # y_t+1 =f(yt,ε_t+1)
    c_new = c_pol_cf(B_sim[tt], y_sim[tt])  # c_t =f(Bt,yt)
    B_new = B_pol_cf(B_sim[tt], y_sim[tt])  # B_t+1 =f(Bt,yt)
    #B_new = μ_B(B_sim[tt], y_sim[tt], c_new) # same as above.
    #
    push!(y_sim, y_new)
    push!(c_sim, c_new) 
    push!(B_sim, B_new) 
end

# Time-series solutions 
y_sim_TS, B_sim_TS, c_sim_TS = [], [], []
push!(y_sim_TS, y0)
push!(B_sim_TS, B0)
push!(c_sim_TS, c0)
for tt in 1: (T_sim-1)
    push!(y_sim_TS, (λ^tt)*y0 + (1-λ^tt)*ȳ +sum(i->λ^(tt-i) * ε_sim[i], 1:tt))
    push!(c_sim_TS, c0 + r*θ*sum(ε_sim[1:tt]))
    push!(B_sim_TS, B0 +(1-λ)*θ*sum(y_sim[1:tt]) -tt*(1-λ)*θ*ȳ)
end 

# compare Recursive vs TS 
y_sim_TS ≈ y_sim[1:end-1]
c_sim_TS ≈ c_sim
B_sim_TS ≈ B_sim[1:end-1]

plot(); 
plot!(y_sim_TS, lab=""); plot!(y_sim, lab="y");
plot!(c_sim_TS, lab=""); plot!(c_sim, lab="c");
plot!(B_sim_TS, lab=""); plot!(B_sim, lab="B")

# stats 
[std(x) for x in [y_sim, y_sim + r*B_sim, B_sim, c_sim] ]

#
plot(0:150,u, lab="return function");
plot!([1/b], seriestype = :vline, lab="bliss point")

plot();
plot!(0:200, B -> c_pol_cf(B,-2), lab="c Policy, y=-2") ;
plot!(0:200, B -> c_pol_cf(B,0), lab="c Policy, y=0") ;
plot!(0:200, B -> c_pol_cf(B,2), lab="c Policy, y=2") 

plot();
plot!(0:200, B -> B_pol_cf(B,-2), lab="B Policy, y=-2") ;
plot!(0:200, B -> B_pol_cf(B,0), lab="B Policy, y=0") ;
plot!(0:200, B -> B_pol_cf(B,2), lab="B Policy, y=2") 

plot();
plot!(0:2000, B -> val_cf(B,-2), lab="Value, y=-2") ;
plot!(0:2000, B -> val_cf(B,0), lab="Value, y=0") ;
plot!(0:2000, B -> val_cf(B,2), lab="Value, y=2")

Simulation:

Policy function (consumption):

Value function (quadratic):

PS: it's easier to see that consumption is fluctuating if you zoom in

(consumption is just less volatile, "smoother", when ppl can borrow/save, as econ-theory predicts)

odow · 2021-08-21T23:55:39Z

This needs some local changes to SDDP.jl (to add the plotting callback), so you won't be able to run the code. But, the blue line is us, the orange dashed line is the closed-form solution. For now I just used β = 0.9 to make it easier, but it takes surprisingly many iterations before we start to reliably reproduce the optimal policy. Part of the problem is that y is unbounded over the infinite horizon. It'd be a lot easier if the horizon was finite or y was bounded.

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    β = 0.9,
    b = 0.01,
    B0 = 100.0,
    λ = 0.9,
    ȳ = 0.0,
    σ_ε = 0.5,
    y0 = 0.0,
    SAMPLE_SPACE_SIZE = 100,
)
    r = 1 / β - 1
    θ = 1 / (1 + r - λ)
    Ω = rand(Distributions.Normal(0, σ_ε), SAMPLE_SPACE_SIZE)
    model = SDDP.PolicyGraph(
        SDDP.Graph(0, [1], [(0 => 1, 1.0), (1 => 1, β)]),
        sense = :Max,
        upper_bound = 1000,
        optimizer = Gurobi.Optimizer,
    ) do sp, t
        set_silent(sp)
        @variables(sp, begin
            B >= 0, (SDDP.State, initial_value = B0)
            y, (SDDP.State, initial_value = y0)
            0 <= c <= 1 / b, (SDDP.State, initial_value = 0)
            ε
            δ >= 0
        end)
        @constraints(sp, begin
            B.out == (1 + r) * B.in + y.in - c.out + δ
            y.out == (1 - λ) * ȳ + λ * y.in + ε
        end)
        SDDP.parameterize(ω -> fix(ε, ω), sp, Ω)
        @stageobjective(sp, c.out - (b / 2) * c.out^2 - 10_000 * δ)
    end
    function closed_form_solution(ε)
        T = length(ε)
        y = zeros(T)
        c = zeros(T)
        B = zeros(T)
        c0 = r * B0 + r * θ * y0 + (1 - λ) * θ * ȳ
        for t = 1:T
            y[t] = (1 - λ^t) * ȳ + λ^t * y0 + sum(λ^(t - j) * ε[j] for j=1:t)
            c[t] = c0 + r * θ * sum(ε[1:t])
            B[t] = B0 + (1 - λ) * θ * (y0 + sum(y[1:t-1])) - t * (1 - λ) * θ * ȳ
        end
        return B, y, c
    end
    iteration = 1
    function forward_pass_callback(trajectory)
        ε = map(x -> x[2], trajectory.scenario_path)
        Tmax = length(ε)
        cf_B, cf_y, cf_c = closed_form_solution(ε)
        plot_B = Plots.plot(
            ylabel = "B",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 200),
        )
        plot_y = Plots.plot(
            title = "Iteration $(iteration)",
            ylabel = "y",
            legend = false,
            xlim = (0, 20),
            ylim = (-3, 3),
        )
        plot_c = Plots.plot(
            ylabel = "c",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 100),
        )
        Plots.plot!(plot_B, 1:Tmax, map(x -> x[:B], trajectory.sampled_states))
        Plots.plot!(plot_B, 1:Tmax, cf_B, style = :dash)
        Plots.plot!(plot_y, 1:Tmax, map(x -> x[:y], trajectory.sampled_states))
        Plots.plot!(plot_y, 1:Tmax, cf_y, style = :dash)
        Plots.plot!(plot_c, 1:Tmax, map(x -> x[:c], trajectory.sampled_states))
        Plots.plot!(plot_c, 1:Tmax, cf_c, style = :dash)
        plt = Plots.plot(
            plot_B,
            plot_y,
            plot_c,
            layout = (1, 3),
        )
        Plots.savefig(plt, "iterations/iteration_$(1000 + iteration).png")
        iteration += 1
        return
    end
    SDDP.train(
        model;
        iteration_limit = 500,
        forward_pass_callback = forward_pass_callback,
    )
    run(`convert iterations/\*.png training.gif`)
    return
end

main()

azev77 · 2021-08-22T00:07:11Z

I’ll take a closer look when I come home.
Are you able to plot the policy & value functions?
Like I did: As a function of B, for say 3 values of y.
I don’t need to see each iteration, just the solution

azev77 · 2021-12-27T01:36:09Z

Hey @odow I'm trying to compare intertemporal optimization solvers in Julia here.

I coded up the (correct) finite horizon solution w/ JuMP-Ipopt:

#Finite Horizon Discrete Time w/ jump
if 1==1
    using JuMP, Ipopt, Plots    # InfiniteOpt, 
    δ=0.10; ρ=0.05; 

    # Need: z > ρ + δ
    z= ρ + δ +.10    # Case: k_T+1 =0 liquidate capital 
    #z= ρ + δ +.50    # Case: k_T+1 >0 burn some capital 

    I_SS = (z-ρ-δ)/(ρ+δ)
    K_SS = I_SS/δ

    T = 90.0   # life horizon
    k0 = 1.0 # endowment

    time_i = 0.0:1.0:T
    time_k = 0.0:1.0:T+1

    u(k,i;z=z) = z*k -i -(1/2)*(i)^2  # utility function
    d(t; ρ=ρ) = (1+ρ)^(-t)     # discount function

    m = Model(Ipopt.Optimizer)

    register(m, :u, 2, u; autodiff = true)
    register(m, :d, 1, d; autodiff = true)

    @variable(m, i[time_i]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
    @variable(m, k[time_k])

    # LOM 
    for j in 2:length(time_k)
        t_1, t = time_k[j - 1], time_k[j]
        @constraint(m, k[t] == i[t_1] + (1-δ)*k[t_1])
    end

    # Can't sell more than all your capital.
    # Equiv to K_t >=0
    for j in 1:length(time_i)
        t = time_i[j]
        @constraint(m, i[t] >= -(1-δ)*k[t])
    end

    fix(k[0.0], k0)
    # @constraint(m, i[T] == max(-1, -(1-δ)*k[T]) )
    @constraint(m, i[T] >= -1 )

    @NLobjective(m, Max, sum(d(t)*u(k[t],i[t]) for t in time_i) )

    optimize!(m)
    objective_value(m)
    i_opt, k_opt = value.(i), value.(k)

    scatter(time_k,   k_opt.data, lab = "k");
    scatter!(time_i,  i_opt.data, lab = "i")
    plot!([0.0],  seriestype = :hline, lab="", color="red")
    plot!([-(1-δ)*k_opt.data[end-1] -1],  seriestype = :hline, lab="I_T TC", color="grey", l=:dash)
    plot!([K_SS I_SS],  seriestype = :hline, lab="", color="grey", l=:dash)
    plot!([T+1],  seriestype = :vline, lab="", color="grey", l=:dash)
end

How can I solve the same finite horizon problem w/ SDDP.jl - Ipopt?
How can I solve the infinite horizon version w/ SDDP.jl - Ipopt?
PS: the infinite horizon terminal condition is: limit e^(-\rho*t) * k_t * (1+i_t) =0

odow · 2022-12-14T00:20:38Z

Closing for the same reason as #528 (comment).

SDDP.jl isn't really intended as a solution method for these types of problems.

odow added the documentation label Jun 17, 2021

odow added the help-wanted label Aug 15, 2021

odow mentioned this issue Aug 30, 2021

Add forward_pass_callback #466

Merged

odow mentioned this issue Dec 14, 2022

Documentation improvements #496

Closed

10 tasks

odow closed this as completed Dec 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add examples of Stochastic Optimal Control #429

Add examples of Stochastic Optimal Control #429

odow commented Jun 17, 2021

odow commented Aug 15, 2021

odow commented Aug 15, 2021

pulsipher commented Aug 16, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 18, 2021

odow commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 18, 2021 •

edited

Loading

odow commented Aug 18, 2021

odow commented Aug 18, 2021

azev77 commented Aug 18, 2021 •

edited

Loading

odow commented Aug 18, 2021

azev77 commented Aug 18, 2021

odow commented Aug 18, 2021

azev77 commented Aug 19, 2021 •

edited

Loading

odow commented Aug 19, 2021 •

edited

Loading

azev77 commented Aug 20, 2021

azev77 commented Aug 20, 2021 •

edited

Loading

odow commented Aug 21, 2021

azev77 commented Aug 22, 2021

azev77 commented Dec 27, 2021

odow commented Dec 14, 2022

Add examples of Stochastic Optimal Control #429

Add examples of Stochastic Optimal Control #429

Comments

odow commented Jun 17, 2021

odow commented Aug 15, 2021

odow commented Aug 15, 2021

pulsipher commented Aug 16, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 17, 2021

odow commented Aug 17, 2021

azev77 commented Aug 18, 2021

odow commented Aug 18, 2021 • edited Loading

azev77 commented Aug 18, 2021 • edited Loading

azev77 commented Aug 18, 2021 • edited Loading

odow commented Aug 18, 2021

odow commented Aug 18, 2021

azev77 commented Aug 18, 2021 • edited Loading

odow commented Aug 18, 2021

azev77 commented Aug 18, 2021

odow commented Aug 18, 2021

azev77 commented Aug 19, 2021 • edited Loading

odow commented Aug 19, 2021 • edited Loading

azev77 commented Aug 20, 2021

azev77 commented Aug 20, 2021 • edited Loading

odow commented Aug 21, 2021

azev77 commented Aug 22, 2021

azev77 commented Dec 27, 2021

odow commented Dec 14, 2022

odow commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 18, 2021 •

edited

Loading

azev77 commented Aug 19, 2021 •

edited

Loading

odow commented Aug 19, 2021 •

edited

Loading

azev77 commented Aug 20, 2021 •

edited

Loading