Add examples of Stochastic Optimal Control #429

Closed

odow opened this issue Jun 17, 2021 · 33 comments

odow commented Jun 17, 2021

For the INFORMS talk

odow commented Aug 15, 2021

Ideally, one of @pulsipher's examples from InfiniteOpt: https://github.com/pulsipher/InfiniteOpt.jl/tree/master/docs/src/examples

odow commented Aug 15, 2021

If anyone reading this has nice examples of an optimal control type problem they want to share, please reach out.

@pulsipher

The stochastic optimal economic control problem I recently set up with @azev77 in infiniteopt/InfiniteOpt.jl#131 (comment) (see the conversation starting from that comment) might be a good fit. Naturally, the difference here is that you would want to implement its discrete version.

odow commented Aug 17, 2021

Thanks for the pointer!

Here's the SDDP.jl equivalent

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    ρ = 0.020,  # discount rate
    k = 90.0,   # utility bliss point
    T = 10.0,   # life horizon
    r = 0.05,   # interest rate
    dt = 0.1,   # time step size
    B0 = 100.0, # endowment
    μ = 0.0,    # Brownian motion
    σ = 0.1,
)
    env = Gurobi.Env()
    model = SDDP.LinearPolicyGraph(
        stages = round(Int, T / dt),
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t
        set_silent(sp)
        @variable(sp, B >= 0, SDDP.State, initial_value = B0)
        @variable(sp, y, SDDP.State, initial_value = 10)
        @variable(sp, 0 <= c <= 99)
        @constraint(sp, B.out == B.in + dt * (r * B.in + y.out - c))
        @constraint(sp, con, y.out == y.in + dt * μ * y.in + σ * y.in)
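        # SDDP.parameterize (below) overwrites the coefficient of y.in in `con`,
        # so each sampled ω enforces y.out == (1 + dt * μ + σ * ω) * y.in.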
        SDDP.parameterize(sp, rand(Distributions.Normal(0, 1), 10)) do ω
            set_normalized_coefficient(con, y.in, -(1 + dt * μ + σ * ω))
        end
        @stageobjective(sp, dt * (exp(-ρ * t * dt) * -(c - k)^2))
    end
    SDDP.train(
        model;
        iteration_limit = 100,
    )
    sims = SDDP.simulate(model, 100, [:B, :c, :y]);
    function plot_simulations(sims)
        plot = Plots.plot()
        plot_simulation!(
            sims[1];
            labels = ["B: wealth balance" "c: consumption" "y"],
        )
        for s in sims[2:end]
            plot_simulation!(s, labels = false)
        end
        return plot
    end
    function plot_simulation!(sim; kwargs...)
        Plots.plot!(
            dt:dt:T,
            [
                map(data -> data[:B].out, sim),
                map(data -> data[:c], sim),
                map(data -> data[:y].out, sim),
            ],
            color = [:blue :red :green];
            kwargs...
        )
    end
    return model, plot_simulations(sims)
end

model, plt = main()

[screenshot: simulated paths of B, c, and y]

azev77 commented Aug 17, 2021

@odow thanks!

  1. I currently don't have Gurobi so can't replicate
  2. In my discrete time analog to the continuous time model: c[0], c[1], ..., c[T] & B[0], B[1], ..., B[T], B[T+1]
  3. I don't have B >= 0 (as you do)
  4. The basic theory is that households try to smooth consumption by borrowing B[t]<0 & saving B[t]>0. Hence we expect consumption to be smoother than income/wealth (in an unconstrained world), but still stochastic.
  5. The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

Here is my discrete time (non-stochastic, y_t=0) code:

using JuMP, Ipopt, Plots    # InfiniteOpt, 
ρ  = 0.020  # discount rate
k  = 100.0  # utility bliss point
T  = 10.0   # life horizon
r  = 0.05   # interest rate
B0 = 100.0  # endowment

time_c = 0.0:1.0:T
time_B = 0.0:1.0:T+1

u(c; k=k) = -(c - k)^2.0  # utility function
d(t; ρ=ρ) = exp(-ρ*t)     # discount function

m = Model(Ipopt.Optimizer)
register(m, :u, 1, u; autodiff = true)
register(m, :d, 1, d; autodiff = true)
@variable(m, c[time_c]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
@variable(m, B[time_B])
fix(B[0.0], B0)
fix(B[T+1], 0.0)
@NLobjective(m, Max, sum(d(t)*u(c[t]) for t in time_c) )
#@NLobjective(m, Max, sum(exp(-ρ*t)*(-1.0)*(c[t]-k)^2.0 for t in 0.0:1.0:10.0) )

for i in 2:length(time_B)
    t_1, t = time_B[i - 1], time_B[i]
    @constraint(m, B[t] == B[t_1] + 1.0 * (r * B[t_1] - c[t_1]))
end
optimize!(m)
objective_value(m)
c_opt, B_opt = value.(c), value.(B)
scatter(time_B,   B_opt.data, lab = "B: wealth balance");
scatter!(time_c,  c_opt.data, lab = "c: consumption")

odow commented Aug 17, 2021

The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

But in this example we are not credit constrained.

To clarify my question in the other issue: in this solution, does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

[screenshot of the solution figure]

I'm guessing from the solution that the answer is "we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

SDDP.jl does better than this by taking into account that the agent at time t cannot know the future of times t+1 and so on.

azev77 commented Aug 17, 2021

Can you resolve the problem w/o B>=0?

I think this problem actually has a closed form solution. I will take a look when I get home

odow commented Aug 17, 2021

Can you resolve the problem w/o B>=0?

It will be the same answer because the wealth constraint is never binding.

I think this problem actually has a closed form solution

This makes me much more sure that

"we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

is true. I think we're talking about different problems here with a different set of assumptions.

azev77 commented Aug 17, 2021

Can you re-write using Ipopt.jl so I can solve on my computer?

odow commented Aug 17, 2021

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.
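
Concretely, a minimal sketch of that swap (everything else stays the same as the earlier example; this assumes Ipopt.jl is installed):

import Ipopt

model = SDDP.LinearPolicyGraph(
    stages = round(Int, T / dt),
    sense = :Max,
    upper_bound = 0,
    optimizer = Ipopt.Optimizer,  # was: () -> Gurobi.Optimizer(env)
) do sp, t
    # ... same subproblem definition as in the earlier example ...
end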

odow commented Aug 17, 2021

I don't think we need to solve problems though, we need to answer the conceptual question:

does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

azev77 commented Aug 17, 2021

At t=0,
observe B0, y0
Choose c0

next y1 is realized
At t=1,
observe B1, y1
Choose c1

Next y2 is realized

you choose ct after observing yt, before y_t+1

odow commented Aug 17, 2021

you choose ct after observing yt, before y_t+1

Okay, we're on the same page then. But that's not what your discrete-time Ipopt code above does?

azev77 commented Aug 17, 2021

My discrete time Ipopt code is non-stochastic, y_t=0, for all time periods, so it’s perfect foresight

odow commented Aug 17, 2021

My discrete time Ipopt code is non-stochastic

Yeah haha I'm getting confused with the code across the two issues. I meant the code that produced the figures with the straight lines.

azev77 commented Aug 18, 2021

Below is my matrix of the 4 types of intertemporal optimization problems we solve in Econ.
I'm gonna write a blog post about this at some point.

The only sol so far I don't understand is @pulsipher's stochastic continuous-time.
It sounds like you think it solves a different model.

Take a look here: JuliaPOMDP/POMDPs.jl#351
Can SDDP.jl solve an infinite-horizon version of the above problem?
(Btw, where is the Bellman equation? Where are the iteration-based solvers, VFI/PFI, etc.?)

[screenshot: matrix of the 4 types of intertemporal optimization problems]

odow commented Aug 18, 2021

SDDP.jl can solve the discrete-time deterministic and stochastic cases, for both the finite horizon and the infinite horizon cases.

For the infinite case, you just need to replace the LinearPolicyGraph with (this will make more sense once you read the theory below):

    graph = SDDP.LinearGraph(1)
    SDDP.add_edge(graph, 1 => 1, 1 - ρ)
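    # In a policy graph, the probability on the self-loop edge (1 - ρ) plays the
    # role of the per-stage discount factor for the infinite-horizon problem.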
    model = SDDP.PolicyGraph(
        graph,
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t

Here's a tutorial on the theory, which outlines the bellman recursion etc: https://odow.github.io/SDDP.jl/stable/tutorial/theory/21_theory_intro/

It's not clear to me that your notation for the stochastic cases is correct.

  • Shouldn't c(t) be indexed by the history of the stochastic process up to time t?
  • Can c(t) see y_t or only y_{t-1}?
    The imprecision in this notation is one reason why I will strongly advocate for the policy graph.

It's another one of those "MDP solvers are all alike in different ways". It uses linear programming duality to construct the value function. You could say that it is value function approximation ala Approximate Dynamic Programming. You could also say that it is Q-learning in the POMDP world, but we converge to a provably optimal solution and it takes into account more than one-step into the future.

There are plotting tools to visualize the resulting value function: https://odow.github.io/SDDP.jl/stable/tutorial/basic/05_plotting/#Plotting-the-value-function
[screenshot: value function plot]
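
As a rough sketch of using that tooling on the trained model from earlier in this thread (the exact call signatures are in the linked docs; the state values below are arbitrary illustrative choices, so treat this as an untested illustration):

# Evaluate the approximate value function at node 1 as a function of the
# wealth state B, for a few fixed values of the income state y.
V = SDDP.ValueFunction(model; node = 1)
plt = Plots.plot(xlabel = "B", ylabel = "approximate value")
for y in (8.0, 10.0, 12.0)
    vals = [SDDP.evaluate(V, Dict(:B => Float64(B), :y => y))[1] for B in 0:10:200]
    Plots.plot!(plt, 0:10:200, vals, lab = "y = $y")
end
plt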

The best thing about SDDP is that the subproblems are JuMP models, so you're free to add as many variables and constraints as you like. We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

azev77 commented Aug 18, 2021

Shouldn't c(t) be indexed by the history of the stochastic process up to time t?

Yes. Me & economists are often a bit sloppy & omit these things to make the notation more digestible.
Here is an example from my research (where s^t is the history of the shock...):
[screenshot: notation example from azev77's research]

azev77 commented Aug 18, 2021

We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

If what you're saying is true (& works for problems w/ non-linear reward functions like log) then packages like yours can be a game-changer for economists. Economists typically avoid problems w/ more than 5-state variables.

Check out VFIToolkit.m.

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them.
I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

odow commented Aug 18, 2021

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them. I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

This is what @bernardokp has been telling me for some time... #428

There are some issues with nonlinear functions: everything must be convex. So you could have log rewards (if maximizing) but not log costs (if maximizing).

There are some crappy finance/inventory problems, but not enough:
https://github.com/odow/SDDP.jl/blob/master/examples/inventory_management.jl
https://github.com/odow/SDDP.jl/blob/master/docs/src/examples/asset_management_stagewise.jl

@lorenzoreus has some finance examples.

Happy to accept PRs with examples and/or documentation. But I don't have much interest (or domain expertise) to write them myself.

odow commented Aug 18, 2021

Here's a model that @jd-lara had been running
[screenshot: model formulation]

We can also find risk-averse policies, so swap out that E operator for your choice of convex risk measure.
https://odow.github.io/SDDP.jl/stable/guides/add_a_risk_measure/
https://odow.github.io/SDDP.jl/stable/tutorial/theory/22_risk/
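
For example, a minimal sketch of passing a risk measure to the training call (SDDP.EAVaR is the convex combination of expectation and AV@R described in the linked tutorial; the parameter values here are arbitrary):

SDDP.train(
    model;
    risk_measure = SDDP.EAVaR(lambda = 0.5, beta = 0.25),
    iteration_limit = 100,
)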

We also have the ability to solve problems where the randomness is a hidden Markov model (e.g., bear and bull markets with random returns, where the returns are observed but the bear/bull state is hidden).

I should tidy the examples/tutorials to clarify all of this.

azev77 commented Aug 18, 2021

Is it possible to show me how to solve the example you did above (discrete time version of my model) w/ an open-source solver?
Except an infinite horizon version (so I can compare w/ closed form solutions more easily)

PS: wouldn't you agree the world would be a much better place if more people from different fields cooperated & submitted bug-reports & PRs to the same packages? (I sure do...)

(the SDDP.jl code from the comment above, quoted verbatim)

odow commented Aug 18, 2021

Is it possible to show me how to solve the example you did above

#429 (comment)

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.

Even though this problem is small, it still requires on the order of 10^4 QP solves. So having a good solver is critical. Get Gurobi. They have free licenses for academics.

wouldn't you agree the world would be a much better place if more people from different fields

Yes.

I'll say this publicly: I don't believe the stochastic programming community has done a good job of publicizing the technology we have.

There were three main issues:

  • a massive theoretical barrier to entry with overly complicated notation and far too involved math
  • no one open-sourced an implementation until recently
  • (almost) everyone works on a single problem: hydro thermal scheduling

Here's some good primers on the industrial problem:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8442754&casa_token=tGCBXzSNn_AAAAAA:LiZBvAxDyPAr0F3NPpycCWqJioDKTNZ8-3g20w77OlhF0F8s5sC79Dhdrr47D0QHjEN6zfUXMw&tag=1
https://www.psr-inc.com/softwares-en/
https://blog.sintef.com/sintefenergy/hydropower/learning-from-hydrothermal-scheduling-models-in-brazil-and-norway/

azev77 commented Aug 18, 2021

Got it to work w/ Ipopt.
It runs, but no good...

odow commented Aug 18, 2021

Yeah it's probably very slow and might crash/warn about numerical instability?

azev77 commented Aug 19, 2021

It stops at the iteration_limit = 100 but is not optimal by then & I don't wanna wait forever.

I just worked out the closed form solution for the problem above (infinite horizon version w/ AR(1) income).
This would be an awesome example bc it's so simple & we can compare w/ closed form.
PS: I just realized the finite-horizon version of this problem is even on wikipedia...

[screenshot: closed-form solution derivation]

If I had Gurobi, I'd do it on my own.
@odow Would it be possible for you to solve this infinite-horizon problem w/ SDDP.jl & compare w/ the closed form solution (like I did here) w/ @pulsipher's help?

Also, can SDDP.jl solve for the policy function so we can compare it w/ the closed form?

@pulsipher I messed up in explaining the stochastic problem to you before. c[t] is lazy/sloppy notation for optimal consumption, which is in fact a function of the entire history of the state variable up to time t...

PS: It would be interesting to see how your package solves this model where income follows an AR(1) process.
Economists usually approx the AR(1) w/ a Markov chain & then do VFI or PFI...
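
For reference, a sketch of that standard discretization step (Tauchen's method) for the AR(1) above, y' = (1 - λ)*ȳ + λ*y + ε with ε ~ N(0, σ_ε^2); the helper name `tauchen` is just illustrative:

using Distributions

function tauchen(N, λ, σ_ε, ȳ = 0.0, m = 3.0)
    σ_y = σ_ε / sqrt(1 - λ^2)                   # stationary std. dev. of y
    grid = range(ȳ - m * σ_y, ȳ + m * σ_y; length = N)
    w = step(grid)                              # grid spacing
    d = Normal(0.0, σ_ε)
    P = zeros(N, N)
    for (i, y) in enumerate(grid)
        μ = (1 - λ) * ȳ + λ * y                 # conditional mean of y'
        P[i, 1] = cdf(d, grid[1] - μ + w / 2)
        P[i, N] = 1 - cdf(d, grid[N] - μ - w / 2)
        for j in 2:N-1
            P[i, j] = cdf(d, grid[j] - μ + w / 2) - cdf(d, grid[j] - μ - w / 2)
        end
    end
    return grid, P                              # income states and transition matrix
end

grid, P = tauchen(7, 0.9, 0.5)                  # e.g. λ = 0.9, σ_ε = 0.5 as above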

PS: I'd be happy to submit a PR w/ an example (like I did w/ InfiniteOpt) if you provide the code/output
In the example I like to plot:

  • the numerical value function vs closed form (see my discussion in POMDP)
  • the numerical policy function vs closed form
  • a simulation of consumption/income/wealth numerical vs closed form

odow commented Aug 19, 2021

This notation looks better. Nice to see that c_t isn't linear, and is a function of the history of the random walk.

I'll see if I have time later to implement it. It would make a very nice tutorial.

azev77 commented Aug 20, 2021

There are a few typos in my notes above, please hold off until I fix them...

azev77 commented Aug 20, 2021

Here are the corrected solutions (recursive & time-series):
[screenshot: corrected recursive and time-series solutions]

I coded them here:

using Distributions, Plots

# Define parameters 
β=0.98; # discount factor
R=1/β;  # gross interest rate
r=R-1;  # net interest rate
b = 0.01  # utility parameter. c ∈ (0,1/b)   
dt = 1    # infinite horizon. time step. 
B0 = 100.0 # endowment
u(c; b=b) = c - (b/2)*c^2       # utility function
discount(t;β=β) = β^t # discount function

λ=0.9; # income persistence
ȳ=0.0; # long-run mean income
σ_ε = 0.5; # income shock vol. 
y0 = 0.0; # starting income 
θ = 1/(1+r-λ)
dist_ε = Normal(0.0,σ_ε)

μ_B(B,y,c; r=r) = B +r*B +y -c 
μ_y(y,ε; λ=λ,ȳ=ȳ) = (1-λ)*ȳ +λ*y +ε

# Closed Form 
c_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = r*B +r*θ*y +(1-λ)*θ*ȳ
B_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = B +(1-λ)*θ*y - (1-λ)*θ*ȳ
val_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ,σ_ε=σ_ε) = ((1+r)/r) * (c_pol_cf(B,y)) +
    -(b/2)*((1+r)/r) * (c_pol_cf(B,y))^2 +
    -(b/2)*(1+r)*((θ*σ_ε)^2);
#
c0 = c_pol_cf(B0,y0)

# Simulation 
T_sim = 200
ε_sim = rand(dist_ε, T_sim)
#
y_sim, B_sim, c_sim = [], [], []
push!(y_sim, y0)
push!(B_sim, B0)

# Recursive solutions
for tt in 1:T_sim
    y_new = μ_y(y_sim[tt],ε_sim[tt])        # y_t+1 =f(yt,ε_t+1)
    c_new = c_pol_cf(B_sim[tt], y_sim[tt])  # c_t =f(Bt,yt)
    B_new = B_pol_cf(B_sim[tt], y_sim[tt])  # B_t+1 =f(Bt,yt)
    #B_new = μ_B(B_sim[tt], y_sim[tt], c_new) # same as above.
    #
    push!(y_sim, y_new)
    push!(c_sim, c_new) 
    push!(B_sim, B_new) 
end

# Time-series solutions 
y_sim_TS, B_sim_TS, c_sim_TS = [], [], []
push!(y_sim_TS, y0)
push!(B_sim_TS, B0)
push!(c_sim_TS, c0)
for tt in 1: (T_sim-1)
    push!(y_sim_TS, (λ^tt)*y0 + (1-λ^tt)*ȳ +sum(i->λ^(tt-i) * ε_sim[i], 1:tt))
    push!(c_sim_TS, c0 + r*θ*sum(ε_sim[1:tt]))
    push!(B_sim_TS, B0 +(1-λ)*θ*sum(y_sim[1:tt]) -tt*(1-λ)*θ*ȳ)
end 

# compare Recursive vs TS 
y_sim_TS ≈ y_sim[1:end-1]
c_sim_TS ≈ c_sim
B_sim_TS ≈ B_sim[1:end-1]

plot(); 
plot!(y_sim_TS, lab=""); plot!(y_sim, lab="y");
plot!(c_sim_TS, lab=""); plot!(c_sim, lab="c");
plot!(B_sim_TS, lab=""); plot!(B_sim, lab="B")

# stats 
[std(x) for x in [y_sim, y_sim + r*B_sim, B_sim, c_sim] ]

#
plot(0:150,u, lab="return function");
plot!([1/b], seriestype = :vline, lab="bliss point")

plot();
plot!(0:200, B -> c_pol_cf(B,-2), lab="c Policy, y=-2") ;
plot!(0:200, B -> c_pol_cf(B,0), lab="c Policy, y=0") ;
plot!(0:200, B -> c_pol_cf(B,2), lab="c Policy, y=2") 

plot();
plot!(0:200, B -> B_pol_cf(B,-2), lab="B Policy, y=-2") ;
plot!(0:200, B -> B_pol_cf(B,0), lab="B Policy, y=0") ;
plot!(0:200, B -> B_pol_cf(B,2), lab="B Policy, y=2") 

plot();
plot!(0:2000, B -> val_cf(B,-2), lab="Value, y=-2") ;
plot!(0:2000, B -> val_cf(B,0), lab="Value, y=0") ;
plot!(0:2000, B -> val_cf(B,2), lab="Value, y=2") 

Simulation:
[screenshot]
Policy function (consumption):
[screenshot]
Value function (quadratic):
[screenshot]

PS: it's easier to see that consumption is fluctuating if you zoom in:
[screenshot]
(consumption is just less volatile, "smoother", when ppl can borrow/save, as econ-theory predicts)

odow commented Aug 21, 2021

This needs some local changes to SDDP.jl (to add the plotting callback), so you won't be able to run the code. But, the blue line is us, the orange dashed line is the closed-form solution. For now I just used β = 0.9 to make it easier, but it takes surprisingly many iterations before we start to reliably reproduce the optimal policy. Part of the problem is that y is unbounded over the infinite horizon. It'd be a lot easier if the horizon was finite or y was bounded.
[training.gif: animated comparison of the SDDP.jl policy (blue) with the closed-form solution (orange dashed) over the training iterations]

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    β = 0.9,
    b = 0.01,
    B0 = 100.0,
    λ = 0.9,
    ȳ = 0.0,
    σ_ε = 0.5,
    y0 = 0.0,
    SAMPLE_SPACE_SIZE = 100,
)
    r = 1 / β - 1
    θ = 1 / (1 + r - λ)
    Ω = rand(Distributions.Normal(0, σ_ε), SAMPLE_SPACE_SIZE)
    model = SDDP.PolicyGraph(
        SDDP.Graph(0, [1], [(0 => 1, 1.0), (1 => 1, β)]),
        sense = :Max,
        upper_bound = 1000,
        optimizer = Gurobi.Optimizer,
    ) do sp, t
        set_silent(sp)
        @variables(sp, begin
            B >= 0, (SDDP.State, initial_value = B0)
            y, (SDDP.State, initial_value = y0)
            0 <= c <= 1 / b, (SDDP.State, initial_value = 0)
            ε
            δ >= 0
        end)
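        # δ >= 0 is a feasibility slack: it keeps B.out >= 0 attainable after bad
        # income draws, and is heavily penalized in the stage objective below.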
        @constraints(sp, begin
            B.out == (1 + r) * B.in + y.in - c.out + δ
            y.out == (1 - λ) * ȳ + λ * y.in + ε
        end)
        SDDP.parameterize(ω -> fix(ε, ω), sp, Ω)
        @stageobjective(sp, c.out - (b / 2) * c.out^2 - 10_000 * δ)
    end
    function closed_form_solution(ε)
        T = length(ε)
        y = zeros(T)
        c = zeros(T)
        B = zeros(T)
        c0 = r * B0 + r * θ * y0 + (1 - λ) * θ * ȳ
        for t = 1:T
            y[t] = (1 - λ^t) * ȳ + λ^t * y0 + sum(λ^(t - j) * ε[j] for j=1:t)
            c[t] = c0 + r * θ * sum(ε[1:t])
            B[t] = B0 + (1 - λ) * θ * (y0 + sum(y[1:t-1])) - t * (1 - λ) * θ * ȳ
        end
        return B, y, c
    end
    iteration = 1
    function forward_pass_callback(trajectory)
        ε = map(x -> x[2], trajectory.scenario_path)
        Tmax = length(ε)
        cf_B, cf_y, cf_c = closed_form_solution(ε)
        plot_B = Plots.plot(
            ylabel = "B",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 200),
        )
        plot_y = Plots.plot(
            title = "Iteration $(iteration)",
            ylabel = "y",
            legend = false,
            xlim = (0, 20),
            ylim = (-3, 3),
        )
        plot_c = Plots.plot(
            ylabel = "c",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 100),
        )
        Plots.plot!(plot_B, 1:Tmax, map(x -> x[:B], trajectory.sampled_states))
        Plots.plot!(plot_B, 1:Tmax, cf_B, style = :dash)
        Plots.plot!(plot_y, 1:Tmax, map(x -> x[:y], trajectory.sampled_states))
        Plots.plot!(plot_y, 1:Tmax, cf_y, style = :dash)
        Plots.plot!(plot_c, 1:Tmax, map(x -> x[:c], trajectory.sampled_states))
        Plots.plot!(plot_c, 1:Tmax, cf_c, style = :dash)
        plt = Plots.plot(
            plot_B,
            plot_y,
            plot_c,
            layout = (1, 3),
        )
        Plots.savefig(plt, "iterations/iteration_$(1000 + iteration).png")
        iteration += 1
        return
    end
    SDDP.train(
        model;
        iteration_limit = 500,
        forward_pass_callback = forward_pass_callback,
    )
    run(`convert iterations/*.png training.gif`)
    return
end

main()

azev77 commented Aug 22, 2021

I’ll take a closer look when I come home.
Are you able to plot the policy & value functions?
Like I did: As a function of B, for say 3 values of y.
I don’t need to see each iteration, just the solution

azev77 commented Dec 27, 2021

Hey @odow I'm trying to compare intertemporal optimization solvers in Julia here.

I coded up the (correct) finite horizon solution w/ JuMP-Ipopt:

#Finite Horizon Discrete Time w/ jump
if 1==1
    using JuMP, Ipopt, Plots    # InfiniteOpt, 
    δ=0.10; ρ=0.05; 

    # Need: z > ρ + δ
    z= ρ + δ +.10    # Case: k_T+1 =0 liquidate capital 
    #z= ρ + δ +.50    # Case: k_T+1 >0 burn some capital 

    I_SS = (z-ρ-δ)/(ρ+δ)
    K_SS = I_SS/δ

    T = 90.0   # life horizon
    k0 = 1.0 # endowment

    time_i = 0.0:1.0:T
    time_k = 0.0:1.0:T+1

    u(k,i;z=z) = z*k -i -(1/2)*(i)^2  # utility function
    d(t; ρ=ρ) = (1+ρ)^(-t)     # discount function

    m = Model(Ipopt.Optimizer)

    register(m, :u, 2, u; autodiff = true)
    register(m, :d, 1, d; autodiff = true)

    @variable(m, i[time_i]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
    @variable(m, k[time_k])

    # LOM 
    for j in 2:length(time_k)
        t_1, t = time_k[j - 1], time_k[j]
        @constraint(m, k[t] == i[t_1] + (1-δ)*k[t_1])
    end

    # Can't sell more than all your capital.
    # Equiv to K_t >=0
    for j in 1:length(time_i)
        t = time_i[j]
        @constraint(m, i[t] >= -(1-δ)*k[t])
    end

    fix(k[0.0], k0)
    # @constraint(m, i[T] == max(-1, -(1-δ)*k[T]) )
    @constraint(m, i[T] >= -1 )

    @NLobjective(m, Max, sum(d(t)*u(k[t],i[t]) for t in time_i) )

    optimize!(m)
    objective_value(m)
    i_opt, k_opt = value.(i), value.(k)

    scatter(time_k,   k_opt.data, lab = "k");
    scatter!(time_i,  i_opt.data, lab = "i")
    plot!([0.0],  seriestype = :hline, lab="", color="red")
    plot!([-(1-δ)*k_opt.data[end-1] -1],  seriestype = :hline, lab="I_T TC", color="grey", l=:dash)
    plot!([K_SS I_SS],  seriestype = :hline, lab="", color="grey", l=:dash)
    plot!([T+1],  seriestype = :vline, lab="", color="grey", l=:dash)
end 
  1. How can I solve the same finite horizon problem w/ SDDP.jl - Ipopt?
  2. How can I solve the infinite horizon version w/ SDDP.jl - Ipopt?
    PS: the infinite horizon terminal condition is: lim_{t→∞} e^(-ρ*t) * k_t * (1+i_t) = 0

odow mentioned this issue Dec 14, 2022
odow commented Dec 14, 2022

Closing for the same reason as #528 (comment).

SDDP.jl isn't really intended as a solution method for these types of problems.

odow closed this as completed Dec 14, 2022