Add examples of Stochastic Optimal Control #429
Ideally, one of @pulsipher's examples from InfiniteOpt: https://github.com/pulsipher/InfiniteOpt.jl/tree/master/docs/src/examples
If anyone reading this has nice examples of an optimal control type problem they want to share, please reach out.
The stochastic optimal economic control problem I recently set up with @azev77 in infiniteopt/InfiniteOpt.jl#131 (comment) (look at the conversation starting from this comment) might be a good fit. Naturally, the difference here is that you would want to implement its discrete version.
Thanks for the pointer! Here's the SDDP.jl equivalent:

```julia
using SDDP
import Distributions
import Gurobi
import Plots
function main(
ρ = 0.020, # discount rate
k = 90.0, # utility bliss point
T = 10.0, # life horizon
r = 0.05, # interest rate
dt = 0.1, # time step size
B0 = 100.0, # endowment
μ = 0.0, # Brownian motion
σ = 0.1,
)
env = Gurobi.Env()
model = SDDP.LinearPolicyGraph(
stages = round(Int, T / dt),
sense = :Max,
upper_bound = 0,
optimizer = () -> Gurobi.Optimizer(env),
) do sp, t
set_silent(sp)
@variable(sp, B >= 0, SDDP.State, initial_value = B0)
@variable(sp, y, SDDP.State, initial_value = 10)
@variable(sp, 0 <= c <= 99)
@constraint(sp, B.out == B.in + dt * (r * B.in + y.out - c))
@constraint(sp, con, y.out == y.in + dt * μ * y.in + σ * y.in)
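# `con` is a placeholder: SDDP.parameterize below overwrites the coefficient
# on y.in for each sampled ω, giving y.out == (1 + dt * μ + σ * ω) * y.in.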
SDDP.parameterize(sp, rand(Distributions.Normal(0, 1), 10)) do ω
set_normalized_coefficient(con, y.in, -(1 + dt * μ + σ * ω))
end
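# Stage objective: quadratic utility -(c - k)^2, discounted by exp(-ρ * t * dt)
# (t is the stage index, so t * dt is elapsed time) and weighted by dt.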
@stageobjective(sp, dt * (exp(-ρ * t * dt) * -(c - k)^2))
end
SDDP.train(
model;
iteration_limit = 100,
)
sims = SDDP.simulate(model, 100, [:B, :c, :y]);
function plot_simulations(sims)
plot = Plots.plot()
plot_simulation!(
sims[1];
labels = ["B: wealth balance" "c: consumption" "y"],
)
for s in sims[2:end]
plot_simulation!(s, labels = false)
end
return plot
end
function plot_simulation!(sim; kwargs...)
Plots.plot!(
dt:dt:T,
[
map(data -> data[:B].out, sim),
map(data -> data[:c], sim),
map(data -> data[:y].out, sim),
],
color = [:blue :red :green];
kwargs...
)
end
return model, plot_simulations(sims)
end
model, plt = main()
```
@odow thanks!
Here is my discrete-time (non-stochastic, y_t = 0) code:

```julia
using JuMP, Ipopt, Plots # InfiniteOpt,
ρ = 0.020 # discount rate
k = 100.0 # utility bliss point
T = 10.0 # life horizon
r = 0.05 # interest rate
B0 = 100.0 # endowment
time_c = 0.0:1.0:T
time_B = 0.0:1.0:T+1
u(c; k=k) = -(c - k)^2.0 # utility function
d(t; ρ=ρ) = exp(-ρ*t) # discount function
m = Model(Ipopt.Optimizer)
register(m, :u, 1, u; autodiff = true)
register(m, :d, 1, d; autodiff = true)
@variable(m, c[time_c]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0] <= 99.0)
@variable(m, B[time_B])
fix(B[0.0], B0)
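# Terminal condition: wealth must be exhausted at T+1 (no bequest).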
fix(B[T+1], 0.0)
@NLobjective(m, Max, sum(d(t)*u(c[t]) for t in time_c) )
#@NLobjective(m, Max, sum(exp(-ρ*t)*(-1.0)*(c[t]-k)^2.0 for t in 0.0:1.0:10.0) )
for i in 2:length(time_B)
t_1, t = time_B[i - 1], time_B[i]
@constraint(m, B[t] == B[t_1] + 1.0 * (r * B[t_1] - c[t_1]))
end
optimize!(m)
objective_value(m)
c_opt, B_opt = value.(c), value.(B)
scatter(time_B, B_opt.data, lab = "B: wealth balance");
scatter!(time_c, c_opt.data, lab = "c: consumption")
```
But in this example we are not credit constrained. To clarify my question in the other issue: in this solution, does a particular consumption path depend on future realizations of y? I'm guessing from the solution that the answer is "we come up with a different consumption curve for each sample of y, and given a fixed y, the consumption curve is linear." SDDP.jl does better than this by taking into account that the agent at time t cannot know the future at times t+1 and so on.
Can you re-solve the problem w/o B >= 0? I think this problem actually has a closed-form solution. I will take a look when I get home.
It will be the same answer because the wealth constraint is never binding.
This makes me much more sure that my guess above is true. I think we're talking about different problems here, with a different set of assumptions.
Can you re-write it using Ipopt.jl so I can solve it on my computer?
Replace `Gurobi.Optimizer` with `Ipopt.Optimizer`.
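i.e., something like this (a sketch; the subproblem body stays exactly as in the model above):

```julia
import Ipopt

model = SDDP.LinearPolicyGraph(
    stages = round(Int, T / dt),
    sense = :Max,
    upper_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    # ... same variables, constraints, and stage objective as above ...
end
```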
I don't think we need to solve problems though; we need to answer the conceptual question above.
At t = 0, you choose c_0. Next, y_1 is realized and you choose c_1. Next, y_2 is realized, and so on: you choose c_t after observing y_t, but before y_{t+1} is realized.
Okay, we're on the same page then. But that's not what your discrete-time Ipopt code above does?
My discrete-time Ipopt code is non-stochastic (y_t = 0 for all time periods), so it's perfect foresight.
Yeah haha I'm getting confused with the code across the two issues. I meant the code that produced the figures with the straight lines.
Below is my matrix of the 4 types of intertemporal optimization problems we solve in Econ. The only solution so far I don't understand is @pulsipher's stochastic continuous-time one. Take a look here: JuliaPOMDP/POMDPs.jl#351
SDDP.jl can solve the discrete-time deterministic and stochastic cases, for both the finite-horizon and the infinite-horizon settings. For the infinite case, you just need to replace the linear graph with a cyclic policy graph, in which the self-loop probability 1 - ρ plays the role of the discount factor:

```julia
graph = SDDP.LinearGraph(1)
SDDP.add_edge(graph, 1 => 1, 1 - ρ)
model = SDDP.PolicyGraph(
graph,
sense = :Max,
upper_bound = 0,
optimizer = () -> Gurobi.Optimizer(env),
) do sp, t
```

Here's a tutorial on the theory, which outlines the Bellman recursion, etc.: https://odow.github.io/SDDP.jl/stable/tutorial/theory/21_theory_intro/

It's not clear to me that your notation for the stochastic cases is correct.
It's another one of those "MDP solvers are all alike in different ways". It uses linear programming duality to construct the value function. You could say that it is value function approximation à la Approximate Dynamic Programming. You could also say that it is Q-learning in the POMDP world, but we converge to a provably optimal solution and it takes into account more than one step into the future. There are plotting tools to visualize the resulting value function: https://odow.github.io/SDDP.jl/stable/tutorial/basic/05_plotting/#Plotting-the-value-function

The best thing about SDDP is that the subproblems are JuMP models, so you're free to add as many variables and constraints as you like. We regularly solve problems with thousands of nodes in the policy graph, 100-200 state-variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.
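To illustrate the duality point (a toy sketch, not the SDDP.jl API): the value function of a maximization problem is concave, and SDDP-style methods approximate it from above by a pointwise minimum of affine cuts whose slopes come from LP duals at trial points.

```julia
# Outer approximation of a concave value function V(x) by affine cuts.
struct CutApproximation
    intercepts::Vector{Float64}  # α for each cut
    slopes::Vector{Float64}      # β for each cut
end

# Evaluate the approximation: the tightest (smallest) cut at x.
evaluate(V::CutApproximation, x::Float64) =
    minimum(α + β * x for (α, β) in zip(V.intercepts, V.slopes))

# Record a new cut generated at a trial state.
function add_cut!(V::CutApproximation, α::Float64, β::Float64)
    push!(V.intercepts, α)
    push!(V.slopes, β)
    return V
end

V = CutApproximation([0.0], [0.0])  # start from the trivial bound V(x) <= 0
add_cut!(V, -1.0, -0.5)
evaluate(V, 2.0)                    # min(0.0, -2.0) == -2.0
```

Each training iteration refines this bound, which is why the method converges to a provably optimal policy for convex problems.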
If what you're saying is true (& works for problems w/ non-linear reward functions like log utility), that would be a big deal for economists. Check out VFIToolkit.m. As it currently stands, the docs in packages like SDDP.jl aren't written for economists.
This is what @bernardokp has been telling me for some time... see #428.

There are some issues with nonlinear functions: everything must be convex. So (if maximizing) you could have log rewards, but not log costs. There are some crappy finance/inventory problems, but not enough: @lorenzoreus has some finance examples. Happy to accept PRs with examples and/or documentation, but I don't have much interest (or domain expertise) to write them myself.
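For intuition, here is the standard convexity argument sketched in my own notation (assumptions: affine dynamics, additive noise):

$$
V_t(x) = \max_{u \in U} \Big\{ r_t(x, u) + \mathbb{E}_{\omega}\big[ V_{t+1}(Ax + Bu + \omega) \big] \Big\}
$$

If $r_t$ is concave in $(x, u)$ and $V_{t+1}$ is concave, then $V_t$ is concave (expectation and partial maximization preserve concavity), so the cutting-plane approximation stays valid. A log reward keeps this intact when maximizing; a log cost enters as $-\log$, which is convex, and breaks it.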
Here's a model that @jd-lara had been running. We can also find risk-averse policies: swap out the E operator for your choice of convex risk measure. We also have the ability to solve problems where the randomness is a hidden Markov model (e.g., bear and bull markets with random returns, where the returns are observed but the bear/bull state is hidden). I should tidy the examples/tutorials to clarify all of this.
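For example, a risk-averse training run might look like this (the `risk_measure` keyword is part of SDDP.jl's documented training API; the particular λ and β values here are just illustrative):

```julia
# Replace the expectation with a convex combination of expectation
# and average value-at-risk (AV@R).
SDDP.train(
    model;
    risk_measure = SDDP.EAVaR(lambda = 0.5, beta = 0.1),
    iteration_limit = 100,
)
```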
Is it possible to show me how to solve the example you did above (the discrete-time version of my model) w/ an open-source solver?

PS: wouldn't you agree the world would be a much better place if more people from different fields cooperated & submitted bug reports & PRs to the same packages? (I sure do...)
Even though this problem is small, it still requires on the order of 10^4 QP solves (100 stages × 100 training iterations, before counting the backward pass). So having a good solver is critical. Get Gurobi. They have free licenses for academics.
Yes. I'll say this publicly: I don't believe the stochastic programming community has done a good job of publicizing the technology we have. There were three main issues:
Here are some good primers on the industrial problem:
Got it to work w/ Ipopt.
Yeah it's probably very slow and might crash/warn about numerical instability? |
It stops at the iteration limit. I just worked out the closed-form solution for the problem above (the infinite-horizon version w/ AR(1) income). If I had Gurobi, I'd do it on my own. Also, can you take a look? @pulsipher, I messed up in explaining the stochastic problem to you before.

PS: It would be interesting to see how your package solves this model where income follows an AR(1) process.

PPS: I'd be happy to submit a PR w/ an example (like I did w/ InfiniteOpt) if you provide the code/output.
This notation looks better. Nice to see that it admits a closed-form solution. I'll see if I have time later to implement it. It would make a very nice tutorial.
There are a few typos in my notes above; please hold off until I fix them...
I'll take a closer look when I come home.
Hey @odow, I'm trying to compare intertemporal optimization solvers in Julia here. I coded up the (correct) finite-horizon solution w/ JuMP-Ipopt:

```julia
# Finite-horizon discrete time w/ JuMP
if 1==1
using JuMP, Ipopt, Plots # InfiniteOpt,
δ = 0.10; ρ = 0.05
# Need: z > ρ + δ
z = ρ + δ + 0.10  # Case: k_{T+1} = 0, liquidate capital
#z = ρ + δ + 0.50 # Case: k_{T+1} > 0, burn some capital
I_SS = (z - ρ - δ) / (ρ + δ)
K_SS = I_SS / δ
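# Interior steady state: I_SS from the Euler equation, K_SS = I_SS / δ from the law of motion.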
T = 90.0 # life horizon
k0 = 1.0 # endowment
time_i = 0.0:1.0:T
time_k = 0.0:1.0:T+1
u(k,i;z=z) = z*k -i -(1/2)*(i)^2 # utility function
d(t; ρ=ρ) = (1+ρ)^(-t) # discount function
m = Model(Ipopt.Optimizer)
register(m, :u, 2, u; autodiff = true)
register(m, :d, 1, d; autodiff = true)
@variable(m, i[time_i]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0] <= 99.0)
@variable(m, k[time_k])
# LOM
for j in 2:length(time_k)
t_1, t = time_k[j - 1], time_k[j]
@constraint(m, k[t] == i[t_1] + (1-δ)*k[t_1])
end
# Can't sell more than all your capital.
# Equiv to K_t >=0
for j in 1:length(time_i)
t = time_i[j]
@constraint(m, i[t] >= -(1-δ)*k[t])
end
fix(k[0.0], k0)
# @constraint(m, i[T] == max(-1, -(1-δ)*k[T]) )
@constraint(m, i[T] >= -1 )
@NLobjective(m, Max, sum(d(t)*u(k[t],i[t]) for t in time_i) )
optimize!(m)
objective_value(m)
i_opt, k_opt = value.(i), value.(k)
scatter(time_k, k_opt.data, lab = "k");
scatter!(time_i, i_opt.data, lab = "i")
plot!([0.0], seriestype = :hline, lab="", color="red")
plot!([-(1-δ)*k_opt.data[end-1] -1], seriestype = :hline, lab="I_T TC", color="grey", l=:dash)
plot!([K_SS I_SS], seriestype = :hline, lab="", color="grey", l=:dash)
plot!([T+1], seriestype = :vline, lab="", color="grey", l=:dash)
end
```
Closing for the same reason as #528 (comment). SDDP.jl isn't really intended as a solution method for these types of problems. |