Dynamic Programming#

Let’s start by introducing dynamic programming. For this, we will use a simple problem called cake eating. We will use dynamic programming because there are problems where “things become much more complicated as soon as the environment in which we are searching for optimal decisions becomes uncertain.

We will use the cake eating problem because it is simple and has an analytical solution. We begin with an agent who has a certain resource (cake) that has a size a0. In each period, the agent decides how much to take from this cake and consume, ct. The budget constraint will be:

a0t=0ct

We can assume that the agent will consume all their resources, therefore: a0=t=0ct. The total utility of the agent as of time t=0 should be given by

U0=t=0βtu(ct).

Using a CRRA utility u(ct)=ct1γ111γ, where γ is the intertemporal elasticity of substitution. The dynamic optimization problem can be write as:

max{ct}t=0t=0βtct1γ111γsubject toa0=t=0ct.

The all-in-one solution is to try to to find the complete path of consumption {ct}t=0 in one step. Using the Lagrangean:

L=t=0βtct1γ111γ+λ(a0t=0ct).

The FOC:

Lct=βtct1γλ=0for allt=0,1,2,,.

Using the FOC

βtct1γ=βt1ct11γ==c01γ.
ct=βtct1==βtc0.

Then, we can substitute ct in the constraint a0=t=0ct, and obtain:

a0=t=0βtc0=c0t=0(βt)=c01β.

Finally, the consumption path is

ct=βtγ(1βγ)a0.
using Plots, Parameters

function params(; γ = 0.5,
    β = 0.95,
    a0 = 100.0,
    TT = 200)

    return (γ=γ, β=β, a0=a0, TT=TT)
end

function consumption!(p, c_t)
    @unpack TT, β, γ, a0 = p

    # Calculate the time path of consumption
    for it in 1:TT
        c_t[it] = β^(it * γ) * (1.0 - β^γ) * a0
    end
end

p = params(a0=100)
c_t = zeros(p.TT + 1)
consumption!(p, c_t)

# Plotting
plot(c_t, xlabel="Time t", ylabel="Consumption c_t", legend=false)

Solving the cake eating problem analytically is fortunately straightforward. However, when there is no analytical solution, we can tackle the exercise as in previous classes, using constrained/unconstrained optimization or a system of equations with the FOCs.

This problem can become tremendously large and computationally complex, which is why we use different approaches. We will start by presenting the dynamic problem, then we will introduce the techniques of Value Function Iteration and Policy Function.

Note

Fehr and Kindermann (2018) show how the problem is independent of time t, but only depends on the amount of resources at​. The sub-optimization problem:

max{cv}v=tUt=v=tβvtu(cv)s.t.at=v=tcv

The dynamic problem can be written as follows, where V is the value function:

max{cv}v=tUts.t.at=v=tcv
=v=tβvtu((u)1[βtvu(f(at))])
=v=0βvu((u)1[βvu(f(at))])=:V(at)

In consequence, the maximum amount of utility the agent can derive from time t onwards under the assumption that only an amount of at of his resources are left is independent of time t and can be subsumed in the function V(at) which only depends on the amount of remaining resources.

The Bellman equation

V(at)=maxct{u(ct)+βV(at+1)}s.t.at+1=atct.

can be expressed without t to avoid confusion between the actual path of consumption and the general solution to the dynamic program. $V(a)=maxc{u(c)+βV(a+)}s.t.a+=ac.$

The solution V is a fixpoint of the functional equation V(a). c is the control variable (controls the reduction of the resource) and a is the state variable (describes the state of the dynamic problem at time t). The state variable connects the different periods.

The analytical solution $L=u(c)+βV(a+)+λ[aca+].$

Lc=u(c)λ=0andLa+=βV(a+)λ=0
u(c)=βV(a+).

with a few steps, we get

a+=aβγandc=a(1βγ).
function params(; γ = 0.5,
    β = 0.95,
    a0 = 100.0,
    TT = 200,
    NA = 1000)
     = 1.0 - 1.0 / γ
    a = collect(range(1, 100, NA))
    return (γ=γ, β=β, a0=a0, TT=TT, =, NA=NA, a=a)
end

function dynamics(p)
    @unpack TT, a0, β, γ = p 

    c_t = zeros(TT + 1)
    a_t = zeros(TT + 1)
    # Calculate the time path of consumption
    a_t[1] = a0
    c_t[1] = a_t[1] * (1.0 - β^γ)
    for it in 2:TT+1
        a_t[it] = a_t[it-1] - c_t[it-1]
        c_t[it] = a_t[it] * (1.0 - β^γ)
    end

    return a_t, c_t
end


p = params()
a_t, c_t = dynamics(p)
plot(a_t)
plot(c_t)
function params(; γ = 0.5,
    β = 0.95,
    a0 = 100.0,
    TT = 200,
    NA = 1000)
     = 1.0 - 1.0 / γ
    a = collect(range(1, 100, NA))

    c_t = zeros(TT + 1)
    a_t = zeros(TT + 1)

    ap = zeros(NA)
    c = zeros(NA)
    V = zeros(NA)

    return (γ=γ, β=β, a0=a0, TT=TT, =, NA=NA, a=a, c_t=c_t, a_t=a_t, ap=ap, c=c, V=V)
end


function dynamics!(p)
    @unpack TT, a0, β, γ,c_t, a_t = p 

    # Calculate the time path of consumption
    a_t[1] = a0
    c_t[1] = a_t[1] * (1.0 - β^γ)
    for it in 2:TT+1
        a_t[it] = a_t[it-1] - c_t[it-1]
        c_t[it] = a_t[it] * (1.0 - β^γ)
    end

end

p = params()
dynamics!(p)

p1 = plot(p.c_t, xlabel="Time t", ylabel="Consumption", legend=false)
p2 = plot(p.a_t, xlabel="Time t", ylabel="Savings", legend=false)
plot(p1,p2, size=(800,400))
a+=aβγandc=a(1βγ).
function policy!(p)
    @unpack NA, a0, β, γ, , a, ap, c, V = p 

    # Calculate and plot policy and value function
    for ia in 1:NA
        ap[ia] = a[ia]*β^γ
        c[ia] = a[ia] * (1.0 - β^γ)
        V[ia] = ((1.0 - β^γ)^(-1.0 / γ)) * a[ia]^ / 
    end
end

policy!(p)
va = p.V
# Plot policy function
p1 = plot(p.a, p.c, xlabel="Level of resources a", ylabel="Policy Function c(a)", legend=false)

# Plot value function
p2 = plot(p.a, p.V, xlabel="Level of Resources a", ylabel="Value Function V(a)", legend=false)

#Plot future saving
p3 = plot(p.a, p.ap, xlabel="Level of Resources a", ylabel="Saving a+", legend=false)
plot(p1, p2, p3)

Solution by value function iteration#

In this section, we will solve the cake eating problem using Value Function Iteration (VFI) in the style of Fehr and Kindermann (2018) and Quantecon.

First of all, to solve this problem, we face the difficulty that the computer cannot handle arbitrary functions, as a function is an infinite-dimensional object. The computer cannot solve a dynamic programming problem on a continuous state space, e.g. a[0,a0].

We need to discretize the state space a in a grid. For example, can be discretized in a equidistant grid a^=a0νn for ν=0,1,,n.

The dynamic programming problem

V(a^v)=maxc{u(c)+βV(a+)}s.t.a+=a^vcfor all v=0,1,,n.

using the constraint directly into the function:

V(a^v)=maxa+{u(a^va+)+βV(a+)}for v=0,1,,n.

Then we have to solve a set of decisions {c(a^v)}v=0n and {a+(a^v)}v=0n (policy functions) and utility {V(a^v)}v=0n

There are two relevant points when solving this optimization problem:

  1. How can we determine the value function V(a+) -> value function iteration.

  2. Knowing the value function V(a+) , how can we come up with a solution to the optimization problems.

The algorithm follows the fix point iteration:

  1. Discretize the state space a in a grid av^.

  2. Initial guess of V0, e.g. V(a)=0.

  3. Determinal the optimal policies {c(a^v)}v=0n and {a+(a^v)}v=0n for all a^v.

  4. Get the resulting value function Vnew(a^v)=u(c(a^v))+βV(a+(a^v))for v=0,1,,n.

  5. Find the fixed point of the dynamic program -> check the criteria maxa^v|Vnew(a^v)V(a^v)V(a^v)|<ϵ. If the criteriaϵ, update V(a) using Vnew(a) and go back to step 2.

function params(; γ = 0.5,
    β = 0.95,
    a0 = 100.0,
    TT = 200,
    NA = 1000, 
    crit = 1e-4,
    itermax = 2000, 
    a_min  = 0.0,
    a_max = a0)

     = 1.0 - 1.0 / γ
    a = collect(range(a_min, a_max, NA))

    ia_opt = zeros(NA)
    u_temp = zeros(NA)    

    return (γ=γ, β=β, a0=a0, TT=TT, =, NA=NA, a=a, crit=crit, itermax=itermax, ia_opt=ia_opt, u_temp=u_temp)
end

function u(p, c)
    @unpack , β = p
    return c^ / 
end

function bellman!(p,v)
    @unpack NA, β, a, ia_opt, u_temp = p 
    ia_opt[1] = 1    
    v[1] = v[2] - 100.0
    # Calculate optimal decision for every grid point
    for ia in 2:NA
        for ia_p in 1:max(ia-1, 1)
            # Calculate consumption
            c_temp = max(a[ia] - a[ia_p], 1e-10)
            # Calculate utility
            u_temp[ia_p] = u(p, c_temp) + β * v[ia_p]
        end
        #Maximization
        V_max, index_max = findmax(u_temp[1:max(ia-1,1)])
        v[ia] = V_max
    end    

end

function VFI(p, v0)
    @unpack crit, itermax, a, NA, β, ia_opt, u_temp = p 
    v = copy(v0)
    v_new = copy(v0)

    # Iterate until value function converges
    for iter in 1:itermax
        #Update value
        v = copy(v_new)
        #New value v_new
        bellman!(p, v_new) 
        # Get convergence level
        valuediff = maximum(abs.(v_new .- v))
        if iter%50==0
            println("Iter: ", iter, " Convergence Level: ", valuediff)
        end
        # Check for convergence
        if valuediff < crit
            println("Covergence is Ok: (iter, crit)", (iter, valuediff))
            break
        end

        if iter == itermax
            println("Not convergence")
        end
    end

    return v
end

p = params()
v0 = zeros(p.NA)
@time v = VFI(p,v0);

plot(p.a, v, label="Optimal value", xlabel="a", ylabel="v(a)", color=:blue)
plot!(p.a, va, label="Analytical", color=:red)
plot!(p.a, v0, label="Initial value", color=:green)
Iter: 50 Convergence Level: 8.951194409198933
Iter: 100 Convergence Level: 0.68874943251285
Iter: 150 Convergence Level: 0.05299580805649384
Iter: 200 Convergence Level: 0.004077761140706571
Iter: 250 Convergence Level: 0.00031376323022414
Covergence is Ok: (iter, crit)(273, 9.643728344599367e-5)
  1.568990 seconds (560.19 k allocations: 1.076 GiB, 16.11% gc time, 9.82% compilation time)

Solve VFI using interpolation#

V(a^v)=maxc{u(c)+βV(a+)}s.t.a+=a^vcfor all v=0,1,,n.

The algorithm follows the fix point iteration:

  1. Discretize the state space a in a grid av^.

  2. Initial guess of V0, e.g. V(a)=0.

  3. Get the resulting value function Vnew(a^v) using interpolation of V(a+)

  4. Find the fixed point of the dynamic program -> check the criteria maxa^v|Vnew(a^v)V(a^v)|<ϵ.. If the criteriaϵ, update V(av^) using Vnew(av^) and go back to step 2.

using Optim
using Interpolations

function params(;
        β=0.96, 
        γ=1.5, 
        a_grid_min=1e-3, 
        a_grid_max=2.5, 
        a_grid_size=120, 
        crit=1e-4,
        itermax=200)

    a_grid = range(a_grid_min, stop=a_grid_max, length=a_grid_size)
    return (β=β, γ=γ, a_grid=a_grid, crit=crit, itermax=itermax)
end

# Utility function
function u(p, c)
    @unpack γ = p
    if γ == 1.0
        return log.(c)
    else
        return (c.^(1 - γ)) ./ (1 - γ)
    end
end

function state_action_value(p, c, x, v_array)
    @unpack a_grid, β = p
    vp = LinearInterpolation(a_grid, v_array, extrapolation_bc = Line())  
    return -(u(p, c) + β * vp(x - c))
end

function T(v, p)
    @unpack a_grid = p
    v_new = similar(v)
    for (i, x) in enumerate(a_grid)
        objective = c -> state_action_value(p, c, x, v)
        result = optimize(objective, 1e-10, x) 
        v_new[i] = -Optim.minimum(result)
    end
    return v_new
end

p = params()
a_grid = p.a_grid
v = u(p,a_grid)       # Initial guess
n = 20                 # Number of iterations

p1 = plot(a_grid, v, label="Initial guess", color=:red)
for i in 1:n
    v = T(v, p)  # Apply the Bellman operator
    p1 = plot!(a_grid, v, xlabel="a", ylabel="v(a)", label="", color=:blue, alpha=(i/n))
end

plot(p1)
function VFI(p, v0)
    @unpack crit, itermax = p

    # Set up loop
    v = copy(v0)
    for i in 1:itermax
        v_new = T(v, p)
        vdiff = maximum(abs.(v - v_new))

        if vdiff<crit
            print("Convergence ok, iter:", i)
            break
        end

        if i%25==0
            println("iter, vdiff: ", (i, vdiff))
        end
        
        if i==itermax
            println("Not convergece")
        end

        #Update v0
        v = copy(v_new)
    end

    return v
end


p = params(itermax=1000)
a_grid = p.a_grid
v0 = u(p,a_grid)       # Initial guess
v_new = VFI(p, v0)

plot(a_grid, v0, xlabel="a", ylabel="v(a)", label="Initial guess", color=:red)
plot!(a_grid, v_new, label="Optimal value", color=:blue)
iter, vdiff: (25, 27.315312306314127)
iter, vdiff: (50, 10.098330629646625)
iter, vdiff: (75, 3.661005003941682)
iter, vdiff: (100, 1.321281965741946)
iter, vdiff: (125, 0.4763474203921305)
iter, vdiff: (150, 0.17168805355163386)
iter, vdiff: (175, 0.061877023913211815)
iter, vdiff: (200, 0.022300381327113428)
iter, vdiff: (225, 0.008036993313453422)
iter, vdiff: (250, 0.002896506791785214)
iter, vdiff: (275, 0.0010438916062867065)
iter, vdiff: (300, 0.0003762151136470493)
iter, vdiff: (325, 0.00013558669229496445)
Convergence ok, iter:
333
function v_star(p, x)
    @unpack β, γ = p
    return (1 - β ^ (1 / γ)) ^ (-γ) .* (x .^ (1 - γ) / (1 - γ))
end

vs = v_star(p,a_grid)

plot(a_grid[2:end], vs[2:end], xlabel="a", ylabel="v(a)", label="analytical", color=:red)
plot!(a_grid[2:end], v_new[2:end], xlabel="a", ylabel="v(a)", label="optimal value", color=:blue)

Homework#

Get the policy functions

Analytically: $σ(x)=(1β1/γ)x$

Solution using the optimal value function $σ(x)=argmax0cx{u(c)+βu(xc)}$

Solution by policy function iteration#

The idea is solve the policy function:

σ(x)=argmax0cx{u(c)+βu(xc)}

The algorithm is

  1. Start with initial guess of the policy function, e.g. c(a)=a2

  2. Determine the optimal policies {cnew(a^v)}v=0n for all a^v by solving the set of FOC: u(c(a^v))=βu(c(a+)) , where a+=f(a^v,c(a^v)). We get a+ by interpolation.

  3. Check the convergence of maxa^v|cnew(a^v)c(a^v)|<ϵ.

  4. If the criteriaϵ, update c(a) using cnew(a) and go back to step 2.

using LinearAlgebra, Roots #NLsolve, 

function params(;
    β=0.96, 
    γ=1.5, 
    a_grid_min=1e-3, 
    a_grid_max=2.5, 
    a_grid_size=120, 
    crit=1e-4,
    itermax=200)

    a_grid = range(a_grid_min, stop=a_grid_max, length=a_grid_size)
    return (β=β, γ=γ, a_grid=a_grid, crit=crit, itermax=itermax)
end

function u_prime(p, c)
    @unpack γ = p
    if c<0 
        return -6666
    else
        return c^(-γ)
    end
end

function euler_diff(p, c, x, σ)
    @unpack β = p
    # c = c[1]
    cp = σ(x - c)
    return u_prime(p,c) - β * u_prime(p, cp)
end

function K(σ0, p)
    @unpack β, a_grid = p

    σ_new = similar(σ0)
    σ = LinearInterpolation(a_grid, σ0, extrapolation_bc = Line())  
    # σ = x -> LinearAlgebra.interp(x, x_grid, σ_array)
    for (i, x) in enumerate(a_grid)
        # handle small x separately --- helps numerical stability
        if x < 1e-12
            σ_new[i] = 0.0
        # handle other x
        else
            # res = nlsolve(c -> euler_diff(p,c,x,σ), [σ0[2]], autodiff = :forward) # 
            res = find_zero(c -> euler_diff(p,c,x,σ), (1e-10, x), Bisection())
            σ_new[i] = res#.zero[1]
        end
    end

    return σ_new
end

function PFI(p, σ0)
    @unpack crit, itermax = p

    # Set up loop
    σ = copy(σ0)
    for i in 1:itermax
        σ_new = K(σ, p)
        σdiff = maximum(abs.(σ - σ_new))

        if σdiff<crit
            print("Convergence ok, iter:", i)
            break
        end

        if i%25==0
            println("iter, vdiff: ", (i, σdiff))
        end
        
        if i==itermax
            println("Not convergece")
        end

        #Update σ
        σ = copy(σ_new)
    end

    return σ
end

p = params(a_grid_min=1e-3)
a_grid = p.a_grid
c = a_grid
p1 = plot(a_grid, c, label="init")
for i in 1:15
    c = K(c, p)
    p1 = plot!(a_grid,c, label="")
end


ca(p, a) = (1 - p.β^(1/p.γ)) .* a
c_an = ca(p, a_grid)

c = PFI(p, c_an./2)

plot(a_grid, c, label="solution")
plot!(a_grid,c_an, label="analytical")
iter, vdiff: (25, 0.0004094352733778131)
iter, vdiff: (50, 0.0002991361693477079)
iter, vdiff: (75, 0.0001877976451882049)
iter, vdiff: (100, 0.00010704894602843462)
Convergence ok, iter:103

Homework#

  1. Given the solution of the policy function c(a^v), obtain a+.

  2. Get the optimal value function.

  3. Compare the result c,a+,V of VFI with index, VFI with interpolation, and PFI.