August 7, 2023

Optimization

  • You want to find the minimum or maximum of a function.
  • Possibly subject to some constraints.

\[ \min_x f(x) \quad \text{s.t. } g(x) \ge 0 \]

  • Maximizing a function \(f\) is equivalent to minimizing \(-f\).

When do we need to do this?

  • Agent’s choice problem:
    • Maximize utility subject to some constraints.
  • Estimation:
    • Ordinary Least Squares (OLS)
    • Simulated Method of Moments (SMM): Choose parameters to minimize distance between data moments and model moments.
    • Maximum Likelihood Estimation (MLE): Choose parameters to maximize likelihood of observing data.

How do we do this?

  • Pencil and paper: Take derivatives, use the Lagrangian, …
    • This is what you did in first year.
    • Not always possible.
  • Grid search: Guess and check (a minimal sketch follows this list).
    • This is what we did last week in Lecture 2.
    • Accurate solution requires a very fine grid.
    • This can be very slow, especially in many dimensions.
  • Optimization algorithms: “Smart” guess and check.
    • Use limited information about function (such as derivatives) and previous guesses to decide next guess.
    • Lots of choices here.
    • Sometimes doesn’t work; no perfect algorithm.
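As a concrete illustration of grid search, here is a minimal Python sketch; the objective function and grid are made up for this example:

```python
import numpy as np

# Minimal grid search (illustrative objective): evaluate f on a fixed grid
# and keep the best point.
f = lambda x: (x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 101)   # 101 evenly spaced guesses on [0, 1]
values = f(grid)                    # check f at every grid point
x_star = grid[np.argmin(values)]    # keep the guess with the lowest value
print(x_star)                       # 0.7; accuracy is limited by grid spacing
```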

Box-Constrained Univariate Optimization

  • We know a solution exists in \([a,b]\). For example:
    • Search intensity is normalized between 0 and 1.
    • Hours of work are between 0 and 80.
    • Savings are between 0 and net worth.
  • A common algorithm: Brent’s method.
  • Does not require derivatives.
  • Uses bisection, secant steps, and inverse quadratic interpolation (a usage sketch follows this list).
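A minimal sketch using SciPy, whose bounded scalar minimizer is a Brent-type routine that requires no derivatives. The objective (log utility over hours of work, net of a linear cost) is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative objective: maximize log(1 + h) - 0.05 h over hours h in [0, 80],
# written as a minimization of its negative.
f = lambda h: -(np.log(1.0 + h) - 0.05 * h)

res = minimize_scalar(f, bounds=(0.0, 80.0), method="bounded")
print(res.x)  # about 19, since 1 / (1 + h) = 0.05 at h = 19
```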

Multivariate Optimization

  • Optimization in multiple dimensions can be challenging.
    • Curse of dimensionality.
  • If the function is twice differentiable and has an analytic gradient/Hessian, you can use Newton’s method (update rule below).
  • In economics, you will rarely be dealing with such a nice function in your optimization.
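For reference, Newton’s method updates the current guess \(x_k\) using the gradient and Hessian:

\[ x_{k+1} = x_k - \left[ \nabla^2 f(x_k) \right]^{-1} \nabla f(x_k) \]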

Quasi-Newton Methods

  • Numerically approximates the gradient and Hessian and applies an updating rule similar to Newton’s method.
    • Finite differences: for a small enough \(\Delta\), \[ f'(x) \approx \frac{f(x+\Delta) - f(x)}{\Delta} \]
    • Automatic differentiation: the computer applies the chain rule to the components of the function.
  • The most common algorithm of this type is BFGS (or its limited-memory variant, L-BFGS); a usage sketch follows this list.
  • Problems:
    • Approximation of derivative might not be well-behaved.
    • Can be computationally expensive.
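A minimal quasi-Newton sketch with SciPy’s BFGS on the Rosenbrock function, a standard illustrative test problem. No gradient is supplied, so SciPy approximates it internally by finite differences:

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function: smooth, with a curved valley that slows naive methods.
def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

x0 = np.array([-1.2, 1.0])                     # standard starting point
res = minimize(rosenbrock, x0, method="BFGS")  # gradient by finite differences
print(res.x)                                   # approximately [1, 1]
```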

Derivative-Free Methods

  • Most common method: Nelder-Mead
    • Easy to implement, but does not always work well.

https://en.wikipedia.org/wiki/Nelder–Mead_method

  • Operationalizes the intuition of “rolling downhill” by updating a simplex until it converges.
  • Often referred to as the “downhill simplex method”.
  • Can easily be fooled by local minima.
  • Can also get stuck in flat areas or go around in circles (see the sketch below).
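A minimal Nelder-Mead sketch with SciPy on a kinked (non-differentiable) objective, the kind of function where derivative-based methods can struggle; the function is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative non-smooth objective with a kink at its minimum (1, -0.5).
f = lambda x: np.abs(x[0] - 1.0) + np.abs(x[1] + 0.5)

res = minimize(f, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(res.x)  # roughly [1, -0.5]; convergence is not guaranteed in general
```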

Randomization

  • Randomization can improve the performance of our optimizers when:
    • there are many local minima, or
    • the function is not smooth.
  • Basin hopping (a sketch follows this list):
    1. Guess initial point and run algorithm (such as NM).
    2. From candidate solution, randomly “hop” to new initial point and run algorithm again.
  • Laplace-type estimator
    • Randomly jump around parameter space.
    • Accept (with some randomization) better guesses.
    • Size of jump updates based on fraction of accepted guesses.
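A minimal basin-hopping sketch with SciPy, pairing random hops with Nelder-Mead as the local optimizer. The one-dimensional objective, chosen to have many local minima, is made up for illustration:

```python
import numpy as np
from scipy.optimize import basinhopping

# Illustrative wiggly objective with many local minima.
f = lambda x: np.cos(14.5 * x[0] - 0.3) + (x[0] + 0.2) * x[0]

res = basinhopping(f, x0=[1.0], niter=100,
                   minimizer_kwargs={"method": "Nelder-Mead"})
print(res.x, res.fun)  # global minimum is near x = -0.195
```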

Tolerance

  • Many (but not all) optimization algorithms stop after reaching some preset tolerance level.
  • A tighter (lower) tolerance improves the quality of the answer but increases the algorithm’s running time.
  • Important to keep nested optimization in mind.
  • Common procedure in economics:
    • Inner loop solves model given a parameter guess: make choices to maximize utility.
    • Outer loop chooses parameters to minimize difference between model and data (maximum likehood/SMM).
  • An outer-loop tolerance that is tighter than the inner loop’s accuracy may lead to bad parameter estimates: the outer optimizer ends up chasing inner-loop noise.
  • A very tight tolerance in the inner loop may make estimation slow.
  • A common rule of thumb is to keep the inner loop’s tolerance tighter than the outer loop’s (see the sketch below).
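A minimal sketch of this nested structure in Python with SciPy, with the inner tolerance set tighter than the outer one. All functions, parameter values, and the data moment are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Illustrative nested optimization (all names and numbers are hypothetical).
# Inner loop: solve the agent's problem given a parameter guess theta.
def solve_model(theta):
    inner = minimize_scalar(lambda c: -(np.log(1.0 + c) - theta * c),
                            bounds=(0.0, 100.0), method="bounded",
                            options={"xatol": 1e-10})  # tight inner tolerance
    return inner.x  # model-implied "moment": the optimal choice

data_moment = 19.0  # made-up empirical target

# Outer loop: choose theta to minimize the model-data distance (SMM-style).
def outer_objective(theta):
    return (solve_model(theta[0]) - data_moment) ** 2

res = minimize(outer_objective, x0=[0.1], method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6})  # looser outer tolerance
print(res.x)  # near 0.05: log(1 + c) - 0.05 c is maximized at c = 19
```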

Advice

  • Try to reduce dimensions, especially when getting started.
  • You can plot in 2D and 3D, but 4D becomes quite challenging.
  • Understand the shape of the function you are trying to optimize:
    • Is it smooth? Does it have kink points?
  • There is no perfect algorithm.
  • Transformations of variables and rescaling may be helpful: log(), exp(), … (see the sketch after the link below).

https://jblevins.org/notes/bijections
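For example, a positive parameter \(\sigma\) can be searched over as an unconstrained \(\theta\) via the bijection \(\sigma = e^{\theta}\), as in the note linked above. A minimal sketch with an illustrative Gaussian likelihood (all names and data made up):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: estimate the std. dev. of mean-zero Gaussian data by
# MLE, optimizing over theta = log(sigma) so the search is unconstrained.
def neg_log_likelihood(sigma, data):
    return 0.5 * len(data) * np.log(2.0 * np.pi * sigma ** 2) \
        + np.sum(data ** 2) / (2.0 * sigma ** 2)

data = np.array([0.5, -1.2, 0.3, 2.1, -0.7])  # made-up data

res = minimize(lambda theta: neg_log_likelihood(np.exp(theta[0]), data),
               x0=[0.0], method="BFGS")
print(np.exp(res.x[0]))  # MLE of sigma, positive by construction
```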