Submit link: https://classroom.github.com/a/gYCf0Llh

Question 1

The Himmelblau function is of the form

\[ f(x, y)=\left(x^{2}+y-11\right)^{2}+\left(x+y^{2}-7\right)^{2} . \]

The critical points of this function can be found analytically, but doing so is not recommended. Instead, we will find the minima using optimizers.

  1. Produce a surface plot of the function over the domain \(x, y \in[-4,4]\). How many local minima do there appear to be? Note: the “camera” attribute in Julia’s Plots.jl lets you alter the angle from which you view the surface plot. (A starter sketch for this question follows the list.)

  2. Solve for the gradient and Hessian of the function, and optimize it using Newton’s method. Comment on how the answer you arrive at depends on the initial guess fed to the algorithm.

  3. Now ignore the derivatives and optimize the function using Nelder-Mead. Compare the number of iterations required to arrive at a local minimum with the number Newton’s method needed.
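
A minimal starter sketch in Julia, assuming the Optim.jl and Plots.jl packages; the gradient and Hessian below are derived by hand from \(f\), and the helper names and starting guess are purely illustrative.

```julia
using Optim, Plots

# Himmelblau function, written to take a vector as Optim expects
himmelblau(v) = (v[1]^2 + v[2] - 11)^2 + (v[1] + v[2]^2 - 7)^2

# surface plot over x, y in [-4, 4]; tweak `camera` to change the viewing angle
xs = range(-4, 4, length = 200)
surface(xs, xs, (x, y) -> himmelblau([x, y]), camera = (40, 50))

# analytic gradient and Hessian, filled in place as Optim expects
function himmelblau_g!(G, v)
    x, y = v
    G[1] = 4x * (x^2 + y - 11) + 2 * (x + y^2 - 7)
    G[2] = 2 * (x^2 + y - 11) + 4y * (x + y^2 - 7)
end

function himmelblau_h!(H, v)
    x, y = v
    H[1, 1] = 12x^2 + 4y - 42
    H[1, 2] = H[2, 1] = 4x + 4y
    H[2, 2] = 12y^2 + 4x - 26
end

x0 = [2.0, 2.0]   # illustrative; vary this to see where each method converges
res_newton = optimize(himmelblau, himmelblau_g!, himmelblau_h!, x0, Newton())
res_nm     = optimize(himmelblau, x0, NelderMead())

println("Newton:      ", Optim.minimizer(res_newton), " in ", Optim.iterations(res_newton), " iterations")
println("Nelder-Mead: ", Optim.minimizer(res_nm), " in ", Optim.iterations(res_nm), " iterations")
```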

Question 2

The Ackley function is a non-convex function used as a performance test problem for optimization algorithms. The function is defined by

\[ f(x, y)=-20 \exp \left[-0.2 \sqrt{0.5\left(x^{2}+y^{2}\right)}\right]-\exp \left[0.5\left(\cos 2 \pi x+\cos 2 \pi y\right)\right]+e+20 \]

  1. Produce surface and contour plots of the Ackley function over the domain \(x, y \in[-4,4]\). What is the global minimum?

  2. Minimize the function using LBFGS and Nelder-Mead. Experiment with how the performance of the algorithms responds to the starting guess fed to them (a starter sketch follows).
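
A minimal sketch along the same lines, again assuming Optim.jl and Plots.jl; the loop over starting guesses is illustrative, not prescribed.

```julia
using Optim, Plots

# Ackley function (ℯ is Julia's built-in Euler's number)
ackley(v) = -20 * exp(-0.2 * sqrt(0.5 * (v[1]^2 + v[2]^2))) -
            exp(0.5 * (cos(2π * v[1]) + cos(2π * v[2]))) + ℯ + 20

xs = range(-4, 4, length = 400)
surface(xs, xs, (x, y) -> ackley([x, y]))
contour(xs, xs, (x, y) -> ackley([x, y]), fill = true)

# compare LBFGS and Nelder-Mead from a few (illustrative) starting guesses
for x0 in ([0.5, 0.5], [2.0, 2.0], [3.5, -3.5])
    res_lbfgs = optimize(ackley, x0, LBFGS())
    res_nm    = optimize(ackley, x0, NelderMead())
    println(x0, " -> LBFGS: ", Optim.minimizer(res_lbfgs),
            "  Nelder-Mead: ", Optim.minimizer(res_nm))
end
```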

Question 3

The Rastrigin function is a particularly absurd function used for testing optimization algorithms. On an \(n\)-dimensional domain, the problem is defined by

\[ f(\boldsymbol{x})=A n+\sum_{i=1}^{n}\left[x_{i}^{2}-A \cos \left(2 \pi x_{i}\right)\right], \]

where \(A=10\) and \(x_{i} \in[-5.12,5.12]\).

  1. Plot the function for \(n=1\). What is the global minimum? Does this minimum hold for arbitrary \(n\)?

  2. Produce surface and contour plots for the Rastrigin function for \(n=2\).

  3. Minimize the function using LBFGS and Nelder-Mead. Experiment with how the performance of the algorithms responds to the starting guess fed to them (a starter sketch follows).
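
A minimal sketch, once more assuming Optim.jl and Plots.jl; the Rastrigin definition works for any \(n\), and the starting guess is illustrative.

```julia
using Optim, Plots

# Rastrigin function for any dimension n = length(x)
rastrigin(x; A = 10) = A * length(x) + sum(xi^2 - A * cos(2π * xi) for xi in x)

xs = range(-5.12, 5.12, length = 500)
plot(xs, x -> rastrigin([x]))                     # n = 1
surface(xs, xs, (x, y) -> rastrigin([x, y]))      # n = 2
contour(xs, xs, (x, y) -> rastrigin([x, y]), fill = true)

x0 = [2.0, 2.0]   # illustrative; try guesses near and far from the origin
res_lbfgs = optimize(rastrigin, x0, LBFGS())
res_nm    = optimize(rastrigin, x0, NelderMead())
println("LBFGS:       ", Optim.minimizer(res_lbfgs))
println("Nelder-Mead: ", Optim.minimizer(res_nm))
```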

Question 4

Write a function that returns the linear approximation of a function passed to it. The function should take as arguments a function \(f\), a vector of grid points, and a value \(x\) at which to evaluate the approximation.

The function should return the piecewise linear interpolation of \(f\) at \(x\). Hint: look up the findfirst function and use it to define a function that locates the index of the first element of an array that is bigger than \(x\).
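
A sketch of one possible implementation, using “findfirst” as the hint suggests; the name lin_interp and the choice to clamp \(x\) outside the grid are my own, not part of the assignment.

```julia
# piecewise linear interpolation of f at x over a sorted grid of points
function lin_interp(f, grid, x)
    j = findfirst(g -> g > x, grid)        # first grid point strictly above x
    j === nothing && return f(grid[end])   # x at or beyond the right endpoint
    j == 1 && return f(grid[1])            # x below the left endpoint
    x0, x1 = grid[j-1], grid[j]
    # line through (x0, f(x0)) and (x1, f(x1)), evaluated at x
    return f(x0) + (f(x1) - f(x0)) * (x - x0) / (x1 - x0)
end

lin_interp(log, [1.0, 2.0, 3.0], 2.5)   # ≈ (log(2) + log(3)) / 2
```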

Question 5

In this problem, we consider how to optimally approximate a simple function using linear interpolation. Say that we have the function \(f(x)=\ln (x+1)\), with \(x \in[0,100]\). Our goal is to form the best possible approximation of this function with linear interpolation while using only a fixed number of grid points. Specifically:

  1. Start with an evenly spaced grid (i.e. \(0, 10, 20, \ldots, 100\)). Plot \(f(x), \hat{f}(x)\), and the approximation error. Where is the approximation error the highest? Interpret.

  2. Now write a function that admits a vector of 9 points between 0 and 100. The function should then add on 0 and 100 to either end of the vector to form a new discretization of the domain of \(x\) before running the function above for the new grid. The output of the function should be the total approximation error (i.e. the sum of all \(|\hat{f}(x)-f(x)|\) ).

  3. Using Nelder-Mead with a starting guess of evenly-spaced points, find the optimal point placement for minimizing the total approximation error of the interpolated function (a sketch follows this list). What are the selected grid points? By how much does the total approximation error decrease? Plot the approximation error on the new grid, along with \(f(x)\) and \(\hat{f}(x)\). Note: you may need to include an if-statement that explicitly rules out negative grid points to prevent Nelder-Mead from behaving stupidly.
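
A sketch of how parts 2 and 3 might fit together, reusing the lin_interp helper from the Question 4 sketch; the fine evaluation grid and the Inf penalty for negative points are illustrative choices.

```julia
using Optim, Plots

f(x) = log(x + 1)
xfine = range(0, 100, length = 1001)     # points at which the error is measured

# total absolute approximation error of the interpolant on a given grid
total_error(grid) = sum(abs(lin_interp(f, grid, x) - f(x)) for x in xfine)

# wrap a 9-vector of interior points into a full grid on [0, 100]
function total_error_interior(pts)
    any(pts .< 0) && return Inf          # rule out negative grid points
    return total_error(sort(vcat(0.0, pts, 100.0)))
end

pts0 = collect(10.0:10.0:90.0)           # evenly spaced starting guess
res  = optimize(total_error_interior, pts0, NelderMead())
best = sort(Optim.minimizer(res))
println("grid points: ", vcat(0.0, best, 100.0))
println("error: ", Optim.minimum(res), " vs. even grid: ", total_error_interior(pts0))
```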