Sometimes we want to optimise functions that are hard to optimise. Here are two examples:
Both functions have multiple peaks, making them hard to optimise via gradient descent. The right function is also non-differentiable.
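Idea: Add a smoothing parameter $\sigma$ to turn the 1D problem of optimising $f(x)$ into a 2D problem over $F(x, \sigma)$:
When $\sigma = 0$: $F(x, 0) = f(x)$. When $\sigma > 0$: $F(x, \sigma)$ becomes a smooth average of $f$ around $x$.
How does the smoothing work? We convolve $f$ with a Gaussian of width $\sigma$:
$$F(x, \sigma) = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t - x)^2}{2\sigma^2}\right) dt$$
This integral computes a weighted average of $f(t)$, where points $t$ closer to $x$ get higher weight.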
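The weighted average described above can be approximated numerically. The following is a minimal sketch, not the article's implementation: the test function `f`, the helper name `smooth`, and the grid settings (`half_width`, `n`) are all assumptions chosen for illustration. It samples the function at offsets around `x`, weights each sample by a Gaussian, and normalises the weights so the result is a true weighted average.

```python
import numpy as np

def f(x):
    """Hypothetical multi-peak test function (not taken from the article)."""
    return np.exp(-(x - 1.0) ** 2 / 8.0) + 0.3 * np.cos(6.0 * x)

def smooth(func, x, sigma, half_width=6.0, n=481):
    """Approximate the Gaussian-smoothed value of func at x.

    Samples func at x + sigma * z for z on a uniform grid covering
    +/- half_width standard deviations, then takes the Gaussian-weighted
    average. With sigma = 0 every sample equals func(x), so the original
    function is recovered.
    """
    z = np.linspace(-half_width, half_width, n)  # offsets in units of sigma
    w = np.exp(-0.5 * z ** 2)                    # unnormalised Gaussian weights
    w /= w.sum()                                 # weights now sum to 1
    x = np.asarray(x, dtype=float)
    samples = func(x[..., None] + sigma * z)     # shape (..., n)
    return samples @ w                           # weighted average over offsets

if __name__ == "__main__":
    for sigma in (0.0, 0.25, 1.0):
        print(f"sigma={sigma:4.2f}  F(0, sigma)={smooth(f, 0.0, sigma):.4f}")
```

Normalising the weights by their sum, rather than by the analytic Gaussian constant, keeps the sketch well defined at a smoothing width of exactly zero, where the Gaussian density itself is not.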
Use the sliders below to explore how different $(x, \sigma)$ values produce different pixel values.
The key insight: instead of optimising $f(x)$ directly, we optimise in the $(x, \sigma)$ space. With high smoothing, the landscape is smooth and gradients point toward the global optimum. As we reduce $\sigma$ toward 0, we converge to the true optimum of $f$.
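One way to act on this idea is to run gradient ascent on the smoothed function while lowering the smoothing width on a fixed schedule. The sketch below is an illustration under stated assumptions, not the article's method: the test function `f`, the linear schedule, the step size `lr`, and the finite-difference gradient are all choices made here. It compares the annealed run against plain gradient ascent on the unsmoothed function from the same starting point.

```python
import numpy as np

def f(x):
    """Hypothetical multi-peak test function (not taken from the article)."""
    return np.exp(-(x - 1.0) ** 2 / 8.0) + 0.3 * np.cos(6.0 * x)

# Fixed Gaussian quadrature grid: offsets in units of sigma, weights sum to 1.
Z = np.linspace(-6.0, 6.0, 481)
W = np.exp(-0.5 * Z ** 2)
W /= W.sum()

def F(x, sigma):
    """Gaussian-smoothed f at scalar x; equals f(x) when sigma is 0."""
    return float(np.dot(W, f(x + sigma * Z)))

def grad_x(x, sigma, h=1e-4):
    """Central finite-difference estimate of dF/dx at fixed sigma."""
    return (F(x + h, sigma) - F(x - h, sigma)) / (2.0 * h)

def ascend(x, sigmas, steps_per_level=50, lr=0.1):
    """Gradient ascent in x, visiting each smoothing level in order."""
    for sigma in sigmas:
        for _ in range(steps_per_level):
            x += lr * grad_x(x, sigma)
    return x

x_start = -3.0
schedule = np.linspace(2.0, 0.0, 41)                  # high smoothing down to none
x_annealed = ascend(x_start, schedule)
x_plain = ascend(x_start, np.zeros_like(schedule))    # same step budget, no smoothing

if __name__ == "__main__":
    print(f"annealed: x={x_annealed:.3f}  f={f(x_annealed):.3f}")
    print(f"plain:    x={x_plain:.3f}  f={f(x_plain):.3f}")
```

For this particular `f`, the annealed run should finish on the tallest peak, while the plain run stays on the small peak nearest its starting point. The schedule matters: if the smoothing is lowered before `x` has moved into the right basin, the annealed run can still get trapped.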
The arrows show the gradient direction at each point. Notice how at high smoothing (top), gradients consistently point toward the global maximum. At low smoothing (bottom), the gradient directions become chaotic because the landscape has many local optima.
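The contrast between the top and bottom of the gradient field can also be checked numerically. The following sketch is an assumption-laden illustration rather than the code behind the figure: the test function `f`, the grid range, and the helper names `smooth` and `summarise` are hypothetical. For each smoothing level it reports the fraction of grid points whose gradient points toward the global maximum of `f`, and the number of local maxima in that slice.

```python
import numpy as np

def f(x):
    """Hypothetical multi-peak test function (not taken from the article)."""
    return np.exp(-(x - 1.0) ** 2 / 8.0) + 0.3 * np.cos(6.0 * x)

def smooth(x, sigma, n=481):
    """Gaussian-weighted average of f around each entry of the 1D array x."""
    z = np.linspace(-6.0, 6.0, n)        # offsets in units of sigma
    w = np.exp(-0.5 * z ** 2)
    w /= w.sum()                         # weights sum to 1
    return f(x[:, None] + sigma * z) @ w

xs = np.linspace(-4.0, 6.0, 1001)
x_star = xs[np.argmax(f(xs))]            # global maximum of f on the grid

def summarise(sigma):
    """Return (fraction of gradients pointing toward x_star, local maxima count)."""
    values = smooth(xs, sigma)
    d_values = np.gradient(values, xs)   # numerical dF/dx along the slice
    toward = np.sign(d_values) == np.sign(x_star - xs)
    step = np.diff(values)
    # A local maximum is where the slice stops rising and starts falling.
    n_peaks = int(np.sum((step[:-1] > 0) & (step[1:] <= 0)))
    return float(toward.mean()), n_peaks

if __name__ == "__main__":
    for sigma in (2.0, 0.5, 0.0):
        frac, n_peaks = summarise(sigma)
        print(f"sigma={sigma:3.1f}  toward-max fraction={frac:.2f}  local maxima={n_peaks}")
```

With this `f`, the heavily smoothed slice should have a single peak with nearly all gradients pointing toward it, while the unsmoothed slice has several peaks and gradients that point toward the global maximum only about half the time.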
Key insight: