Problem of Hard Functions

Sometimes we want to optimise functions that are hard to optimise. Here are two examples:

$$f(x) = \text{sinc}(x) \quad \text{and} \quad f(x) = \lfloor 10 \cdot \text{sinc}(x) + 4 \sin(x) \rfloor$$

[Plot: the two example functions side by side]

Both functions have multiple peaks, making them hard to optimise via gradient descent. The right function is also non-differentiable: the floor makes it piecewise constant, so its gradient is zero almost everywhere.
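To make these concrete, here is a minimal numpy sketch of the two functions. The names `f_smoothish` and `f_hard` are my own, and I assume the unnormalised sinc, $\text{sinc}(x) = \sin(x)/x$ (numpy's `np.sinc` computes the normalised variant, hence the rescaled argument):

```python
import numpy as np

def f_smoothish(x):
    # Unnormalised sinc: sin(x) / x, equal to 1 at x = 0.
    # np.sinc(x) computes sin(pi x) / (pi x), so rescale the argument.
    return np.sinc(x / np.pi)

def f_hard(x):
    # The floor makes this piecewise constant: its derivative is zero
    # almost everywhere, so plain gradient descent gets no signal.
    return np.floor(10 * np.sinc(x / np.pi) + 4 * np.sin(x))
```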

Idea: Add a smoothing parameter $s$ to turn the 1D problem into a 2D problem:

$$g(x, s) = \int_{-\infty}^{\infty} f(t) \cdot \mathcal{N}(t; \mu = x, \sigma = s) \, dt$$

When $s = 0$: $g(x, 0) = f(x)$. When $s \gg 0$: $g(x, s)$ becomes a smooth average.
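One way to evaluate $g$ numerically is to read the integral as an expectation, $g(x, s) = \mathbb{E}[f(t)]$ with $t \sim \mathcal{N}(x, s^2)$, and estimate it by sampling. A Monte Carlo sketch (the helper name is my own):

```python
import numpy as np

def smoothed(f, x, s, n_samples=10_000, seed=0):
    """Monte Carlo estimate of g(x, s) = E[f(t)] with t ~ N(x, s^2)."""
    if s == 0:
        return f(x)  # no smoothing: g(x, 0) = f(x) exactly
    rng = np.random.default_rng(seed)
    t = rng.normal(loc=x, scale=s, size=n_samples)
    return f(t).mean()  # the sample mean approximates the integral
```

Fixing the seed means nearby $x$ values share the same underlying noise, so the estimate varies smoothly in $x$ (the common-random-numbers trick).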

[Plot: the smoothed landscape $g(x, s)$]

Intuition

How does the smoothing work? We convolve $f(x)$ with a Gaussian:

$$g(x, \sigma) = \int_{-\infty}^{\infty} f(t) \cdot \mathcal{N}(t; x, \sigma) \, dt$$

This integral computes a weighted average of $f$, where points closer to $x$ get higher weight.
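That reading can be made explicit with a deterministic grid approximation: Gaussian weights, normalised to sum to one, averaging $f$ around $x$. A sketch (the function name, grid width, and resolution are my own illustrative choices):

```python
import numpy as np
from scipy.stats import norm

def smoothed_quad(f, x, sigma, half_width=10.0, n=2001):
    """Grid approximation of the convolution: a weighted average of f
    whose Gaussian weights peak at t = x and decay with |t - x|."""
    t = np.linspace(x - half_width, x + half_width, n)
    w = norm.pdf(t, loc=x, scale=sigma)  # highest weight nearest x
    w /= w.sum()                         # normalise into a proper average
    return np.sum(w * f(t))
```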

Use the sliders below to explore how different $(x, \sigma)$ values produce different pixel values in the heatmap of $g$.

The green shaded area on the left (integral) equals the pixel value 8.02 on the right.

The key insight: instead of optimising $f(x)$ directly, we optimise in the $(x, s)$ space. Starting with high smoothing, the landscape is smooth and gradients point toward the global optimum. As we reduce $s$ toward 0, we converge to the true optimum of $f(x)$.
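A minimal sketch of that schedule, assuming the `smoothed` helper from above and a finite-difference gradient; we take ascent steps, since we are chasing a peak, and the learning rate, annealing schedule, and step sizes are illustrative rather than tuned:

```python
import numpy as np

def anneal_ascent(f, x0, s0=5.0, n_levels=20, steps_per_level=50,
                  lr=0.1, eps=1e-2):
    """Gradient ascent on g(x, s) while shrinking s toward 0."""
    x = x0
    for s in np.linspace(s0, 1e-2, n_levels):  # anneal the smoothing
        for _ in range(steps_per_level):
            # Central finite difference of the smoothed objective in x.
            grad = (smoothed(f, x + eps, s)
                    - smoothed(f, x - eps, s)) / (2 * eps)
            x += lr * grad                     # step toward the peak
    return x
```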

Gradient Field Visualization

The arrows show the gradient direction at each point. Notice how at high smoothing (top), gradients consistently point toward the global maximum at $x = 0$. At low smoothing (bottom), the gradients become chaotic, with many local optima.
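Because the kernel is Gaussian, $\partial g/\partial x$ has a form that never differentiates $f$ itself: differentiating the density under the integral gives $\partial g/\partial x = \mathbb{E}\left[f(t) \, (t - x)/s^2\right]$ with $t \sim \mathcal{N}(x, s^2)$. A sketch of how such an arrow field could be computed (the helper name and grid choices are my own):

```python
import numpy as np

def grad_field(f, xs, ss, n_samples=20_000, seed=0):
    """Estimate dg/dx on a grid of (x, s) points via
    dg/dx = E[f(t) * (t - x) / s^2], t ~ N(x, s^2).
    This works even when f itself is non-differentiable."""
    rng = np.random.default_rng(seed)
    field = np.zeros((len(ss), len(xs)))
    for i, s in enumerate(ss):
        for j, x in enumerate(xs):
            t = rng.normal(loc=x, scale=s, size=n_samples)
            field[i, j] = np.mean(f(t) * (t - x) / s**2)
    return field
```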

[Plot: gradient field at high smoothing (top) vs low smoothing (bottom)]

Gradient Descent in Smoothed Space

Key insight:

  • When $s \approx 0$: $g(x, s) \approx f(x)$ (the original function)
  • When $s \gg 0$: $g(x, s)$ is smooth, and gradients point toward the global optimum region
  • Starting at high $s$ and descending toward $s \approx 0$ helps escape local optima, as the run sketched below illustrates
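Putting the hypothetical sketches above together on the non-differentiable example (the start point is arbitrary, and the exact result depends on the untuned schedule):

```python
x_star = anneal_ascent(f_hard, x0=6.0)  # start out on a distant local bump
print(x_star)  # expected to land near the global peak region around x = 0
```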