smoothed-gradient-descent.py

https://github.com/marimo-team/gallery-examples/blob/main/notebooks/algorithms/smoothed-gradient-descent.py

Problem of Hard Functions

Some functions are hard to optimise with gradient-based methods. Here are two examples:

$f(x) = \text{sinc}(x) \quad \text{and} \quad f(x) = \lfloor 10 \cdot \text{sinc}(x) + 4 \sin(x) \rfloor$

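Both functions have multiple peaks, so gradient descent can get stuck at a local optimum. The second function is also non-differentiable: because of the floor it is piecewise constant, with jumps between levels and a zero gradient everywhere else, so the gradient carries no information about where the optimum is.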
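To make the two examples concrete, here is a minimal sketch in NumPy. It assumes the normalized sinc, $\sin(\pi x)/(\pi x)$, which is what `np.sinc` computes; the notebook may use a different convention. The names `f_smooth`, `f_step` and `numeric_grad` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def f_smooth(x):
    # Assumption: normalized sinc, sin(pi x) / (pi x), as implemented by np.sinc.
    return np.sinc(x)

def f_step(x):
    # The floor makes the function piecewise constant.
    return np.floor(10 * np.sinc(x) + 4 * np.sin(x))

def numeric_grad(f, x, h=1e-4):
    # Central finite difference, used here only to probe the local slope.
    return (f(x + h) - f(x - h)) / (2 * h)

# Away from a jump, the floored function has zero slope,
# while the plain sinc has a non-zero slope at the same point.
slope_step = numeric_grad(f_step, 0.5)
slope_smooth = numeric_grad(f_smooth, 0.5)
```

At $x = 0.5$ the floored function sits on a flat plateau, so `slope_step` is zero while `slope_smooth` is not.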

Idea: Add a smoothing parameter $s$ to turn the 1D problem into a 2D problem:

$g(x, s) = \int_{-\infty}^{\infty} f(t) \cdot \mathcal{N}(t; \mu=x, \sigma=s) \, dt$

In the limit $s \to 0$ the Gaussian collapses onto $x$, so $g(x, s) \to f(x)$ wherever $f$ is continuous. When $s \gg 0$, $g(x, s)$ becomes a smooth average of $f$ around $x$.

Intuition

How does the smoothing work? We convolve $f(x)$ with a Gaussian whose standard deviation $\sigma$ is the smoothing parameter $s$ from above:

$g(x, \sigma) = \int_{-\infty}^{\infty} f(t) \cdot \mathcal{N}(t; x, \sigma) \, dt$

This integral computes a weighted average of $f$ , where points closer to $x$ get higher weight.
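The weighted average can be approximated numerically. The sketch below is one possible discretisation, not necessarily the one used in the notebook: it samples $t$ on a uniform grid within six standard deviations of $x$ and normalises the Gaussian weights so they sum to one. The function name `smooth` and the parameters `n` and `width` are assumptions made for this example.

```python
import numpy as np

def smooth(f, x, sigma, n=2001, width=6.0):
    """Approximate g(x, sigma) = integral of f(t) * N(t; x, sigma) dt."""
    if sigma <= 0:
        # No smoothing: fall back to the original function.
        return float(f(x))
    # Sample t within +/- width standard deviations of x.
    t = np.linspace(x - width * sigma, x + width * sigma, n)
    w = np.exp(-0.5 * ((t - x) / sigma) ** 2)
    w /= w.sum()  # normalise so the weights sum to 1
    return float(np.sum(w * f(t)))

peak_raw = smooth(np.sinc, 0.0, 0.0)       # original value at the peak
peak_smoothed = smooth(np.sinc, 0.0, 1.0)  # average over a neighbourhood of the peak
```

Because the average at the peak mixes in lower neighbouring values, `peak_smoothed` is smaller than `peak_raw`.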

Use the sliders below to explore how different $(x, \sigma)$ values produce different pixel values.

The green shaded area on the left (integral) equals the pixel value 8.02 on the right.

The key insight: instead of optimising $f(x)$ directly, we optimise in the $(x, s)$ space. Starting with high smoothing, the landscape is smooth and gradients tend to point toward the global optimum. As we reduce $s$ toward 0, $g(x, s)$ approaches $f(x)$, so the iterate can settle on the true optimum of $f(x)$ .

Gradient Field Visualization

The arrows show the gradient direction at each point. Notice how at high smoothing (top), gradients consistently point toward the global maximum at $x=0$ . At low smoothing (bottom), the gradients become chaotic with many local optima.
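A rough way to check this behaviour numerically is to count how often the $x$-component of the gradient changes sign along a row of the field. The sketch below assumes the normalized sinc (`np.sinc`) as $f$ and a simple grid-based Gaussian average; `smooth_grid`, `x_gradient_field` and `count_sign_changes` are hypothetical helpers, and only the $x$-component of the arrows is computed.

```python
import numpy as np

def smooth_grid(f, xs, s, n=2001, width=6.0):
    """Approximate g(x, s) for every x in xs by a weighted sum over t = x + s*u."""
    u = np.linspace(-width, width, n)   # offsets in units of s
    w = np.exp(-0.5 * u**2)
    w /= w.sum()                        # discrete Gaussian weights, sum to 1
    t = xs[:, None] + s * u[None, :]    # shape (len(xs), n)
    return f(t) @ w

def x_gradient_field(f, xs, s_values):
    """Finite-difference estimate of dg/dx, one row per smoothing level."""
    return np.array([np.gradient(smooth_grid(f, xs, s), xs) for s in s_values])

def count_sign_changes(v):
    return int(np.sum(np.diff(np.sign(v)) != 0))

xs = np.linspace(-5.0, 5.0, 200)        # even count, so x = 0 is not on the grid
field = x_gradient_field(np.sinc, xs, s_values=[2.0, 0.05])
changes_high_s = count_sign_changes(field[0])   # heavy smoothing
changes_low_s = count_sign_changes(field[1])    # almost no smoothing
```

Under these assumptions the heavily smoothed row changes sign only once, around $x = 0$, while the lightly smoothed row changes sign at every local peak and valley of the sinc.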

Gradient Descent in Smoothed Space

Key insight:

When $s \approx 0$ : $g(x, s) \approx f(x)$ (the original function)
When $s \gg 0$ : $g(x, s)$ is smooth, and its gradients tend to point toward the region of the global optimum
Starting at high $s$ and descending toward $s \approx 0$ helps escape local optima
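The three points above can be sketched as a small annealing loop. This is a simplified illustration, not the notebook's implementation: it takes gradient ascent steps in $x$ (the example has a global maximum, so this is descent on $-g$) and shrinks $s$ on a fixed geometric schedule instead of following a gradient in $s$. The function names, the schedule and all default values (`s0`, `decay`, `lr`, `steps`, `h`) are assumptions chosen for the normalized sinc.

```python
import numpy as np

def smooth(f, x, s, n=2001, width=6.0):
    """Grid approximation of g(x, s), the Gaussian-weighted average of f around x."""
    u = np.linspace(-width, width, n)
    w = np.exp(-0.5 * u**2)
    w /= w.sum()
    return float(np.sum(w * f(x + s * u)))

def annealed_ascent(f, x0, s0=2.0, decay=0.995, lr=0.5, steps=1500, h=1e-3):
    """Follow dg/dx uphill while shrinking the smoothing s toward 0."""
    x, s = float(x0), float(s0)
    for _ in range(steps):
        # Central finite difference of g in the x direction.
        grad_x = (smooth(f, x + h, s) - smooth(f, x - h, s)) / (2 * h)
        x += lr * grad_x   # ascent step (the example is a maximisation)
        s *= decay         # assumed geometric schedule for the smoothing
    return x, s

# Start next to a side lobe of the sinc.
x_annealed, s_final = annealed_ascent(np.sinc, x0=2.3)
# Baseline: almost no smoothing and no annealing, i.e. plain ascent on f.
x_plain, _ = annealed_ascent(np.sinc, x0=2.3, s0=1e-3, decay=1.0)
```

With these settings the annealed run ends near the global maximum at $x = 0$, while the baseline stays on the side lobe near $x \approx 2.46$. The learning rate is kept below the stability limit for the unsmoothed sinc at its peak; other functions would need different values.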