Evolution Strategies (ES) don't just find good solutions: they learn how to search.
The key insight: ES maintains both a mean $\mu$ (where to search) and a standard deviation $\sigma$ (how wide to search). Both parameters adapt based on what the algorithm discovers.
This notebook lets you watch $\mu$ and $\sigma$ adapt in real time across different challenging optimization landscapes.
In ES, we sample candidate solutions from a Gaussian distribution:

$$x_i \sim \mathcal{N}(\mu,\ \sigma^2 I), \qquad i = 1, \dots, \lambda$$
Dimensionality in this notebook: all examples are 2D (each $x_i \in \mathbb{R}^2$), so the search distribution can be drawn directly on the landscape.
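In NumPy, the sampling step might look like the following sketch (the names `mu`, `sigma`, and `n_samples` are illustrative, not taken from the notebook's code):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.0, 0.0])   # current mean of the search distribution (2D)
sigma = 1.5                 # current standard deviation (scalar)
n_samples = 30              # population size (lambda)

# Draw candidates x_i ~ N(mu, sigma^2 * I)
samples = mu + sigma * rng.standard_normal((n_samples, 2))
```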
The magic of ES is that both $\mu$ and $\sigma$ adapt based on what we find:
After sampling $\lambda$ candidates and evaluating their fitness, we update $\sigma$:

$$\Delta\sigma = \frac{\alpha_\sigma}{\lambda} \sum_{i=1}^{\lambda} \tilde{f}_i \cdot \frac{\lVert x_i - \mu \rVert^2}{\sigma^2}$$
Every term explained:
| Symbol | Meaning |
|---|---|
| $\Delta\sigma$ | The change to apply to $\sigma$ this iteration |
| $\alpha_\sigma$ | Learning rate for $\sigma$ (typically 0.01–0.1). Controls how fast $\sigma$ adapts |
| $\lambda$ | Population size — number of samples per iteration |
| $x_i$ | The $i$-th sampled candidate (a 2D point in this notebook) |
| $\mu$ | Current mean of the search distribution (2D vector) |
| $\lVert x_i - \mu \rVert^2$ | Squared Euclidean distance from sample to mean |
| $\sigma$ | Current standard deviation (scalar) |
| $\tilde{f}_i$ | Normalized fitness of sample $i$ (see below) |
What is $\tilde{f}_i$ (normalized fitness)?
We normalize the raw fitness values to have zero mean and unit variance:

$$\tilde{f}_i = \frac{f_i - \bar{f}}{\operatorname{std}(f)}$$

where $\bar{f}$ is the mean fitness across all samples.
For minimization, we flip the sign: $\tilde{f}_i = -\dfrac{f_i - \bar{f}}{\operatorname{std}(f)}$, so lower $f_i$ → higher $\tilde{f}_i$.
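In code, the normalization might look like this sketch (`normalize_fitness` is an illustrative name; the zero-variance guard is an assumption about how ties should be handled):

```python
import numpy as np

def normalize_fitness(fitness, minimize=True):
    """Return zero-mean, unit-variance scores (f-tilde) from raw fitness values."""
    f = np.asarray(fitness, dtype=float)
    if minimize:
        f = -f                      # flip sign so lower raw fitness -> higher score
    std = f.std()
    if std < 1e-12:                 # all samples equally fit: no learning signal
        return np.zeros_like(f)
    return (f - f.mean()) / std
```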
Why does this work?
The term $\dfrac{\lVert x_i - \mu \rVert^2}{\sigma^2}$ measures how "far" a sample is relative to the current $\sigma$: under the current distribution its expected value equals the dimension (2 here), so values above or below that mark samples that landed farther out or closer in than typical.
Multiplying by $\tilde{f}_i$ creates a correlation signal: because the $\tilde{f}_i$ sum to zero, any constant baseline cancels, and the update is positive when good samples tend to lie far from $\mu$ (expand $\sigma$) and negative when they tend to lie close (contract $\sigma$).
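Putting the $\sigma$ update together as a sketch (same illustrative names as above; `sigma_update` is not the notebook's function):

```python
import numpy as np

def sigma_update(samples, mu, sigma, f_tilde, lr_sigma=0.05):
    """Compute delta-sigma from the correlation between fitness and distance."""
    # Squared distance of each sample from mu, relative to the current radius
    dist_sq = np.sum((samples - mu) ** 2, axis=1) / sigma**2
    # Because f_tilde has zero mean, this average equals the covariance between
    # fitness and distance: positive -> good samples lie far out (expand sigma),
    # negative -> good samples lie close in (contract sigma).
    return lr_sigma * np.mean(f_tilde * dist_sq)
```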
The mean moves toward regions where good solutions were found:

$$\Delta\mu = \frac{\alpha_\mu}{\lambda} \sum_{i=1}^{\lambda} \tilde{f}_i \cdot \frac{x_i - \mu}{\sigma}$$
Every term explained:
| Symbol | Meaning |
|---|---|
| $\Delta\mu$ | The change to apply to $\mu$ this iteration (a 2D vector) |
| $\alpha_\mu$ | Learning rate for $\mu$ (typically 0.1–1.0). Controls step size |
| $\lambda$ | Population size |
| $\tilde{f}_i$ | Normalized fitness of sample $i$ (same as in the $\sigma$ update) |
| $x_i - \mu$ | Direction vector from current mean to sample |
| $\sigma$ | Current standard deviation (normalizes the step) |
How it works:
Each sample "votes" for the direction with weight :
The averaging over samples creates a gradient-like signal pointing toward better regions — but computed entirely from function evaluations, no actual gradients needed!
Dividing by normalizes the step size relative to the current search radius.
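A complete iteration might look like this sketch, combining the pieces above on a simple quadratic bowl (the objective, learning rates, and the positive floor on $\sigma$ are illustrative choices, not the notebook's exact settings):

```python
import numpy as np

def es_step(f, mu, sigma, n_samples=30, lr_mu=0.5, lr_sigma=0.05, rng=None):
    """One isotropic ES iteration: sample, normalize fitness, update mu and sigma."""
    if rng is None:
        rng = np.random.default_rng()
    samples = mu + sigma * rng.standard_normal((n_samples, mu.size))
    fitness = np.array([f(x) for x in samples])

    # Normalized fitness, sign-flipped for minimization
    f_tilde = -(fitness - fitness.mean()) / (fitness.std() + 1e-12)

    # Mean update: fitness-weighted average of directions, scaled by 1/sigma
    delta_mu = lr_mu * np.mean(f_tilde[:, None] * (samples - mu), axis=0) / sigma

    # Sigma update: correlate fitness with relative squared distance
    dist_sq = np.sum((samples - mu) ** 2, axis=1) / sigma**2
    delta_sigma = lr_sigma * np.mean(f_tilde * dist_sq)

    return mu + delta_mu, max(sigma + delta_sigma, 1e-8)

# Minimize a quadratic bowl centered at (3, -2)
f = lambda x: np.sum((x - np.array([3.0, -2.0])) ** 2)
mu, sigma = np.zeros(2), 2.0
rng = np.random.default_rng(0)
for _ in range(100):
    mu, sigma = es_step(f, mu, sigma, rng=rng)
print(mu, sigma)   # mu approaches (3, -2); sigma shrinks as the search converges
```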
The charts above show how each sample contributes to the $\mu$ and $\sigma$ updates. All arrows have the same length — color encodes the magnitude of influence.
Left ($\mu$ contributions): Arrow direction shows where each sample pulls $\mu$. Arrow color (plasma colormap) shows influence strength: yellow = strong influence, purple = weak. The orange arrow shows the net update direction.
Right ($\sigma$ contributions): Arrows point outward (expand $\sigma$) or inward (contract $\sigma$). Color encodes both direction and strength: dark red = strong expand, dark blue = strong contract, lighter colors = weaker influence. The dashed circle shows the current 1-$\sigma$ radius.
As you scrub through the iterations, watch for these patterns: $\sigma$ shrinking while $\mu$ homes in on a basin (exploitation), and $\sigma$ expanding again when the samples around $\mu$ stop improving (renewed exploration).
The $\sigma$ plot on the right tells the story: descending curves mean exploitation, rising curves mean the algorithm is searching for better regions.
$\sigma$ is a learned exploration/exploitation balance: unlike fixed cooling schedules in simulated annealing, ES learns when to explore vs. exploit based on feedback.
The adaptation is local and reactive: $\sigma$ responds to what worked recently, not a predetermined schedule.
Starting $\sigma$ matters: too small → trapped in local optima; too large → slow convergence. But adaptive $\sigma$ can recover from poor initialization.
Different landscapes, different dynamics: the same two update rules produce qualitatively different search behavior depending on the terrain they explore.
This notebook showed isotropic ES (the same $\sigma$ in all directions). Real-world problems often benefit from direction-dependent search: a separate $\sigma$ per dimension, or full covariance adaptation as in CMA-ES.
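As a rough sketch, the simplest direction-dependent extension turns $\sigma$ into a per-dimension vector (an assumption about one possible design, not full covariance adaptation like CMA-ES):

```python
import numpy as np

def es_step_diag(f, mu, sigma, n_samples=30, lr_mu=0.5, lr_sigma=0.05, rng=None):
    """Like the isotropic step, but sigma is a vector: one search radius per axis."""
    if rng is None:
        rng = np.random.default_rng()
    samples = mu + sigma * rng.standard_normal((n_samples, mu.size))  # per-axis scaling
    fitness = np.array([f(x) for x in samples])
    f_tilde = -(fitness - fitness.mean()) / (fitness.std() + 1e-12)

    delta_mu = lr_mu * np.mean(f_tilde[:, None] * (samples - mu), axis=0) / sigma
    # Correlate fitness with squared deviation separately along each axis
    dist_sq = (samples - mu) ** 2 / sigma**2
    delta_sigma = lr_sigma * np.mean(f_tilde[:, None] * dist_sq, axis=0)
    return mu + delta_mu, np.maximum(sigma + delta_sigma, 1e-8)
```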
The core insight remains: let the search distribution adapt to the problem.