Supervised learning: 🐱 or 🐶?

This notebook demonstrates supervised learning with a simple binary classifier trained on real cat and dog photos. We download a public dataset, resize the images, flatten the raw pixels into feature vectors, and train a logistic regression model with PyTorch to tell cats from dogs.

Download the dataset

Dataset ready at dog-cat-full-dataset/data
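A minimal sketch of the download step, assuming the images live in a public GitHub repository that can be cloned. The repo URL below is an assumption inferred from the local folder name; substitute the actual source if it differs.

```python
import subprocess
from pathlib import Path

# Assumed repo URL -- inferred from the folder name "dog-cat-full-dataset";
# replace with the dataset's actual source if it differs.
REPO_URL = "https://github.com/laxmimerit/dog-cat-full-dataset.git"
DATA_DIR = Path("dog-cat-full-dataset/data")

if not DATA_DIR.exists():
    subprocess.run(["git", "clone", "--depth", "1", REPO_URL], check=True)
print(f"Dataset ready at {DATA_DIR}")
```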

Load and preprocess images

Loaded 10000 training images and 1000 test images (resized to 64x64 RGB).
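A sketch of the loading step, assuming the usual `<split>/cats` and `<split>/dogs` folder layout with JPEG files; the [0, 1] pixel scaling is also an assumption.

```python
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = 64  # every image is resized to 64x64 RGB

def load_split(split_dir):
    """Load one split, assuming <split>/cats and <split>/dogs subfolders."""
    xs, ys = [], []
    for label, cls in enumerate(["cats", "dogs"]):  # 0 = cat, 1 = dog
        for path in sorted((Path(split_dir) / cls).glob("*.jpg")):
            img = Image.open(path).convert("RGB").resize((IMG_SIZE, IMG_SIZE))
            # Flatten 64 x 64 x 3 pixels into a 12,288-dim vector, scaled to [0, 1].
            xs.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            ys.append(label)
    return np.stack(xs), np.array(ys, dtype=np.int64)

X_train, y_train = load_split("dog-cat-full-dataset/data/train")
X_test, y_test = load_split("dog-cat-full-dataset/data/test")
print(f"Loaded {len(X_train)} training images and {len(X_test)} test images "
      f"(resized to {IMG_SIZE}x{IMG_SIZE} RGB).")
```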

Sample images from the dataset

[Figure: eight sample photos from the training set, four labeled Cat followed by four labeled Dog]
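One possible way to reproduce the sample grid above with matplotlib, reusing `X_train`, `y_train`, and `IMG_SIZE` from the loading sketch (the 2 × 4 layout is an assumption):

```python
import matplotlib.pyplot as plt
import numpy as np

# Pick the first four cats (label 0) and first four dogs (label 1).
cat_idx = np.where(y_train == 0)[0][:4]
dog_idx = np.where(y_train == 1)[0][:4]

fig, axes = plt.subplots(2, 4, figsize=(10, 5))
for ax, i in zip(axes.ravel(), list(cat_idx) + list(dog_idx)):
    ax.imshow(X_train[i].reshape(IMG_SIZE, IMG_SIZE, 3))
    ax.set_title("Cat" if y_train[i] == 0 else "Dog")
    ax.axis("off")
plt.tight_layout()
plt.show()
```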

Train the classifier

The model minimizes the binary cross-entropy loss over the training set:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

| Symbol | Meaning | In this task |
| --- | --- | --- |
| $N$ | Number of training samples | 10,000 images (5,000 cats + 5,000 dogs) |
| $y_i$ | Ground-truth label for sample $i$ | 0 = Cat, 1 = Dog |
| $\mathbf{x}_i$ | Feature vector for sample $i$ | 12,288-dim vector (64 × 64 × 3 RGB pixels) |
| $\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$ | Predicted probability that sample $i$ is a dog | Model's confidence the photo is a dog |

When the true label is dog ($y_i = 1$), only the $\log(\hat{p}_i)$ term is active, penalizing low confidence in the dog class. When the true label is cat ($y_i = 0$), only the $\log(1 - \hat{p}_i)$ term is active, penalizing a high predicted dog probability. For example, predicting $\hat{p}_i = 0.9$ for a dog costs $-\log 0.9 \approx 0.11$, while predicting $\hat{p}_i = 0.1$ costs $-\log 0.1 \approx 2.30$. SGD adjusts $\mathbf{w}$ and $b$ to minimize $\mathcal{L}$.
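The loop below is a minimal sketch of this setup in PyTorch: logistic regression as one linear layer trained with cross-entropy, the two-logit softmax form of the loss above. The learning rate and full-batch updates are assumptions; the notebook only fixes 100 epochs. It reuses `X_train`, `y_train`, `X_test`, and `y_test` from the loading sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Logistic regression as a single linear layer: 12,288 pixel features -> 2 logits.
model = nn.Linear(64 * 64 * 3, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is an assumption
loss_fn = nn.CrossEntropyLoss()  # softmax + negative log-likelihood over 2 classes

X = torch.from_numpy(X_train)  # (N, 12288) float32, pixels scaled to [0, 1]
y = torch.from_numpy(y_train)  # (N,) int64, 0 = cat, 1 = dog

for epoch in range(100):  # the notebook reports 100 epochs
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # full-batch updates for brevity
    loss.backward()
    optimizer.step()

with torch.no_grad():
    train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
    test_acc = (model(torch.from_numpy(X_test)).argmax(dim=1)
                == torch.from_numpy(y_test)).float().mean().item()
print(f"Train accuracy: {train_acc:.1%} | Test accuracy: {test_acc:.1%}")
```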

| Metric | Value |
| --- | --- |
| Training samples | 10,000 |
| Test samples | 1,000 |
| Features per image | 12,288 |
| Epochs | 100 |
| Train accuracy | 67.6% |
| Test accuracy | 61.1% |

Interactive predictions

True label: Cat | Predicted: Cat | Correct
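A sketch of what one interactive prediction might look like under the hood, reusing the trained `model` and the test arrays from earlier sketches; the `predict_one` helper is hypothetical, not part of the original notebook.

```python
import random

import torch

CLASSES = ("Cat", "Dog")

def predict_one(model, x):
    """Return (class name, confidence) for one flattened image.
    Hypothetical helper -- not part of the original notebook."""
    with torch.no_grad():
        logits = model(torch.as_tensor(x).unsqueeze(0))        # (1, 2) raw scores
        probs = torch.softmax(logits, dim=1).squeeze(0)        # (2,) probabilities
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])

# Mimic the "Sample random image" button with a random test image.
i = random.randrange(len(X_test))
pred, conf = predict_one(model, X_test[i])
true = CLASSES[int(y_test[i])]
print(f"True label: {true} | Predicted: {pred} ({conf:.0%}) | "
      f"{'Correct' if pred == true else 'Wrong'}")
```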

How it works

  • Real cat and dog photos are downloaded from a public GitHub dataset.
  • Each image is resized to 64 × 64 RGB pixels and flattened into a 12,288-dimensional feature vector (64 × 64 × 3 channels).
  • Logistic regression is implemented as a single nn.Linear layer (12,288 -> 2) trained with SGD and cross-entropy loss in PyTorch.
  • torch.softmax converts the raw logits into a probability distribution over the two classes; with only two logits, this is equivalent to applying a sigmoid to their difference, so it matches the binary formulation above.
  • Click Sample random image to cycle through examples and inspect the model's confidence on each prediction.
  • Because this is a linear model on raw pixels, accuracy is limited. More powerful approaches (CNNs, transfer learning) would perform significantly better on this task.