Supervised learning: 🐱 or 🐶?

This notebook demonstrates supervised learning with a simple binary classifier trained on real cat and dog photos. We download a public dataset, resize the images, flatten the raw pixels into feature vectors, and train a logistic regression model with PyTorch to tell cats from dogs.

Download the dataset

Dataset ready at dog-cat-full-dataset/data
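A minimal sketch of the download step, assuming the images live in a public GitHub repository that can be cloned. The repo URL below is an assumption inferred from the local folder name; substitute the actual source if it differs.

```python
import subprocess
from pathlib import Path

# Assumed repo URL -- inferred from the folder name "dog-cat-full-dataset";
# replace with the dataset's actual source if it differs.
REPO_URL = "https://github.com/laxmimerit/dog-cat-full-dataset.git"
DATA_DIR = Path("dog-cat-full-dataset/data")

if not DATA_DIR.exists():
    subprocess.run(["git", "clone", "--depth", "1", REPO_URL], check=True)
print(f"Dataset ready at {DATA_DIR}")
```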

Load and preprocess images

Loaded 10000 training images and 1000 test images (resized to 64x64 RGB).
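A sketch of the loading step, assuming the usual `<split>/cats` and `<split>/dogs` folder layout with JPEG files; the [0, 1] pixel scaling is also an assumption.

```python
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = 64  # every image is resized to 64x64 RGB

def load_split(split_dir):
    """Load one split, assuming <split>/cats and <split>/dogs subfolders."""
    xs, ys = [], []
    for label, cls in enumerate(["cats", "dogs"]):  # 0 = cat, 1 = dog
        for path in sorted((Path(split_dir) / cls).glob("*.jpg")):
            img = Image.open(path).convert("RGB").resize((IMG_SIZE, IMG_SIZE))
            # Flatten 64 x 64 x 3 pixels into a 12,288-dim vector, scaled to [0, 1].
            xs.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            ys.append(label)
    return np.stack(xs), np.array(ys, dtype=np.int64)

X_train, y_train = load_split("dog-cat-full-dataset/data/train")
X_test, y_test = load_split("dog-cat-full-dataset/data/test")
print(f"Loaded {len(X_train)} training images and {len(X_test)} test images "
      f"(resized to {IMG_SIZE}x{IMG_SIZE} RGB).")
```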

Sample images from the dataset

[Figure: eight sample photos from the training set, four labeled Cat followed by four labeled Dog]
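One possible way to reproduce the sample grid above with matplotlib, reusing `X_train`, `y_train`, and `IMG_SIZE` from the loading sketch (the 2 × 4 layout is an assumption):

```python
import matplotlib.pyplot as plt
import numpy as np

# Pick the first four cats (label 0) and first four dogs (label 1).
cat_idx = np.where(y_train == 0)[0][:4]
dog_idx = np.where(y_train == 1)[0][:4]

fig, axes = plt.subplots(2, 4, figsize=(10, 5))
for ax, i in zip(axes.ravel(), list(cat_idx) + list(dog_idx)):
    ax.imshow(X_train[i].reshape(IMG_SIZE, IMG_SIZE, 3))
    ax.set_title("Cat" if y_train[i] == 0 else "Dog")
    ax.axis("off")
plt.tight_layout()
plt.show()
```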

Train the classifier

The model minimizes the binary cross-entropy loss over the training set:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

| Symbol | Meaning | In this task |
| --- | --- | --- |
| $N$ | Number of training samples | 10,000 images (5,000 cats + 5,000 dogs) |
| $y_i$ | Ground-truth label for sample $i$ | 0 = Cat, 1 = Dog |
| $\mathbf{x}_i$ | Feature vector for sample $i$ | 12,288-dim vector (64 × 64 × 3 RGB pixels) |
| $\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$ | Predicted probability that sample $i$ is a dog | Model's confidence the photo is a dog |

When the true label is dog ($y_i = 1$), only the $\log(\hat{p}_i)$ term is active, penalizing low confidence in the dog class. When the true label is cat ($y_i = 0$), only the $\log(1 - \hat{p}_i)$ term is active, penalizing a high predicted dog probability. For example, predicting $\hat{p}_i = 0.9$ for a dog costs $-\log 0.9 \approx 0.11$, while predicting $\hat{p}_i = 0.1$ costs $-\log 0.1 \approx 2.30$. SGD adjusts $\mathbf{w}$ and $b$ to minimize $\mathcal{L}$.
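The loop below is a minimal sketch of this setup in PyTorch: logistic regression as one linear layer trained with cross-entropy, the two-logit softmax form of the loss above. The learning rate and full-batch updates are assumptions; the notebook only fixes 100 epochs. It reuses `X_train`, `y_train`, `X_test`, and `y_test` from the loading sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Logistic regression as a single linear layer: 12,288 pixel features -> 2 logits.
model = nn.Linear(64 * 64 * 3, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is an assumption
loss_fn = nn.CrossEntropyLoss()  # softmax + negative log-likelihood over 2 classes

X = torch.from_numpy(X_train)  # (N, 12288) float32, pixels scaled to [0, 1]
y = torch.from_numpy(y_train)  # (N,) int64, 0 = cat, 1 = dog

for epoch in range(100):  # the notebook reports 100 epochs
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # full-batch updates for brevity
    loss.backward()
    optimizer.step()

with torch.no_grad():
    train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
    test_acc = (model(torch.from_numpy(X_test)).argmax(dim=1)
                == torch.from_numpy(y_test)).float().mean().item()
print(f"Train accuracy: {train_acc:.1%} | Test accuracy: {test_acc:.1%}")
```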

| Metric | Value |
| --- | --- |
| Training samples | 10,000 |
| Test samples | 1,000 |
| Features per image | 12,288 |
| Epochs | 100 |
| Train accuracy | 67.6% |
| Test accuracy | 61.1% |

Interactive predictions

True label: Cat | Predicted: Cat | Correct
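A sketch of what one interactive prediction might look like under the hood, reusing the trained `model` and the test arrays from earlier sketches; the `predict_one` helper is hypothetical, not part of the original notebook.

```python
import random

import torch

CLASSES = ("Cat", "Dog")

def predict_one(model, x):
    """Return (class name, confidence) for one flattened image.
    Hypothetical helper -- not part of the original notebook."""
    with torch.no_grad():
        logits = model(torch.as_tensor(x).unsqueeze(0))        # (1, 2) raw scores
        probs = torch.softmax(logits, dim=1).squeeze(0)        # (2,) probabilities
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])

# Mimic the "Sample random image" button with a random test image.
i = random.randrange(len(X_test))
pred, conf = predict_one(model, X_test[i])
true = CLASSES[int(y_test[i])]
print(f"True label: {true} | Predicted: {pred} ({conf:.0%}) | "
      f"{'Correct' if pred == true else 'Wrong'}")
```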

How it works

  • Real cat and dog photos are downloaded from a public GitHub dataset.
  • Each image is resized to 64 × 64 RGB pixels and flattened into a 12,288-dimensional feature vector (64 × 64 × 3 channels).
  • Logistic regression is implemented as a single nn.Linear layer (12,288 -> 2) trained with SGD and cross-entropy loss in PyTorch.
  • torch.softmax converts the raw logits into a probability distribution over the two classes; with only two logits, this is equivalent to applying a sigmoid to their difference, so it matches the binary formulation above.
  • Click Sample random image to cycle through examples and inspect the model's confidence on each prediction.
  • Because this is a linear model on raw pixels, accuracy is limited. More powerful approaches (CNNs, transfer learning) would perform significantly better on this task.