Linear Algebra for Machine Learning

Linear algebra is the workhorse of machine learning. Nearly every machine learning algorithm — from linear regression to deep neural networks — relies on vectors, matrices, and the operations between them.

This notebook covers the essentials, supplementing lecture 3:

| Topic | What you'll learn |
| --- | --- |
| Vectors | Creation, addition, subtraction, scalar multiplication |
| Matrices | Arithmetic, transpose, multiplication (three views) |
| Linear maps | How matrices transform space, Gram matrices |
| Interactive explorer | Drag a 2x2 matrix and watch geometry change |

All computations use PyTorch tensors, the same objects you'll use to build neural networks.

Vectors

A vector is an ordered list of numbers. In machine learning we usually think of vectors as column vectors:

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n$$

Two views:

  • Algebraic: a vector is a tuple of coordinates.
  • Geometric: a vector is an arrow from the origin to a point in space.

In PyTorch, vectors are 1-D tensors.

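A minimal cell to create the two example vectors used throughout this notebook (a sketch reconstructed from the output shown below):

```python
import torch

# two example vectors in R^2, stored as 1-D tensors
v = torch.tensor([2., 1.])
w = torch.tensor([1., 3.])
v, w
```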
tensor([2., 1.])
tensor([1., 3.])

Vector addition & subtraction

$$\mathbf{v} + \mathbf{w} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}, \qquad \mathbf{v} - \mathbf{w} = \begin{bmatrix} v_1 - w_1 \\ v_2 - w_2 \end{bmatrix}$$

Geometrically, $\mathbf{v} + \mathbf{w}$ is the diagonal of the parallelogram formed by $\mathbf{v}$ and $\mathbf{w}$.

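A quick numerical check (a sketch, assuming the vectors `v` and `w` defined above):

```python
print(v + w)   # tensor([3., 4.])
print(v - w)   # tensor([1., -2.])
```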

Scalar multiplication

$$c \cdot \mathbf{v} = \begin{bmatrix} c \, v_1 \\ c \, v_2 \end{bmatrix}$$

  • $|c| > 1$: stretches the vector
  • $|c| < 1$: shrinks the vector
  • $c < 0$: reverses direction
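For example, with the vector `v = [2., 1.]` defined earlier (a sketch; the printed values are what PyTorch would show):

```python
print(2.0 * v)    # tensor([4., 2.])            |c| > 1: stretched
print(0.5 * v)    # tensor([1.0000, 0.5000])    |c| < 1: shrunk
print(-1.0 * v)   # tensor([-2., -1.])          c < 0: direction reversed
```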

Matrices

A matrix is a rectangular array of numbers — a 2-D tensor in PyTorch:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \in \mathbb{R}^{m \times n}$$
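A sketch of the cell that creates the two example matrices whose values appear in the output below:

```python
A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])
A, B
```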
tensor([[1., 2.],
        [3., 4.]])
tensor([[5., 6.],
        [7., 8.]])

Matrix arithmetic

| Operation | Formula | PyTorch |
| --- | --- | --- |
| Addition | $A + B$ | `A + B` |
| Scalar multiplication | $cA$ | `c * A` |
| Transpose | $A^\top$ | `A.T` |
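Applying each operation to the matrices defined above (a sketch; the scalar is 3, matching the output shown below):

```python
print(A + B)   # element-wise sum
print(3 * A)   # scalar multiplication
print(A.T)     # transpose
```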
tensor([[ 6.,  8.],
        [10., 12.]])
tensor([[ 3.,  6.],
        [ 9., 12.]])
tensor([[1., 3.],
        [2., 4.]])

Dot product & outer product

The dot product (inner product) of two vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$:

$$\mathbf{v}^\top \mathbf{w} = \sum_{i=1}^{n} v_i \, w_i \in \mathbb{R}$$

The outer product produces a matrix:

$$\mathbf{v} \, \mathbf{w}^\top = \begin{bmatrix} v_1 w_1 & v_1 w_2 \\ v_2 w_1 & v_2 w_2 \end{bmatrix} \in \mathbb{R}^{n \times n}$$
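With the vectors from the start of the notebook (a sketch; `.item()` converts the zero-dimensional tensor returned by `torch.dot` into a plain Python float, matching the `5.0` shown below):

```python
print(torch.dot(v, w).item())   # 2*1 + 1*3 = 5.0
print(torch.outer(v, w))        # 2x2 matrix with entries v_i * w_j
```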
5.0
tensor([[2., 6.],
        [1., 3.]])

Matrix multiplication

For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, the product $C = AB \in \mathbb{R}^{m \times p}$.

Three equivalent views:

  1. Entry-wise: $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$; each entry is a dot product of a row of $A$ with a column of $B$.

  2. Column view: Each column of $C$ is a linear combination of the columns of $A$, with coefficients from the corresponding column of $B$.

  3. Row view: Each row of $C$ is a linear combination of the rows of $B$, with coefficients from the corresponding row of $A$.

In PyTorch, the @ operator is used for matrix multiplication:

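A sketch that computes the product of the example matrices and then asserts the three views numerically; only the product itself is printed, matching the output below:

```python
C = A @ B
print(C)

# entry-wise view: C[i, j] is a dot product of row i of A with column j of B
assert torch.isclose(C[0, 1], torch.dot(A[0, :], B[:, 1]))

# column view: each column of C is A applied to the corresponding column of B
assert torch.allclose(C[:, 0], A @ B[:, 0])

# row view: each row of C is the corresponding row of A applied to B
assert torch.allclose(C[1, :], A[1, :] @ B)
```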
tensor([[19., 22.],
        [43., 50.]])

Matrix as a linear map

Multiplying a matrix by a vector applies a linear transformation:

$$\mathbf{y} = A\mathbf{x} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}$$

The output is a linear combination of the columns of $A$, weighted by the entries of $\mathbf{x}$. This means:

  • The columns of $A$ are where the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2$ get mapped.
  • Lines through the origin stay lines (a consequence of linearity).
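Both facts can be checked with the matrix `A` from earlier (a sketch; the vector `x` is an arbitrary example):

```python
e1 = torch.tensor([1., 0.])
e2 = torch.tensor([0., 1.])
print(A @ e1, A @ e2)   # the two columns of A: tensor([1., 3.]) tensor([2., 4.])

x = torch.tensor([2., -1.])
print(A @ x)                              # tensor([0., 2.])
print(x[0] * A[:, 0] + x[1] * A[:, 1])    # same result as a combination of columns
```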

Gram matrix

The Gram matrix $G = A^\top A$ captures the inner products between the columns of $A$. It is always symmetric ($G = G^\top$) and positive semi-definite.

The Gram matrix appears in PCA, kernel methods, and the normal equations for least squares.

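A sketch of the cell producing the output below: compute the Gram matrix of `A` and check its symmetry element-wise (the comparison is assumed to be `G == G.T`):

```python
G = A.T @ A
print(G)          # inner products between the columns of A
print(G == G.T)   # symmetric: every entry is True
```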
tensor([[10., 14.],
        [14., 20.]])
tensor([[True, True],
        [True, True]])

Interactive: explore 2D transformations

Drag the matrix entries below to see how different matrices transform the plane. Try these classic transformations:

| Transformation | Matrix |
| --- | --- |
| Identity | $\begin{bmatrix}1&0\\0&1\end{bmatrix}$ |
| Scaling | $\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}$ |
| Rotation by $\theta$ | $\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$ |
| Reflection (y-axis) | $\begin{bmatrix}-1&0\\0&1\end{bmatrix}$ |
| Shear | $\begin{bmatrix}1&k\\0&1\end{bmatrix}$ |

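Any of these transformations can be reproduced numerically. A sketch applying a shear (with an assumed `k = 0.5`) to the standard basis vectors and one corner of the unit square:

```python
S = torch.tensor([[1., 0.5],
                  [0., 1.]])           # shear with k = 0.5
points = torch.tensor([[1., 0.],       # e1
                       [0., 1.],       # e2
                       [1., 1.]]).T    # corner of the unit square
print(S @ points)                      # columns are the transformed points
```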
[Interactive 2x2 matrix explorer widget]

Geometric formulas

Rotation by angle $\theta$ (counter-clockwise):

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Reflection across the $x$-axis: $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$, across the $y$-axis: $\begin{bmatrix}-1&0\\0&1\end{bmatrix}$.

Dilation (uniform scaling): $\begin{bmatrix}c&0\\0&c\end{bmatrix}$ stretches ($c>1$) or shrinks ($0<c<1$) all directions equally.
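A sketch constructing a rotation matrix and applying it (assuming a 90-degree angle for illustration):

```python
import math

theta = math.pi / 2                  # 90 degrees, counter-clockwise
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
print(R @ torch.tensor([1., 0.]))    # e1 maps to (approximately) [0., 1.]
```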


Summary

| Concept | Math | PyTorch |
| --- | --- | --- |
| Vector addition | $\mathbf{v} + \mathbf{w}$ | `v + w` |
| Scalar multiplication | $c\mathbf{v}$ | `c * v` |
| Dot product | $\mathbf{v}^\top\mathbf{w}$ | `torch.dot(v, w)` |
| Outer product | $\mathbf{v}\mathbf{w}^\top$ | `torch.outer(v, w)` |
| Matrix multiply | $AB$ | `A @ B` |
| Transpose | $A^\top$ | `A.T` |
| Gram matrix | $A^\top A$ | `A.T @ A` |

Key takeaway: Every matrix encodes a linear map. Understanding how matrices act on vectors — stretching, rotating, reflecting — gives you geometric intuition for the transformations at the heart of machine learning.