Vectors, Dot Products, and the Mathematics Behind Machine Learning | Chapter 2 of Why Machines Learn

Chapter 2, “We Are All Just Numbers Here…,” from Why Machines Learn: The Elegant Math Behind Modern AI dives into the mathematical foundations that make learning algorithms possible. Moving from 19th-century discoveries in vector algebra to the modern perceptron, the chapter explains why linear algebra is the language of machine learning. This post expands on the video’s core ideas and provides an accessible walkthrough of the geometry, notation, and logic that help machines interpret the world as numbers.

For a deeper guided explanation, be sure to watch the chapter summary above. Supporting the Last Minute Lecture channel helps us keep producing accessible academic breakdowns for complex textbooks.

From Quaternions to Vectors: The Birth of Modern AI Mathematics

Anil Ananthaswamy begins by tracing the story of William Rowan Hamilton, whose work on quaternions introduced concepts—scalars, vectors, unit vectors—that later became essential to physics and eventually to machine learning. While quaternions themselves are not central to perceptrons, their decomposition into directional components helped formalize vector spaces, which are now the backbone of data representation in AI.

This early history shows how abstract mathematical discoveries often predate and enable breakthroughs in entirely different domains—such as machine learning more than a century later.

Scalars, Vectors, and Coordinate Systems

The chapter introduces the components that structure machine learning data:

  • Scalars: single numerical values that scale vectors
  • Vectors: ordered lists of numbers representing position, attributes, or features
  • Unit vectors (i and j): vectors of length one that mark the coordinate directions in two-dimensional space

These components allow data to be mapped into geometric space, where learning algorithms can measure distances, angles, and relative positions. Once framed geometrically, classification becomes a problem of drawing boundaries.
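
To make this concrete, here is a minimal sketch in Python with NumPy (the feature values and names are invented for illustration, not taken from the book):

```python
import numpy as np

# A hypothetical data point with two numeric features, written as a
# vector in two-dimensional coordinate space.
x = np.array([65.0, 170.0])

# The unit vectors i and j mark the two coordinate directions.
i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

# Any 2D vector is a scalar-weighted combination of the unit vectors,
# which is exactly how scalars "scale" vectors.
reconstructed = 65.0 * i_hat + 170.0 * j_hat
print(np.allclose(x, reconstructed))  # True
```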

Vector Addition, Subtraction, and Scalar Multiplication

Ananthaswamy uses intuitive examples—like Newtonian forces—to explain how vectors combine and how their lengths and directions change. These same operations help AI systems:

  • Represent multiple features of a dataset in coordinate form
  • Compute similarity between examples
  • Model how weight updates move a decision boundary

Machine learning relies heavily on these computations because every training example modifies the model in vector space.
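
As a brief illustration (the vectors here are arbitrary placeholders rather than values from the chapter), the three operations and a simple distance computation look like this in NumPy:

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([0.5, 3.0])

# Addition and subtraction combine or compare vectors component-wise.
print(a + b)             # [2.5 4. ]
print(a - b)             # [ 1.5 -2. ]

# Scalar multiplication stretches a vector without changing its direction.
print(3.0 * a)           # [6. 3.]

# The length (norm) of the difference measures how far apart two
# examples sit in feature space, a simple notion of (dis)similarity.
print(np.linalg.norm(a - b))
```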

The Dot Product and Its Geometric Meaning

One of the most important tools introduced in this chapter is the dot product—a calculation that reveals how aligned two vectors are. Its geometric interpretation allows perceptrons to determine which side of a hyperplane a data point lies on.

For a perceptron, the dot product between the weight vector and an input vector acts as the engine of prediction. If the result exceeds a threshold, the perceptron classifies the point one way; if not, it classifies it the other way.
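
A minimal sketch of that prediction rule, assuming the common convention of folding the threshold into a bias term b (the weights and inputs below are placeholders, not learned values):

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Classify x by which side of the hyperplane w·x + b = 0 it falls on."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical weight vector and bias; in practice these are learned.
w = np.array([1.0, -2.0])
b = 0.5

print(perceptron_predict(w, b, np.array([3.0, 1.0])))  # 1
print(perceptron_predict(w, b, np.array([0.0, 2.0])))  # -1
```

The sign of the dot product (plus bias) is all the perceptron needs: alignment with the weight vector beyond the threshold yields one class, and everything else yields the other.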

Hyperplanes, Linear Separability, and Geometric Classification

By combining vectors with the dot product, Ananthaswamy introduces a central idea: hyperplanes, or generalized lines in higher dimensions. These hyperplanes divide space into regions that represent different classes.

The perceptron succeeds only when the dataset is linearly separable—meaning such a hyperplane exists. This is why the geometric framework is indispensable: learning is literally the act of adjusting a boundary in vector space.
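
In standard notation (which may differ slightly from the chapter's), a hyperplane and the classification rule it induces can be written as:

```latex
\[
  \mathbf{w} \cdot \mathbf{x} + b = 0,
  \qquad
  \hat{y} = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)
\]
```

Points with w · x + b > 0 fall on one side of the boundary and points with a negative value fall on the other; linear separability means some choice of w and b classifies every training example correctly.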

Matrix Notation and the Perceptron Learning Algorithm

Matrix notation allows many vectors to be represented and manipulated at once, simplifying the mathematical description of learning. With matrices, the perceptron learning rule can be compactly expressed and efficiently computed.

The perceptron updates its weights based on errors, pushing the decision boundary closer to the correct separation. Each training step is a small geometric movement guided by vector math.
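
Here is a compact sketch of that update loop, assuming labels in {-1, +1} and a learning rate of 1 (the variable names and toy dataset are mine, not the book's):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning rule: nudge w and b on every misclassified point.

    X is an (n_samples, n_features) matrix; y holds labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:  # point is on the wrong side
                w += lr * y_i * x_i              # tilt the boundary toward correctness
                b += lr * y_i                    # shift the boundary
                errors += 1
        if errors == 0:                          # a full error-free pass: converged
            break
    return w, b

# A tiny linearly separable dataset (hypothetical values).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
print(train_perceptron(X, y))
```

Each mistake pulls the weight vector toward (or away from) the offending point, which is exactly the small geometric movement of the decision boundary described above.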

The Perceptron Convergence Proof

One of the most mathematically elegant parts of the chapter is the convergence proof, which ensures that the perceptron will always find a separating hyperplane when one exists. This proof depends on:

  • Dot product geometry
  • Margins between classes
  • Rewriting updates in matrix form

The guarantee of convergence was a landmark achievement in early AI theory and remains a foundational theorem in machine learning.
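
The classical form of this guarantee (Novikoff's bound, stated here in standard notation that may differ from the chapter's) assumes every input has length at most R and that some unit-length weight vector separates the classes with margin γ; the number of weight updates k is then bounded by:

```latex
\[
  k \;\le\; \left(\frac{R}{\gamma}\right)^{2}
\]
```

A wider margin between the classes therefore means fewer mistakes before convergence, which is why margins play such a central role in the proof.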

Where the Perceptron Fails: The XOR Problem

The chapter ends by addressing the perceptron’s limitations. Because XOR cannot be separated by a single line, the perceptron fails completely. This discovery, emphasized in Minsky and Papert’s critique, contributed to the first “AI winter,” slowing research for years.
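
A quick numerical illustration of this failure (a random search rather than a proof, and entirely my own construction): no line of the form w · x + b = 0 puts the two XOR-positive points on one side and the two XOR-negative points on the other.

```python
import numpy as np

# XOR labeling of the four corners of the unit square.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Try many random candidate lines w·x + b = 0; none classifies all four
# XOR points correctly, hinting that no such line exists.
rng = np.random.default_rng(0)
separates = False
for _ in range(50_000):
    w = rng.normal(size=2)
    b = rng.normal()
    if np.all(y * (X @ w + b) > 0):
        separates = True
        break
print(separates)  # False
```

The contradiction can also be seen directly: the two positive points force w1 + w2 + 2b > 0, while the two negative points force w1 + w2 + 2b <= 0.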

Yet this limitation also sparked innovation. The inability to solve XOR hinted that multiple layers were needed—a concept that would eventually lead to backpropagation and deep learning.

Conclusion: Why All of This Math Matters

Chapter 2 reveals that modern AI depends on centuries of mathematical development. From vectors and dot products to hyperplanes and convergence proofs, these concepts explain how learning algorithms make decisions and why they behave the way they do.

To see these ideas in action, be sure to watch the full video summary above and explore the complete chapter playlist. Supporting Last Minute Lecture ensures we can keep producing clear, accessible study resources for learners everywhere.

If you found this breakdown helpful, be sure to subscribe to Last Minute Lecture for more chapter-by-chapter textbook summaries and academic study guides.

Click here to view the full YouTube playlist for Why Machines Learn
