Machine Learning Algorithms — K-Means, Weighted Experts & Gradient Descent | Chapter 33 in Introduction to Algorithms
Chapter 33 of Introduction to Algorithms introduces three foundational machine learning techniques: k-means clustering for unsupervised learning, multiplicative-weights algorithms for online decision-making, and gradient descent for optimization in large-scale data settings. These algorithms are essential tools across supervised, unsupervised, and online learning paradigms, and they serve as building blocks in modern data science and artificial intelligence.
Watch the video for a full breakdown of how these algorithms work and where they apply. Subscribe to Last Minute Lecture for more data-driven summaries from classic textbooks.

Types of Machine Learning
The chapter begins by categorizing learning approaches:
- Supervised Learning: Learn from labeled data to predict outcomes (e.g., spam classification).
- Unsupervised Learning: Discover hidden structure in unlabeled data (e.g., customer segmentation).
- Reinforcement Learning: Learn via feedback from interactions (briefly mentioned).
K-Means Clustering (Unsupervised Learning)
K-means partitions a dataset into k clusters by minimizing intra-cluster dissimilarity. It works with high-dimensional feature vectors and uses squared Euclidean distance to measure how far each point lies from a cluster center. The most common implementation is Lloyd's Algorithm:
- Randomly initialize k cluster centers
- Assign each point to the nearest center
- Recompute centers as the centroid of each cluster
- Repeat until convergence
This heuristic may converge to a local optimum rather than the best possible clustering, so multiple runs with different random seeds are advised. Runtime is O(T × k × n × d), where T is the number of iterations, k the number of clusters, n the number of points, and d the dimension of each feature vector. K-means is widely used in customer segmentation, image compression, and document clustering.
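To make Lloyd's loop concrete, here is a minimal plain-Python sketch. The function name and structure are ours, not the textbook's pseudocode, and a library implementation (such as scikit-learn's KMeans) handles initialization and numerical details far more carefully:

import random

def lloyds_kmeans(points, k, num_iters=100, seed=0):
    # points: a list of equal-length tuples of floats.
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: random initialization
    for _ in range(num_iters):
        # Step 2: assign each point to its nearest center (squared Euclidean)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centers[j])))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the centroid of its cluster
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_centers.append(tuple(sum(p[i] for p in cluster) / len(cluster)
                                         for i in range(dim)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep its old center
        # Step 4: stop once the centers no longer move
        if new_centers == centers:
            break
        centers = new_centers
    return centers

Running this several times with different seed values and keeping the lowest-cost result mirrors the multiple-runs advice above.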
Multiplicative-Weights Algorithms (Online Learning)
In the expert advice model, an algorithm receives predictions from a set of experts and must choose whom to follow. The Weighted-Majority Algorithm assigns weights to each expert and penalizes incorrect ones by reducing their weight after each round:
- All experts begin with equal weight
- Predict using weighted majority vote
- Multiply the weights of incorrect experts by a factor of (1 – ε)
This algorithm minimizes regret, the difference between the algorithm's mistakes and those of the best expert. A randomized version selects a single expert with probability proportional to its weight and guarantees expected mistakes within roughly (1 + ε) times those of the best expert, plus an additive term on the order of (ln n)/ε for n experts. These methods are powerful in settings with repeated decision-making, such as portfolio selection and recommendation systems.
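The deterministic loop fits in a few lines. The sketch below assumes binary (0/1) predictions and outcomes; the names and bookkeeping are ours, not the textbook's:

def weighted_majority(expert_predictions, outcomes, epsilon=0.5):
    # expert_predictions[t][i] is expert i's 0/1 prediction in round t;
    # outcomes[t] is the true 0/1 outcome for round t.
    n = len(expert_predictions[0])
    weights = [1.0] * n  # all experts begin with equal weight
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Predict with the weighted majority vote
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != outcome:
            mistakes += 1
        # Scale down the weight of every expert that was wrong this round
        weights = [w * (1 - epsilon) if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights

The randomized variant replaces the majority vote with sampling one expert to follow, with probability proportional to its weight.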
Gradient Descent (Optimization Technique)
Gradient descent is a cornerstone of modern machine learning, used to minimize differentiable functions. It updates model parameters based on the gradient of the loss function:
x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ – η × ∇f(x⁽ᵗ⁾)
where η is the step size, or learning rate. For convex functions with an appropriately chosen step size, gradient descent converges to a global minimum. Chapter 33 presents:
- Theorem 33.8: Choosing η = R / (L√T) ensures the error after T steps is ≤ (R × L) / √T, where R bounds the distance from the starting point to a minimizer and L is a Lipschitz bound on the function (a code sketch of this step-size rule follows the list)
- Projected gradient descent: Handles constrained problems by projecting updates back into the feasible set
- Stochastic Gradient Descent (SGD): Uses mini-batches or single examples for efficient training on large datasets
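Here is a minimal sketch of the fixed-step-size rule from Theorem 33.8, returning the average of the iterates, which is the point the error bound applies to. The toy objective and the R and L values in the usage line are our own illustrative choices, not from the chapter:

import math

def gradient_descent(grad, x0, R, L, T):
    # Fixed step size eta = R / (L * sqrt(T)) as in Theorem 33.8.
    eta = R / (L * math.sqrt(T))
    x = x0
    running_sum = 0.0
    for _ in range(T):
        x = x - eta * grad(x)  # move against the gradient
        running_sum += x
    return running_sum / T  # average iterate

# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3);
# R = 3 and L = 6 are illustrative bounds for this function near the optimum.
x_hat = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, R=3.0, L=6.0, T=10000)
# x_hat lands close to the minimizer 3.0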
Gradient descent is the backbone of linear regression, neural network training, and convex optimization tasks across machine learning and data science.
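For the SGD variant mentioned above, the sketch below fits a one-variable least-squares model, updating on a single example per step; the function name, learning rate, and epoch count are illustrative assumptions rather than the chapter's pseudocode:

import random

def sgd_linear_regression(xs, ys, epochs=50, eta=0.01, seed=0):
    # Fit y ≈ w * x + b by least squares, one example per update.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            err = w * xs[i] + b - ys[i]  # residual on this single example
            w -= eta * 2 * err * xs[i]   # gradient of (w*x_i + b - y_i)^2 w.r.t. w
            b -= eta * 2 * err           # ...and w.r.t. b
    return w, b

Replacing the single index i with a small random batch and averaging its gradients gives the mini-batch form.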
Use Cases and Summary
Each of the three algorithms solves a key challenge in machine learning:
- K-means: Unsupervised pattern discovery and clustering
- Multiplicative weights: Online adaptation in adversarial environments
- Gradient descent: Optimizing objective functions in both supervised and unsupervised models
Mastering these techniques opens the door to solving real-world problems involving data classification, pattern discovery, recommendation systems, and model optimization.
Conclusion
Chapter 33 bridges theory and practical application by showing how core algorithmic tools drive machine learning workflows. Whether you're clustering documents with k-means, aggregating predictions with multiplicative weights, or training models with gradient descent, these strategies are essential for developing intelligent, data-driven systems.
🎓 For a deeper look into how these machine learning algorithms operate, watch the full podcast-style summary above and browse more chapters from Last Minute Lecture.
If you found this breakdown helpful, be sure to subscribe to Last Minute Lecture for more chapter-by-chapter textbook summaries and academic study guides.