Machine Learning Algorithms — K-Means, Weighted Experts & Gradient Descent | Chapter 33 in Introduction to Algorithms
Chapter 33 of Introduction to Algorithms introduces three foundational machine learning techniques: k-means clustering for unsupervised learning, multiplicative-weights algorithms for online decision-making, and gradient descent for optimization in large-scale data settings. These algorithms are essential tools across supervised, unsupervised, and online learning paradigms, and they serve as building blocks in modern data science and artificial intelligence.
Watch the video for a full breakdown of how these algorithms work and where they apply. Subscribe to Last Minute Lecture for more data-driven summaries from classic textbooks.

Types of Machine Learning
The chapter begins by categorizing learning approaches:
- Supervised Learning: Learn from labeled data to predict outcomes (e.g., spam classification).
- Unsupervised Learning: Discover hidden structure in unlabeled data (e.g., customer segmentation).
- Reinforcement Learning: Learn via feedback from interactions (briefly mentioned).
K-Means Clustering (Unsupervised Learning)
K-means partitions a dataset into k clusters by minimizing intra-cluster dissimilarity. It works with high-dimensional feature vectors and uses squared Euclidean distance to measure how far each point lies from a cluster center. The most common implementation is Lloyd's Algorithm:
- Randomly initialize k cluster centers
- Assign each point to the nearest center
- Recompute centers as the centroid of each cluster
- Repeat until convergence
This heuristic may converge to a local optimum rather than the best possible clustering, so multiple runs with different random seeds are advised. Runtime is O(T × k × n × d), where T is the number of iterations, k the number of clusters, n the number of points, and d the dimension of each feature vector. K-means is widely used in customer segmentation, image compression, and document clustering.
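To make Lloyd's loop concrete, here is a minimal plain-Python sketch. The function name and structure are ours, not the textbook's pseudocode, and a library implementation (such as scikit-learn's KMeans) handles initialization and numerical details far more carefully:

import random

def lloyds_kmeans(points, k, num_iters=100, seed=0):
    # points: a list of equal-length tuples of floats.
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: random initialization
    for _ in range(num_iters):
        # Step 2: assign each point to its nearest center (squared Euclidean)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centers[j])))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the centroid of its cluster
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_centers.append(tuple(sum(p[i] for p in cluster) / len(cluster)
                                         for i in range(dim)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep its old center
        # Step 4: stop once the centers no longer move
        if new_centers == centers:
            break
        centers = new_centers
    return centers

Running this several times with different seed values and keeping the lowest-cost result mirrors the multiple-runs advice above.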
Multiplicative-Weights Algorithms (Online Learning)
In the expert advice model, an algorithm receives predictions from a set of experts and must choose whom to follow. The Weighted-Majority Algorithm assigns weights to each expert and penalizes incorrect ones by reducing their weight after each round:
- All experts begin with equal weight
- Predict using weighted majority vote
- Multiply the weights of incorrect experts by a factor of (1 – ε)
This algorithm minimizes regret, the difference between the algorithm's mistakes and those of the best expert. A randomized version selects a single expert with probability proportional to its weight and guarantees expected mistakes within roughly (1 + ε) times those of the best expert, plus an additive term on the order of (ln n)/ε for n experts. These methods are powerful in settings with repeated decision-making, such as portfolio selection and recommendation systems.
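The deterministic loop fits in a few lines. The sketch below assumes binary (0/1) predictions and outcomes; the names and bookkeeping are ours, not the textbook's:

def weighted_majority(expert_predictions, outcomes, epsilon=0.5):
    # expert_predictions[t][i] is expert i's 0/1 prediction in round t;
    # outcomes[t] is the true 0/1 outcome for round t.
    n = len(expert_predictions[0])
    weights = [1.0] * n  # all experts begin with equal weight
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Predict with the weighted majority vote
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != outcome:
            mistakes += 1
        # Scale down the weight of every expert that was wrong this round
        weights = [w * (1 - epsilon) if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights

The randomized variant replaces the majority vote with sampling one expert to follow, with probability proportional to its weight.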
Gradient Descent (Optimization Technique)
Gradient descent is a cornerstone of modern machine learning, used to minimize differentiable functions. It updates model parameters based on the gradient of the loss function:
x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ – η × ∇f(x⁽ᵗ⁾)
where η is the step size, or learning rate. For convex functions with an appropriately chosen step size, gradient descent converges to a global minimum. Chapter 33 presents:
- Theorem 33.8: Choosing η = R / (L√T) ensures the error after T steps is ≤ (R × L) / √T, where R bounds the distance from the starting point to a minimizer and L is a Lipschitz bound on the function (a code sketch of this step-size rule follows the list)
- Projected gradient descent: Handles constrained problems by projecting updates back into the feasible set
- Stochastic Gradient Descent (SGD): Uses mini-batches or single examples for efficient training on large datasets
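Here is a minimal sketch of the fixed-step-size rule from Theorem 33.8, returning the average of the iterates, which is the point the error bound applies to. The toy objective and the R and L values in the usage line are our own illustrative choices, not from the chapter:

import math

def gradient_descent(grad, x0, R, L, T):
    # Fixed step size eta = R / (L * sqrt(T)) as in Theorem 33.8.
    eta = R / (L * math.sqrt(T))
    x = x0
    running_sum = 0.0
    for _ in range(T):
        x = x - eta * grad(x)  # move against the gradient
        running_sum += x
    return running_sum / T  # average iterate

# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3);
# R = 3 and L = 6 are illustrative bounds for this function near the optimum.
x_hat = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, R=3.0, L=6.0, T=10000)
# x_hat lands close to the minimizer 3.0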
Gradient descent is the backbone of linear regression, neural network training, and convex optimization tasks across machine learning and data science.
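For the SGD variant mentioned above, the sketch below fits a one-variable least-squares model, updating on a single example per step; the function name, learning rate, and epoch count are illustrative assumptions rather than the chapter's pseudocode:

import random

def sgd_linear_regression(xs, ys, epochs=50, eta=0.01, seed=0):
    # Fit y ≈ w * x + b by least squares, one example per update.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            err = w * xs[i] + b - ys[i]  # residual on this single example
            w -= eta * 2 * err * xs[i]   # gradient of (w*x_i + b - y_i)^2 w.r.t. w
            b -= eta * 2 * err           # ...and w.r.t. b
    return w, b

Replacing the single index i with a small random batch and averaging its gradients gives the mini-batch form.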
Use Cases and Summary
Each of the three algorithms solves a key challenge in machine learning:
- K-means: Unsupervised pattern discovery and clustering
- Multiplicative weights: Online adaptation in adversarial environments
- Gradient descent: Optimizing objective functions in both supervised and unsupervised models
Mastering these techniques opens the door to solving real-world problems involving data classification, pattern discovery, recommendation systems, and model optimization.
Conclusion
Chapter 33 bridges theory and practical application by showing how core algorithmic tools drive machine learning workflows. Whether you're clustering documents with k-means, aggregating predictions with multiplicative weights, or training models with gradient descent, these strategies are essential for developing intelligent, data-driven systems.
🎓 For a deeper look into how these machine learning algorithms operate, watch the full podcast-style summary above and browse more chapters from Last Minute Lecture.
If you found this breakdown helpful, be sure to subscribe to Last Minute Lecture for more chapter-by-chapter textbook summaries and academic study guides.