Gradient Descent

How does a model learn the best weights (w) and bias (b)? Gradient descent is the optimization algorithm that minimizes the error by iteratively moving in the direction of steepest descent – downhill on the error surface.

Gradient descent updates each parameter in the opposite direction of the gradient of the cost function J: w := w − α·∂J/∂w and b := b − α·∂J/∂b, where α is the learning rate.
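A minimal NumPy sketch of this update rule, fitting a simple linear model to toy data (the data, learning rate, and iteration count are illustrative choices, not prescriptions):

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (hypothetical example)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=50)
y = 2 * X + 1 + rng.normal(0, 0.05, size=50)

w, b = 0.0, 0.0   # start from arbitrary parameters
lr = 0.5          # learning rate (alpha)

for _ in range(1000):
    error = w * X + b - y
    # Gradients of the mean squared error cost with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Step in the opposite direction of the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # ends up close to the true values 2 and 1
```

Each iteration computes the slope of the cost at the current (w, b) and takes a small step against it – exactly the "feel the slope, step downhill" loop described below.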

The Intuition: Walking Downhill in Fog

Imagine you are on a mountain in thick fog and want to reach the bottom. You can’t see the whole path, but you can feel the slope under your feet. You take a small step downhill, feel again, repeat. That is gradient descent – each step reduces the error.

The Learning Rate

The learning rate controls how big each step is.
  • Too large: may overshoot the minimum, so the error oscillates or diverges.
  • Too small: converges, but painfully slowly.
  • Just right: steady decrease in error.
Typical values: 0.1, 0.01, 0.001.
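The three regimes are easy to see on a one-dimensional toy cost, f(w) = w², whose gradient is 2w (the specific learning rates here are illustrative):

```python
def descend(lr, steps=50, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w**2 is 2*w
    return w

print(descend(0.1))    # just right: shrinks toward the minimum at 0
print(descend(1e-4))   # too small: barely moved after 50 steps
print(descend(1.1))    # too large: each step overshoots, w blows up
```

With lr = 1.1 every update flips the sign of w and grows its magnitude – the classic overshoot-and-diverge failure mode.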

Types of Gradient Descent

  • Batch GD: Uses all data to compute gradient – accurate but slow.
  • Stochastic GD (SGD): Uses one random sample – fast but noisy.
  • Mini‑batch GD: Uses a small batch – best compromise (most common).
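All three variants share one loop; only the slice of data used per update changes. A mini-batch sketch on noise-free toy data (batch size, learning rate, and epoch count are illustrative – setting batch_size to len(X) recovers batch GD, and 1 recovers SGD):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
y = 2 * X + 1                     # noise-free toy targets

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32

for epoch in range(200):
    order = rng.permutation(len(X))            # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # one mini-batch
        error = w * X[idx] + b - y[idx]
        # Gradient estimated from this batch only
        w -= lr * 2 * np.mean(error * X[idx])
        b -= lr * 2 * np.mean(error)

print(w, b)  # near the true values 2 and 1
```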

You Don’t Need to Code It

Scikit‑learn runs gradient descent for you in estimators such as SGDRegressor and SGDClassifier. Understanding the concept still helps you tune the learning rate and diagnose convergence issues.
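For example, SGDRegressor fits a linear model by stochastic gradient descent; you only choose hyperparameters like the learning rate (the data and parameter values here are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = 2 * X.ravel() + 1             # noise-free toy targets

# eta0 is the (initial) learning rate; scikit-learn does the descent loop
model = SGDRegressor(learning_rate="constant", eta0=0.05,
                     max_iter=1000, tol=1e-6, random_state=0)
model.fit(X, y)

print(model.coef_, model.intercept_)  # close to the true 2 and 1
```

If training diverges or stalls, eta0 is usually the first knob to turn – exactly the too-large / too-small trade-off described above.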


Two Minute Drill
  • Gradient descent minimizes error by stepping downhill.
  • Learning rate controls step size.
  • Types: batch, stochastic, mini‑batch.
  • Core optimization algorithm for most ML models.

Need more clarification?

Drop us an email at career@quipoinfotech.com