Gradient Descent
Gradient descent is an optimization algorithm used to update the weights and biases of a neural network so as to minimize the loss. It moves each parameter in the direction opposite to the gradient of the loss, since the gradient points in the direction of steepest increase.
Weight update: w ← w – learning_rate * ∇L(w), where ∇L(w) is the gradient of the loss with respect to the weights.
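The update rule above can be sketched in a few lines. The loss L(w) = (w − 3)² and its gradient are assumptions chosen purely for illustration:

```python
def grad(w):
    # dL/dw for the illustrative loss L(w) = (w - 3)^2
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w = w - learning_rate * grad(w)  # w ← w − learning_rate * ∇L(w)

print(round(w, 4))  # approaches the minimum at w = 3
```

Each step subtracts a fraction of the gradient, so w slides downhill toward the minimizer.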
Three Variants
- Batch Gradient Descent: Uses entire dataset to compute gradient. Accurate but slow and memory‑intensive.
- Stochastic Gradient Descent (SGD): Uses one random sample per update. Fast but noisy.
- Mini‑Batch Gradient Descent: Uses a small batch (e.g., 32 or 64). Best of both worlds – most common in deep learning.
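All three variants differ only in how many samples feed each update. A minimal NumPy sketch on an assumed least-squares problem (the data, learning rate, and epoch count are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))      # 1000 samples, 5 features (assumed data)
true_w = rng.normal(size=5)
y = X @ true_w                      # noise-free targets for a clean comparison

def gradient(w, Xb, yb):
    # Gradient of mean squared error over the batch (Xb, yb)
    return Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, lr=0.1, epochs=50):
    w = np.zeros(5)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                # shuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * gradient(w, X[b], y[b])   # one update per batch
    return w

w_batch = train(batch_size=1000)  # batch GD: one update per epoch
w_sgd   = train(batch_size=1)     # SGD: one sample per update
w_mini  = train(batch_size=32)    # mini-batch: the common default
```

The only knob that changes is `batch_size`; the update rule itself is identical in all three cases.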
Learning Rate
The learning rate controls step size. Too high: overshoot, diverge. Too low: slow convergence. Typical starting values: 0.001, 0.01, 0.1.
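The three regimes are easy to see on a toy quadratic. Here L(w) = w² (gradient 2w), and the specific learning-rate values are assumptions for illustration:

```python
def run(lr, steps=20, w0=1.0):
    # Gradient descent on L(w) = w^2, whose gradient is 2w
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(run(0.01))  # too low: after 20 steps w is still far from the minimum at 0
print(run(0.1))   # reasonable: w shrinks toward 0
print(run(1.5))   # too high: each step multiplies w by (1 - 2*lr) = -2, so it diverges
```

With lr = 1.5 every update overshoots the minimum and lands farther away on the other side, which is exactly the divergence behavior described above.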
Challenges with Standard GD
- Getting stuck in local minima or saddle points.
- Sensitive to learning rate choice.
- Same learning rate for all parameters.
Epochs and Iterations
One epoch = one pass through the entire training dataset. In mini‑batch GD, one iteration = one batch update, so iterations per epoch = ceil(num_samples / batch_size). For example, 1000 samples with batch size 32 → 31 full batches plus one partial batch of 8 samples = 32 iterations per epoch.
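The iterations-per-epoch arithmetic above is just a ceiling division:

```python
import math

# Iterations per epoch = ceil(num_samples / batch_size)
num_samples, batch_size = 1000, 32
iterations = math.ceil(num_samples / batch_size)
print(iterations)  # 32: 31 full batches plus one partial batch of 8 samples
```

Frameworks typically either process that final partial batch as-is or drop it, depending on configuration.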
Two Minute Drill
- Gradient descent minimizes loss by moving opposite to gradient.
- Batch GD uses all data; SGD uses one sample; mini‑batch uses a batch.
- Learning rate controls step size.
- Mini‑batch GD is the standard in deep learning.
Need more clarification?
Drop us an email at career@quipoinfotech.com
