
Gradient Descent

Gradient descent is the optimization algorithm used to update a neural network's weights and biases so as to minimize the loss. At each step it moves the parameters in the direction opposite to the gradient of the loss, i.e., the direction of steepest descent.

Weight update: w ← w – learning_rate * ∇L(w)
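The update rule above can be sketched on a toy one-parameter problem. The loss L(w) = (w − 3)² and its gradient 2(w − 3) are illustrative, not from any framework:

```python
# Minimal sketch of the update rule: w <- w - learning_rate * grad(w),
# applied to the toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).

def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2 * (w - 3)

w = 0.0                # initial parameter
learning_rate = 0.1

for step in range(100):
    w = w - learning_rate * grad(w)   # step against the gradient

print(w)   # moves toward the minimum at w = 3
```

Each step shrinks the distance to the minimum by a constant factor (1 − 2 · learning_rate), so the iterates converge geometrically here.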

Three Variants

  • Batch Gradient Descent: Uses entire dataset to compute gradient. Accurate but slow and memory‑intensive.
  • Stochastic Gradient Descent (SGD): Uses one random sample per update. Fast but noisy.
  • Mini‑Batch Gradient Descent: Uses a small batch (e.g., 32 or 64). Best of both worlds – most common in deep learning.
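The three variants differ only in how many samples feed each gradient estimate. A hedged sketch on a toy one-parameter linear model y ≈ w·x with squared-error loss (the data, learning rate, and helper names are illustrative, not part of the tutorial):

```python
import random

random.seed(0)
data = [(i / 10, 2.0 * i / 10) for i in range(1, 101)]  # true slope is 2

def mean_gradient(w, batch):
    """Mean gradient of (w*x - y)**2 over the batch: 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, lr=0.005, epochs=20):
    w = 0.0
    for _ in range(epochs):
        random.shuffle(data)                       # reshuffle each epoch
        for i in range(0, len(data), batch_size):  # one update per batch
            w -= lr * mean_gradient(w, data[i:i + batch_size])
    return w

w_batch = train(batch_size=len(data))  # batch GD: one update per epoch
w_sgd = train(batch_size=1)            # SGD: one sample per update
w_mini = train(batch_size=32)          # mini-batch: the usual compromise
```

All three recover a slope near 2 on this easy problem; what differs in practice is updates per epoch, gradient noise, and memory per step.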

Learning Rate

The learning rate controls step size. Too high: overshoot, diverge. Too low: slow convergence. Typical starting values: 0.001, 0.01, 0.1.
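The overshoot/slow-convergence trade-off can be seen on a toy quadratic L(w) = w² (gradient 2w); the three rates below are illustrative:

```python
def run(lr, steps=50):
    """Run 50 gradient descent steps on L(w) = w**2 from w = 1.0."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w    # gradient of w**2 is 2*w
    return w

print(run(0.01))   # too low: after 50 steps, still well away from 0
print(run(0.1))    # reasonable: very close to the minimum at 0
print(run(1.1))    # too high: each step overshoots; |w| grows -- divergence
```

Each step multiplies w by (1 − 2·lr), so the iterates shrink when |1 − 2·lr| < 1 and blow up when it exceeds 1.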

Challenges with Standard GD

  • Getting stuck in local minima or saddle points.
  • Sensitive to learning rate choice.
  • Same learning rate for all parameters.

Advanced optimizers (Adam, RMSProp, etc.) address these issues; we will cover them in the next module.

Epochs and Iterations

One epoch = one pass through the entire training dataset. In mini‑batch GD, one iteration = one batch update. For example, 1000 samples with batch size 32 → 32 iterations per epoch (31 full batches plus one final partial batch of 8 samples, if the remainder is kept).
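The bookkeeping from the example above, as a quick check (the variable names are illustrative):

```python
import math

n_samples, batch_size = 1000, 32

full_batches = n_samples // batch_size          # 31 complete batches
iterations = math.ceil(n_samples / batch_size)  # 32 if the last partial batch is kept

print(full_batches, iterations)
```

Frameworks typically expose this as a choice, e.g. whether to drop or keep the final partial batch when the dataset size is not a multiple of the batch size.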


Two Minute Drill
  • Gradient descent minimizes loss by moving opposite to gradient.
  • Batch GD uses all data; SGD uses one sample; mini‑batch uses a batch.
  • Learning rate controls step size.
  • Mini‑batch GD is the standard in deep learning.

Need more clarification?

Drop us an email at career@quipoinfotech.com