Learning Rate and Convergence

The learning rate is a small number that controls how big a step we take in gradient descent. It is one of the most important settings in training AI models. Too large, and you might overshoot the minimum. Too small, and training takes forever.

Formally, each gradient descent step moves the parameters against the gradient, scaled by the learning rate: θ ← θ − α×∇J(θ), where α is the learning rate and J is the cost. Convergence is when the cost stops decreasing significantly and the parameters stabilize.
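In code, the update is a single line. A minimal sketch in plain Python (here `grad` is a placeholder for whatever function computes the gradient at the current parameters):

    # One gradient descent step: move against the gradient,
    # scaled by the learning rate lr.
    def gd_step(theta, grad, lr):
        return theta - lr * grad(theta)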

Learning Rate Effects

  • Too high: The model jumps around and may diverge (cost goes up) or oscillate without ever reaching the minimum.
  • Too low: Progress is very slow; training may stall on a plateau or in a shallow local minimum, or take ages to converge.
  • Just right: Steady decrease in cost until it flattens out (convergence).

Analogy: Descending a Mountain

Imagine you are hiking down a mountain in fog. If you take huge leaps, you might miss the valley and fall off a cliff. If you take tiny steps, you’ll take forever to get down. The learning rate is your stride length.

Learning Rate Scheduling

Often, we start with a larger learning rate to make fast progress, then reduce it over time to fine‑tune. This is called learning rate decay or scheduling. Common schedules include step decay and exponential decay; adaptive optimizers such as Adam instead adjust effective per‑parameter step sizes automatically, reducing (but not eliminating) the need for manual scheduling.
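As an illustration, here are minimal plain‑Python sketches of step decay and exponential decay (the constants are example choices, not any particular library's defaults):

    import math

    # Step decay: cut the learning rate by `factor` every `step_size` epochs.
    def step_decay(lr0, epoch, factor=0.5, step_size=10):
        return lr0 * (factor ** (epoch // step_size))

    # Exponential decay: shrink the learning rate smoothly each epoch.
    def exp_decay(lr0, epoch, k=0.05):
        return lr0 * math.exp(-k * epoch)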

What Is Convergence?

Convergence means that the cost function has stopped decreasing (or is decreasing very slowly) and the model parameters are stable. We say the training has converged. You can monitor the loss curve: after many iterations, it should flatten.
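In practice, a simple convergence check stops training once the cost improves by less than a small tolerance. A sketch under that assumption (`step` advances the parameters by one update; the tolerance and iteration cap are arbitrary example values):

    def train_until_converged(step, cost, theta, tol=1e-6, max_iters=10_000):
        prev = cost(theta)
        for _ in range(max_iters):
            theta = step(theta)         # one gradient descent update
            curr = cost(theta)
            if abs(prev - curr) < tol:  # loss curve has flattened: converged
                return theta
            prev = curr
        return theta                    # hit the iteration cap without converging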

How to Choose a Learning Rate

  • Try a range: 0.1, 0.01, 0.001, 0.0001 (see the sweep sketch after this list).
  • Observe the loss: if it diverges (explodes), reduce the learning rate. If the loss decreases too slowly, increase it slightly.
  • Use a learning rate finder (a technique that quickly tests many learning rates over a short run).
  • For deep learning, start with 0.001 (a common default, e.g., Adam's).
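Here is a simple version of that sweep in plain Python (the candidate rates, step count, and function names are illustrative assumptions): run a fixed number of steps at each learning rate and compare the final costs.

    def sweep_learning_rates(step_fn, cost, theta0,
                             lrs=(0.1, 0.01, 0.001, 0.0001), n=100):
        results = {}
        for lr in lrs:
            theta = theta0
            for _ in range(n):
                theta = step_fn(theta, lr)  # one update at this learning rate
            results[lr] = cost(theta)       # lower final cost = better LR (if stable)
        return results

If a learning rate makes the cost blow up (NaN or huge values), discard it and look an order of magnitude lower.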

Example: Simple Gradient Descent

Suppose we want to minimize f(x) = x², whose gradient is 2x. Starting at x = 10 with LR = 0.1: x ← 10 − 0.1×20 = 8, then x ← 8 − 0.1×16 = 6.4, and so on; each step shrinks x by a factor of 0.8, so it converges to 0. With LR = 1.1: x ← 10 − 1.1×20 = −12, then x ← −12 + 1.1×24 = 14.4; the sign flips each step and the magnitude grows, so it diverges.
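You can reproduce these iterates with a few lines of Python; this is a direct transcription of the example, assuming nothing beyond f(x) = x²:

    def grad_f(x):
        return 2 * x  # gradient of f(x) = x**2

    for lr in (0.1, 1.1):
        x = 10.0
        for i in range(4):
            x = x - lr * grad_f(x)
            print(f"lr={lr}, step {i + 1}: x = {x:.4f}")
    # lr=0.1 shrinks x toward 0 (8, 6.4, 5.12, ...);
    # lr=1.1 flips the sign and grows |x| (-12, 14.4, -17.28, ...).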


Two Minute Drill
  • Learning rate controls step size in gradient descent.
  • Too high: diverges; too low: slow convergence.
  • Convergence = cost stops decreasing.
  • Learning rate scheduling improves training.
