Activation Functions
Activation functions introduce non‑linearity into neural networks. Without them, stacking multiple layers would be equivalent to a single linear layer – useless for complex problems.
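The collapse of stacked linear layers into one can be verified numerically. A minimal NumPy sketch (the weight names `W1`, `W2` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation in between: y = W2 @ (W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layer = W2 @ (W1 @ x)

# ...is exactly one linear layer with combined weights W = W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layer, one_layer))  # the two outputs match
```

No matter how many linear layers you stack, the result is a single matrix product, hence a single linear layer.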
Sigmoid
Sigmoid squashes output between 0 and 1. Historically popular, but suffers from vanishing gradient and outputs not zero‑centered.
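A minimal NumPy sketch; the helper names `sigmoid` and `sigmoid_grad` are illustrative, not from a library. The derivative σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25 and shrinks rapidly for large |x|, which is the vanishing-gradient problem in miniature:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)); outputs lie in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x)), maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(10.0))  # tiny: the gradient has all but vanished
```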
σ(x) = 1 / (1 + e^(-x))
Use: output layer for binary classification.
Tanh
Tanh squashes output between -1 and 1 and is zero‑centered, which helps optimization. It still suffers from vanishing gradients.
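NumPy ships `np.tanh` directly; a quick sketch checking it against the definition below:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

# Built-in tanh: zero-centered, range (-1, 1), tanh(0) = 0
print(np.tanh(x))

# Matches the textbook definition (e^x - e^(-x)) / (e^x + e^(-x))
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(np.allclose(np.tanh(x), manual))  # the two agree
```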
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
ReLU (Rectified Linear Unit)
ReLU is the most popular activation for hidden layers: it is cheap to compute and mitigates the vanishing gradient problem. Negative inputs become zero, so a neuron stuck in the negative region outputs zero forever (the "dead neuron" problem).
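A one-line NumPy sketch (the helper name `relu` is illustrative):

```python
import numpy as np

def relu(x):
    # max(0, x) elementwise: zeros out negatives, passes positives through
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))  # negatives become 0; note the gradient there is 0 too
```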
ReLU(x) = max(0, x)
Leaky ReLU / PReLU
Allows a small positive slope α for negative inputs, preventing dead neurons; PReLU learns α during training instead of fixing it.
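A minimal NumPy sketch (the helper name `leaky_relu` and the default `alpha=0.01` are illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, alpha * x for negative inputs
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(2.0))   # positives pass through unchanged
print(leaky_relu(-3.0))  # negatives keep a small nonzero value (-0.03)
```

Because the negative side has slope α rather than 0, gradients keep flowing even for negative inputs.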
LeakyReLU(x) = max(αx, x), where α is small (e.g., 0.01)
Softmax
Softmax converts logits into probabilities that sum to 1. It is used in the output layer for multi‑class classification.
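A NumPy sketch (the helper name `softmax` is illustrative). Subtracting the maximum logit before exponentiating is a standard numerical-stability trick that leaves the result unchanged:

```python
import numpy as np

def softmax(z):
    # Subtract max(z) to avoid overflow in exp; the result is identical
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)        # a probability distribution over the three classes
print(p.sum())  # sums to 1
```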
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
Choosing Activation Functions
- Hidden layers: ReLU or Leaky ReLU (default).
- Binary classification output: Sigmoid.
- Multi‑class classification output: Softmax.
- Regression output: Linear (no activation).
Two Minute Drill
- Activation functions add non‑linearity.
- Sigmoid (0 to 1) – vanishing gradient.
- Tanh (-1 to 1) – zero‑centered.
- ReLU (max(0,x)) – default for hidden layers.
- Softmax – probability distribution for multi‑class.
Need more clarification?
Drop us an email at career@quipoinfotech.com
