Activation Functions
Activation functions introduce non‑linearity into neural networks. Without them, stacking multiple layers would be equivalent to a single linear layer – useless for complex problems.
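The collapse of stacked linear layers into one can be verified numerically. A minimal NumPy sketch (the weight names `W1`, `W2` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation in between: y = W2 @ (W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layer = W2 @ (W1 @ x)

# ...is exactly one linear layer with combined weights W = W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layer, one_layer))  # the two outputs match
```

No matter how many linear layers you stack, the result is a single matrix product, hence a single linear layer.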
Sigmoid
Sigmoid squashes output between 0 and 1. Historically popular, but suffers from vanishing gradient and outputs not zero‑centered.
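A minimal NumPy sketch; the helper names `sigmoid` and `sigmoid_grad` are illustrative, not from a library. The derivative σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25 and shrinks rapidly for large |x|, which is the vanishing-gradient problem in miniature:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)); outputs lie in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x)), maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(10.0))  # tiny: the gradient has all but vanished
```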
σ(x) = 1 / (1 + e^(-x))
Use: output layer for binary classification.
Tanh
Tanh squashes output between -1 and 1 and is zero‑centered, which helps optimization. It still suffers from vanishing gradients.
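NumPy ships `np.tanh` directly; a quick sketch checking it against the definition below:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

# Built-in tanh: zero-centered, range (-1, 1), tanh(0) = 0
print(np.tanh(x))

# Matches the textbook definition (e^x - e^(-x)) / (e^x + e^(-x))
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(np.allclose(np.tanh(x), manual))  # the two agree
```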
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
ReLU (Rectified Linear Unit)
ReLU is the most popular activation for hidden layers: it is cheap to compute and mitigates the vanishing gradient problem. Negative inputs become zero, so a neuron stuck in the negative region outputs zero forever (the "dead neuron" problem).
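A one-line NumPy sketch (the helper name `relu` is illustrative):

```python
import numpy as np

def relu(x):
    # max(0, x) elementwise: zeros out negatives, passes positives through
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))  # negatives become 0; note the gradient there is 0 too
```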
ReLU(x) = max(0, x)
Leaky ReLU / PReLU
Allows a small positive slope α for negative inputs, preventing dead neurons; PReLU learns α during training instead of fixing it.
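A minimal NumPy sketch (the helper name `leaky_relu` and the default `alpha=0.01` are illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, alpha * x for negative inputs
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(2.0))   # positives pass through unchanged
print(leaky_relu(-3.0))  # negatives keep a small nonzero value (-0.03)
```

Because the negative side has slope α rather than 0, gradients keep flowing even for negative inputs.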
LeakyReLU(x) = max(αx, x), where α is small (e.g., 0.01)
Softmax
Softmax converts logits into probabilities that sum to 1. It is used in the output layer for multi‑class classification.
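A NumPy sketch (the helper name `softmax` is illustrative). Subtracting the maximum logit before exponentiating is a standard numerical-stability trick that leaves the result unchanged:

```python
import numpy as np

def softmax(z):
    # Subtract max(z) to avoid overflow in exp; the result is identical
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)        # a probability distribution over the three classes
print(p.sum())  # sums to 1
```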
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
Choosing Activation Functions
- Hidden layers: ReLU or Leaky ReLU (default).
- Binary classification output: Sigmoid.
- Multi‑class classification output: Softmax.
- Regression output: Linear (no activation).
Two Minute Drill
- Activation functions add non‑linearity.
- Sigmoid (0 to 1) – vanishing gradient.
- Tanh (-1 to 1) – zero‑centered.
- ReLU (max(0,x)) – default for hidden layers.
- Softmax – probability distribution for multi‑class.
Need more clarification?
Drop us an email at career@quipoinfotech.com
