Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) is a simpler alternative to the LSTM. It combines the forget and input gates into a single update gate and merges the cell state with the hidden state. GRUs have fewer parameters and often match LSTM performance.
GRU = LSTM with fewer gates, no separate cell state.
Two Gates in GRU
- Reset gate: decides how much of the previous hidden state to use when forming the candidate.
- Update gate: decides how much of the new candidate to keep (combines LSTM's forget and input).
z_t = σ(W_z·[h_{t-1}, x_t]) # update gate
r_t = σ(W_r·[h_{t-1}, x_t]) # reset gate
Candidate and Hidden State
The candidate hidden state uses the reset gate to control how much past information flows in. The final hidden state is a convex combination of the previous state and the candidate, weighted by the update gate.
h̃_t = tanh(W_h·[r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
GRU vs LSTM
- GRU has fewer parameters → faster to train, requires less data.
- LSTM has separate cell state → more expressive for very long sequences.
- In practice, both perform similarly on many tasks. Start with GRU for smaller datasets, LSTM for larger.
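The update equations above can be sketched as a single GRU step in NumPy. This is a minimal illustration, not a production implementation: weight names and sizes are made up for the example, and bias terms are omitted to match the formulas as written.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    # Each W maps the concatenated [h_prev, x_t] to hidden_size units.
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                  # update gate
    r_t = sigmoid(W_r @ concat)                  # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(W_h @ concat_reset)        # candidate hidden state
    return (1 - z_t) * h_prev + z_t * h_tilde    # new hidden state

# Tiny example: hidden_size=3, input_size=2 (sizes chosen for illustration)
rng = np.random.default_rng(0)
hidden, inputs = 3, 2
W_z, W_r, W_h = (rng.standard_normal((hidden, hidden + inputs)) for _ in range(3))
h = gru_step(rng.standard_normal(inputs), np.zeros(hidden), W_z, W_r, W_h)
print(h.shape)  # (3,)
```

Note how the reset gate acts before the candidate's tanh, while the update gate blends old and new states afterwards; this is the whole mechanism that replaces the LSTM's separate cell state.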
Usage
# PyTorch
self.gru = nn.GRU(input_size, hidden_size, num_layers)
output, hidden = self.gru(x, h0) # output: hidden states at every step; hidden: final hidden state per layer
Two Minute Drill
- GRU has reset and update gates, no separate cell state.
- Fewer parameters than LSTM → faster.
- Often similar performance to LSTM.
- Good choice for smaller datasets.
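The "fewer parameters" point can be made concrete with a quick count. Assuming PyTorch's parameter layout (three stacked gate matrices for a GRU versus four for an LSTM, each with input-to-hidden and hidden-to-hidden weights plus two bias vectors), a single layer works out to a fixed 3:4 ratio:

```python
def gru_params(input_size, hidden_size):
    # 3 gates (update, reset, candidate), each with an input->hidden matrix,
    # a hidden->hidden matrix, and two bias vectors (PyTorch convention).
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

def lstm_params(input_size, hidden_size):
    # 4 gates (input, forget, cell, output) with the same per-gate layout.
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

print(gru_params(128, 256))   # 296448
print(lstm_params(128, 256))  # 395264
```

For any layer size, the GRU carries exactly 75% of the LSTM's parameters, which is where the faster training and lower data requirements come from.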
Need more clarification?
Drop us an email at career@quipoinfotech.com
