Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) is a simpler alternative to the LSTM. It combines the forget and input gates into a single update gate and merges the cell state with the hidden state. GRUs have fewer parameters and often match LSTM performance.
GRU = LSTM with fewer gates, no separate cell state.
Two Gates in GRU
- Reset gate: decides how much of the previous hidden state to use when forming the candidate.
- Update gate: decides how much of the new candidate to keep (combines LSTM's forget and input).
z_t = σ(W_z·[h_{t-1}, x_t]) # update gate
r_t = σ(W_r·[h_{t-1}, x_t]) # reset gate
Candidate and Hidden State
The candidate hidden state uses the reset gate to control how much past information flows in. The final hidden state is a convex combination of the previous state and the candidate, weighted by the update gate.
h̃_t = tanh(W_h·[r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
GRU vs LSTM
- GRU has fewer parameters → faster to train, requires less data.
- LSTM has separate cell state → more expressive for very long sequences.
- In practice, both perform similarly on many tasks. Start with GRU for smaller datasets, LSTM for larger.
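The update equations above can be sketched as a single GRU step in NumPy. This is a minimal illustration, not a production implementation: weight names and sizes are made up for the example, and bias terms are omitted to match the formulas as written.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    # Each W maps the concatenated [h_prev, x_t] to hidden_size units.
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                  # update gate
    r_t = sigmoid(W_r @ concat)                  # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(W_h @ concat_reset)        # candidate hidden state
    return (1 - z_t) * h_prev + z_t * h_tilde    # new hidden state

# Tiny example: hidden_size=3, input_size=2 (sizes chosen for illustration)
rng = np.random.default_rng(0)
hidden, inputs = 3, 2
W_z, W_r, W_h = (rng.standard_normal((hidden, hidden + inputs)) for _ in range(3))
h = gru_step(rng.standard_normal(inputs), np.zeros(hidden), W_z, W_r, W_h)
print(h.shape)  # (3,)
```

Note how the reset gate acts before the candidate's tanh, while the update gate blends old and new states afterwards; this is the whole mechanism that replaces the LSTM's separate cell state.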
Usage
# PyTorch
self.gru = nn.GRU(input_size, hidden_size, num_layers)
output, hidden = self.gru(x, h0) # output: hidden states at every step; hidden: final hidden state per layer
Two Minute Drill
- GRU has reset and update gates, no separate cell state.
- Fewer parameters than LSTM → faster.
- Often similar performance to LSTM.
- Good choice for smaller datasets.
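The "fewer parameters" point can be made concrete with a quick count. Assuming PyTorch's parameter layout (three stacked gate matrices for a GRU versus four for an LSTM, each with input-to-hidden and hidden-to-hidden weights plus two bias vectors), a single layer works out to a fixed 3:4 ratio:

```python
def gru_params(input_size, hidden_size):
    # 3 gates (update, reset, candidate), each with an input->hidden matrix,
    # a hidden->hidden matrix, and two bias vectors (PyTorch convention).
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

def lstm_params(input_size, hidden_size):
    # 4 gates (input, forget, cell, output) with the same per-gate layout.
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

print(gru_params(128, 256))   # 296448
print(lstm_params(128, 256))  # 395264
```

For any layer size, the GRU carries exactly 75% of the LSTM's parameters, which is where the faster training and lower data requirements come from.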
Need more clarification?
Drop us an email at career@quipoinfotech.com
