Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to overcome the vanishing gradient problem. It uses a gating mechanism to decide what to forget, what to store, and what to output, which lets LSTMs learn dependencies spanning hundreds of time steps.
LSTM introduces a cell state that acts as a conveyor belt, with gates controlling information flow.
The Cell State (C_t)
The cell state runs through the entire chain with only minor linear interactions, allowing information to flow easily without vanishing. Gates add or remove information from the cell state.
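As a toy illustration of the conveyor-belt idea (the gate values here are chosen by hand, not from the text): using the update C_t = f_t * C_{t-1} + i_t * C̃_t given below, a forget gate near 1 and an input gate near 0 carry the cell state through a step unchanged.

```python
import numpy as np

C_prev = np.array([0.5, -1.2, 3.0])    # cell state carrying information
f_t = np.ones(3)                       # forget gate fully open: keep everything
i_t = np.zeros(3)                      # input gate closed: add nothing new
C_tilde = np.random.randn(3)           # candidate values (ignored when i_t = 0)

C_t = f_t * C_prev + i_t * C_tilde     # C_t equals C_prev: information flows intact
assert np.allclose(C_t, C_prev)
```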
Three Gates
- Forget gate: decides what to discard from the previous cell state.
- Input gate: decides what new information to store in the cell state.
- Output gate: decides what to output based on the cell state.
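Each gate is simply a sigmoid layer over the previous hidden state and current input, producing values in (0, 1) that act as element-wise soft switches. A minimal sketch of one gate (the sizes and random initialization are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    # squashes each element into (0, 1): near 0 = "block", near 1 = "pass through"
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inputs = 4, 3                      # illustrative sizes
W_f = rng.standard_normal((hidden, hidden + inputs))
b_f = np.zeros(hidden)

h_prev = np.zeros(hidden)                  # h_{t-1}
x_t = rng.standard_normal(inputs)          # current input x_t

concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
f_t = sigmoid(W_f @ concat + b_f)          # forget gate activations
```

The input and output gates have exactly the same form, just with their own weights and biases.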
f_t = σ(W_f·[h_{t-1}, x_t] + b_f) # forget gate
i_t = σ(W_i·[h_{t-1}, x_t] + b_i) # input gate
o_t = σ(W_o·[h_{t-1}, x_t] + b_o) # output gate
Candidate and Cell Update
A candidate cell state (C̃_t) is created using tanh. The new cell state combines the old state, scaled by the forget gate, with the candidate, scaled by the input gate.
C̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)
Why LSTMs Work
The additive update (C_t = f_t*C_{t-1} + ...) lets gradients flow through many steps scaled only by the forget gate, rather than repeatedly multiplied by weight matrices, which avoids vanishing. LSTMs are a standard choice for sequence tasks: text, speech, time series.
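Putting the equations together, one full LSTM forward step can be sketched in NumPy (the weight shapes, random initialization, and the `lstm_step` name are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the equations above."""
    W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c = params
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate
    i_t = sigmoid(W_i @ z + b_i)               # input gate
    o_t = sigmoid(W_o @ z + b_o)               # output gate
    C_tilde = np.tanh(W_c @ z + b_c)           # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde         # additive cell update
    h_t = o_t * np.tanh(C_t)                   # new hidden state
    return h_t, C_t

rng = np.random.default_rng(42)
hidden, inputs = 4, 3                          # illustrative sizes
params = []
for _ in range(4):                             # forget, input, output, candidate
    params += [rng.standard_normal((hidden, hidden + inputs)) * 0.1,
               np.zeros(hidden)]

h, C = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):     # run 5 time steps
    h, C = lstm_step(x, h, C, tuple(params))
```

Note that h_t stays bounded in (-1, 1) because it is the product of a sigmoid output and a tanh, while the cell state C_t can grow through the additive updates.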
Two Minute Drill
- LSTM has a cell state (long‑term memory) and hidden state (short‑term).
- Forget, input, and output gates control information flow.
- LSTMs mitigate vanishing gradients, enabling long-term dependencies to be learned.
- Used for text, speech, time series.
Need more clarification?
Drop us an email at career@quipoinfotech.com
