Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to overcome the vanishing gradient problem. It uses a gating mechanism to decide what to forget, what to store, and what to output, which lets LSTMs learn dependencies spanning hundreds of time steps.
LSTM introduces a cell state that acts as a conveyor belt, with gates controlling information flow.
The Cell State (C_t)
The cell state runs through the entire chain with only minor linear interactions, allowing information to flow easily without vanishing. Gates add or remove information from the cell state.
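As a toy illustration of the conveyor-belt idea (the gate values here are chosen by hand, not from the text): using the update C_t = f_t * C_{t-1} + i_t * C̃_t given below, a forget gate near 1 and an input gate near 0 carry the cell state through a step unchanged.

```python
import numpy as np

C_prev = np.array([0.5, -1.2, 3.0])    # cell state carrying information
f_t = np.ones(3)                       # forget gate fully open: keep everything
i_t = np.zeros(3)                      # input gate closed: add nothing new
C_tilde = np.random.randn(3)           # candidate values (ignored when i_t = 0)

C_t = f_t * C_prev + i_t * C_tilde     # C_t equals C_prev: information flows intact
assert np.allclose(C_t, C_prev)
```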
Three Gates
- Forget gate: decides what to discard from the previous cell state.
- Input gate: decides what new information to store in the cell state.
- Output gate: decides what to output based on the cell state.
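Each gate is simply a sigmoid layer over the previous hidden state and current input, producing values in (0, 1) that act as element-wise soft switches. A minimal sketch of one gate (the sizes and random initialization are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    # squashes each element into (0, 1): near 0 = "block", near 1 = "pass through"
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inputs = 4, 3                      # illustrative sizes
W_f = rng.standard_normal((hidden, hidden + inputs))
b_f = np.zeros(hidden)

h_prev = np.zeros(hidden)                  # h_{t-1}
x_t = rng.standard_normal(inputs)          # current input x_t

concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
f_t = sigmoid(W_f @ concat + b_f)          # forget gate activations
```

The input and output gates have exactly the same form, just with their own weights and biases.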
f_t = σ(W_f·[h_{t-1}, x_t] + b_f) # forget gate
i_t = σ(W_i·[h_{t-1}, x_t] + b_i) # input gate
o_t = σ(W_o·[h_{t-1}, x_t] + b_o) # output gate
Candidate and Cell Update
A candidate cell state (C̃_t) is created using tanh. The new cell state combines the old state, scaled by the forget gate, with the candidate, scaled by the input gate.
C̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)
Why LSTMs Work
The additive update (C_t = f_t*C_{t-1} + ...) lets gradients flow through many steps scaled only by the forget gate, rather than repeatedly multiplied by weight matrices, which avoids vanishing. LSTMs are a standard choice for sequence tasks: text, speech, time series.
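Putting the equations together, one full LSTM forward step can be sketched in NumPy (the weight shapes, random initialization, and the `lstm_step` name are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the equations above."""
    W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c = params
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate
    i_t = sigmoid(W_i @ z + b_i)               # input gate
    o_t = sigmoid(W_o @ z + b_o)               # output gate
    C_tilde = np.tanh(W_c @ z + b_c)           # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde         # additive cell update
    h_t = o_t * np.tanh(C_t)                   # new hidden state
    return h_t, C_t

rng = np.random.default_rng(42)
hidden, inputs = 4, 3                          # illustrative sizes
params = []
for _ in range(4):                             # forget, input, output, candidate
    params += [rng.standard_normal((hidden, hidden + inputs)) * 0.1,
               np.zeros(hidden)]

h, C = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):     # run 5 time steps
    h, C = lstm_step(x, h, C, tuple(params))
```

Note that h_t stays bounded in (-1, 1) because it is the product of a sigmoid output and a tanh, while the cell state C_t can grow through the additive updates.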
Two Minute Drill
- LSTM has a cell state (long‑term memory) and hidden state (short‑term).
- Forget, input, and output gates control information flow.
- LSTMs mitigate vanishing gradients, enabling long-term dependencies to be learned.
- Used for text, speech, time series.
Need more clarification?
Drop us an email at career@quipoinfotech.com
