LSTM (Long Short-Term Memory) Neural Network: The Solution for Dementia in AI Models
A primary trait of time series of natural, economic and social phenomena is that the past has an impact on the present and into the future. Thus the current value of the phenomenon x is a function of its own past values, viz. x(t) = f(x(t-1), x(t-2), x(t-3), ...), where t is time. In statistics, this is known as autocorrelation. In other words, the phenomenon retains a memory of its past.
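To make this concrete, here is a small Python sketch; the toy series and its 0.8 "carry-over" factor are invented purely for illustration, not taken from any real data. It builds a series with memory and measures how strongly this month correlates with last month:

```python
import numpy as np

# Toy monthly series with "memory": each value depends partly on the previous one
rng = np.random.default_rng(0)
x = np.zeros(120)
for t in range(1, 120):
    x[t] = 0.8 * x[t - 1] + rng.normal()   # 80% carried over from last month, plus noise

# Lag-1 autocorrelation: how strongly this month's value correlates with last month's
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f}")   # roughly the carry-over strength of 0.8
```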
One of the traits of humans with Dementia is that their short-term memory declines rapidly (while long-term memory is retained). AI time series models are also afflicted with the machine equivalent of human Dementia. Very long time series, such as the monthly value of Singapore's exports since 1965, have 'memory': last month's export numbers have an impact on this month's numbers, the month before that still has an impact, and so on, with the impact gradually diminishing as time goes by. (That's partly the reason why we have business cycles.)
And this memory and impact may go back six months or even a year, depending on the phenomenon. Earlier time series Neural Networks such as Recurrent Neural Networks (RNNs) had 'Dementia', as they suffer from short-term memory loss during the learning process. In mathematical terms this is the vanishing gradient problem: as the error signal is propagated back through many time steps, the gradients shrink towards zero, so the network stops learning from anything further back in the sequence. At that point the memory loss is effectively total, just like you forgetting where you placed your cellphone or house keys.
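As a rough illustration of why this happens (the 0.5 factor and the 50-step horizon below are arbitrary numbers chosen for the example, not taken from any real model): during learning, the error signal is multiplied by a similar per-step factor each time it travels one step further back, so it quickly fades to nothing.

```python
# Illustrative only: repeated multiplication during backpropagation through time.
# With a per-step factor below 1, the gradient reaching an old time step vanishes.
factor = 0.5          # stand-in for the per-step derivative in a plain RNN
gradient = 1.0        # error signal at the current time step

for step in range(1, 51):
    gradient *= factor
    if step in (5, 10, 25, 50):
        print(f"gradient after {step} steps back: {gradient:.2e}")

# After 50 steps the gradient is around 1e-15: the network has "forgotten" that far back.
```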
LSTM provides a solution for this problem by having three Gates in its architecture: a Forget Gate, an Input Gate and an Output Gate. The Forget Gate decides which information to discard or retain in the cell's memory, the Input Gate decides how much new information to write into that memory, and the Output Gate decides how much of the memory to expose as output.
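For readers who want to see the machinery, here is a minimal sketch of a single LSTM time step in Python with NumPy. The weight shapes, random initialisation and sizes are illustrative only; real libraries such as Keras or PyTorch provide trained, optimised versions of this.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x      : current input vector
    h_prev : previous output (hidden state)
    c_prev : previous cell state (the long-term memory)
    W, U, b: dicts of weights/biases for the gates 'f', 'i', 'o' and the candidate 'g'
    """
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # Forget Gate: keep or erase old memory
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # Input Gate: how much new info to write
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # Output Gate: how much memory to expose
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])   # candidate new memory content

    c = f * c_prev + i * g          # updated cell state (the notepad)
    h = o * np.tanh(c)              # what the cell actually outputs at this step
    return h, c

# Tiny usage example with random (untrained) weights, 3 inputs and 4 hidden units
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(4, 3)) for k in 'fiog'}
U = {k: rng.normal(size=(4, 4)) for k in 'fiog'}
b = {k: np.zeros(4) for k in 'fiog'}
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
print(h.shape, c.shape)   # (4,) (4,)
```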
Explaining LSTM in non-technical terms
Imagine you’re trying to read a very long story aloud to a friend.
You have a little notepad (your “memory”) that you can jot things on so you don’t forget important details: characters’ names, where the mystery key was hidden, etc. But you can’t write everything down, or the pad fills up. So you use three coloured sticky notes as helpers:
1 — Red sticky-note: “Forget?”
- Each time you flip a page you glance at the red note.
- It tells you which old scribbles you can safely erase (useless details) and which you should keep.
2 — Green sticky-note: “Write?”
- Before writing anything new, you peek at the green note.
- It says how much of the brand-new info from the current page deserves space on the pad.
3 — Yellow sticky-note: “Say out loud?”
- When you actually speak to your friend, you check the yellow note.
- It decides which parts of your notepad should be read aloud now and which should stay hidden for later.
How the notes change themselves
They’re not static instructions. Each sticky note is really a tiny calculator that looks at two things:
- What you just read on the current page (the input).
- What you just finished saying (the previous output).
Based on those, each calculator outputs a number between 0 and 1 (a “dimmer switch”).
- 0 ⇒ completely ignore.
- 1 ⇒ fully keep / write / say.
Everything in between is a partial blend.
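In the actual network, each sticky note's calculator is the sigmoid function applied to a weighted mix of the current input and the previous output. A tiny sketch follows; the function name and the weights are made up for illustration.

```python
import numpy as np

def dimmer(page_signal, last_spoken_signal, w_page=0.9, w_spoken=-0.4, bias=0.0):
    """A sticky note's 'calculator': squashes a weighted mix of the current input
    and the previous output into a number between 0 and 1 (the sigmoid function)."""
    z = w_page * page_signal + w_spoken * last_spoken_signal + bias
    return 1.0 / (1.0 + np.exp(-z))

print(dimmer(3.0, 0.5))    # close to 1: keep / write / say
print(dimmer(-3.0, 0.5))   # close to 0: completely ignore
```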
Putting it all together on every page (time-step)
- Red note decides how much of yesterday’s notes stay.
- Green note decides how much of today’s draft text gets written.
New pad contents = (kept old stuff) + (green × new draft).
- Yellow note decides how much of the updated pad is spoken out.
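Here is the same page-by-page routine written out as a small Python sketch, keeping the sticky-note names. The red, green and yellow values are invented stand-ins for what the trained gates would actually compute; in a real LSTM they come from the sigmoid calculators described above.

```python
import numpy as np

# One page (time step) of the notepad routine, using the sticky-note names.
old_notes = np.array([0.7, -1.2, 0.3])      # notepad before this page (cell state)
new_draft = np.array([0.5, 0.9, -0.4])      # candidate notes from the current page

red    = np.array([1.0, 0.1, 0.8])          # Forget Gate: keep, mostly erase, mostly keep
green  = np.array([0.2, 0.9, 0.5])          # Input Gate: how much of the draft to write
yellow = np.array([0.9, 0.0, 0.6])          # Output Gate: how much to read aloud

new_notes = red * old_notes + green * new_draft   # updated notepad (cell state)
spoken    = yellow * np.tanh(new_notes)           # what is said aloud (hidden state)

print(new_notes)
print(spoken)
```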
Because the sticky notes are calculators with learnable knobs (their weights), the whole routine learns—by trial and error during training—exactly what to forget, what to store, and what to reveal so the story makes perfect sense.
That’s an LSTM: a memory pad (cell state) plus three smart dimmer switches (gates) that let a neural network remember useful bits of a long sequence while discarding clutter.