I spent the last weekend thinking about continual learning. A lot of people think we can solve long-term memory and learning in LLMs by simply extending the context length to infinity. Here I explore a different perspective that challenges this assumption.
Also interesting to think about: could a single system be generally intelligent, or is a certain bias actually a strength? Could we instead have billions of models, each with their own "experience"?
I think both views have their merits. In my mind, the hardware-vs-software analogy for weights vs. context holds well because in most modern computing systems, the hardware is fixed and the software changes. What the system can do efficiently, in practice, is a function of both the hardware and the software, each within their respective capability ceilings.
Brain theory kind of says the same thing, but it's hard to say what stays fixed vs. what changes with experience in the brain, I guess.
Another way I see it: mind is process. An LLM is a (very lossy) snapshotted state of that process/mind. An LLM in-process is a mind-emulator with the potential to explore the state-space of the mind-snapshot. Consequently, and by its very construction, an LLM cannot be a mind.
Let me know what you think about this.