> Unlike Transformer models, Mamba models offer the advantage of linear time inf... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		modeless on July 17, 2024 \| parent \| context \| favorite \| on: Codestral Mamba > Unlike Transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length > We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens Why only 256k tokens? Gemini's context window is 1 million or more and it's (probably) not even using Mamba.

rileyphone on July 17, 2024 [–]

Gemini is probably using ring attention. But scaling to that size requires more engineering effort in terms of interlink that goes beyond the purpose of this release from Mistral.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact