So does OpenAI know how to widen the context window without it costing more? Otherwise Google wins, again. And this is all boring. Gemini 2.5 Pro Preview, where you can just insert all the files you have and it actually doesn't compress anything, it just holds it all in context, is exactly what you want. All the compression tricks etc. really are shit by comparison. 32k input tokens is a joke once you've tried this.
As in, bearish on OpenAI if they don't offer cheaper 10M context soonish. Google will.
If raw AI power is the key, Google seems to be in pole position from here on out. They can make their own TPUs and have their own data centers. No need to "Stargate" with Oracle and SoftBank in tow. Google also has Android, YouTube and G Suite.
However, OpenAI has been going down the product route for a few years now. After a spate of high-profile research exits, it is clear Altman has purged the ranks and can now focus on product development.
So if product is a sufficient USP, and if Altman can deliver a better product, they still have a chance. I guess that is where Ive comes into the picture. And Google is notoriously bad at internally developed product.
A lot of ifs there. And when judging how likely Altman is to deliver a better product: what has he actually shipped besides an orb that scans your eyeballs in exchange for crypto?
Full attention over a 1M context is nonsense. Yes, Gemini can do needle-in-a-haystack, but do you actually need to feed in 1M tokens to find one thing? People with a lot of experience using LLMs for code generation report that performance degrades past a certain point, even when all of the context is somewhat relevant.
What we need is not "long context"; we need memory: the ability for an LLM to address datasets of arbitrary size.
RAG has a bad reputation, but there are myriad ways of doing it. "Agentic" tool calls that fetch specific data, say, are essentially a form of RAG (see the sketch below). But it's cool because it's not called RAG, right?
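To make that concrete, here's a minimal sketch assuming an OpenAI-style tool-calling API; search_docs is a hypothetical placeholder for whatever retrieval backend you already have (grep, a vector DB, an internal search service). The model asks for data, you fetch it and append it to the context, and the model answers from it. That loop is retrieval-augmented generation, whatever you call it:

```python
# Sketch only: an "agentic" tool call that is RAG by another name.
# search_docs() is a hypothetical stand-in for your actual retrieval.
import json
from openai import OpenAI

client = OpenAI()

def search_docs(query: str) -> str:
    """Placeholder retriever: return the top matching doc snippets."""
    return "...snippets relevant to: " + query

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Fetch internal guideline/doc snippets matching a query",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "How do we name TS API endpoints?"}]
resp = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)

# Assuming the model chose to call the tool: fetch the data, feed it back.
# This round trip is exactly retrieval augmentation, just model-driven.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": search_docs(args["query"])})
final = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
print(final.choices[0].message.content)
```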
Anyway, this definitely requires some innovation, but I doubt "longer context" is exactly what we need.
Our company has development documents, guidelines, and APIs going back almost 20 years. If you follow them, life is good; if you don't, things don't work. The 20 years is relevant because this is a lot of text and code. When we give this to o3, o4-mini, or Claude 3.5/3.7, it just ignores rules randomly; when we give it to Gemini 2.5 Pro Preview, it just works. And after prompting multiple times in chat, the other models start going into complete nonsense land. We often have cases where one even starts generating code in Python while we were working in TS; apparently it compressed its context so much it forgot the actual basics? Not Gemini. I haven't been able to mess it up in any practical case yet, which is why, maybe erroneously, I attribute that to the context.
"Context compression" is something a tool like Cursor does, not the model itself. It seems like the tool you're using works better with Gemini.
From my experience, pretty much all coding tools have their quirks.
I generally agree that Gemini is a very strong model, but I don't think we can conclusively say at this point that Google will win because of long context.
It's too much to extrapolate from a single case. E.g. I see Gemini struggling with editing files a bit more than other models do, but I'd say that's just growing pains rather than something fundamental.
Losing what, exactly? I do notice they seem to be losing the hype battle (my perception is that OpenAI acquiring Jony Ive's startup gets more traction than Google's Nobels), but I think with their foundation they can play on a different time horizon, so I'm not sure how much they should care about that.
GPT-4.1 in the API already provides 1 million tokens. Anthropic's enterprise version does too.
I'm not sure if this is a software or a hardware (compute) problem.