So does OpenAI know how to widen the context window without it costing more? Otherwise Google wins, again. And this is all boring. Gemini 2.5 Pro Preview, where you can just insert all the files you have and it actually doesn't compress anything, it just holds it all in context, is exactly what you want. All the compression tricks etc. really are shit by comparison. 32k input tokens is a joke once you've tried this.
As in, bearish on OpenAI if they don't offer cheaper 10M context soonish. Google will.
If raw AI power is the key, Google seems to be in pole position from here on out. They can make their own TPUs and have their own data centers. No need to "Stargate" with Oracle and SoftBank in tow. Google also has Android, YouTube and G Suite.
However, OpenAI has been going down the product route for a few years now. After a spate of high-profile research exits, it is clear Altman has purged the ranks and can now focus on product development.
So if product is a sufficient USP, and if Altman can deliver a better product, they still have a chance. I guess that is where Ive comes into the picture. And Google is notoriously bad at internally developed product.
A lot of ifs there. And when judging how likely Altman is to deliver a better product: what has he actually shipped besides an orb that scans your eyeballs in exchange for crypto?
Full attention over a 1M context is nonsense. Yes, Gemini can do needle-in-a-haystack, but do you actually need to feed in 1M tokens to find one thing? People with a lot of experience using LLMs for code generation report that performance degrades past a certain point, even when all of the context is somewhat relevant.
What we need is not "long context"; we need memory: the ability for an LLM to address datasets of arbitrary size.
RAG has a bad reputation, but there are myriad ways of doing it. "Agentic" tool calls that fetch specific data, say, are essentially a form of RAG (see the sketch below). But it's cool because it's not called RAG, right?
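To make that concrete, here's a minimal sketch assuming an OpenAI-style tool-calling API; search_docs is a hypothetical placeholder for whatever retrieval backend you already have (grep, a vector DB, an internal search service). The model asks for data, you fetch it and append it to the context, and the model answers from it. That loop is retrieval-augmented generation, whatever you call it:

```python
# Sketch only: an "agentic" tool call that is RAG by another name.
# search_docs() is a hypothetical stand-in for your actual retrieval.
import json
from openai import OpenAI

client = OpenAI()

def search_docs(query: str) -> str:
    """Placeholder retriever: return the top matching doc snippets."""
    return "...snippets relevant to: " + query

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Fetch internal guideline/doc snippets matching a query",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "How do we name TS API endpoints?"}]
resp = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)

# Assuming the model chose to call the tool: fetch the data, feed it back.
# This round trip is exactly retrieval augmentation, just model-driven.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": search_docs(args["query"])})
final = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
print(final.choices[0].message.content)
```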
Anyway, this definitely requires some innovation, but I doubt "longer context" is exactly what we need.
Our company has development documents, guidelines, and APIs going back almost 20 years. If you follow them, life is good; if you don't, things don't work. The 20 years is relevant because this is a lot of text and code. When we give this to o3, o4-mini, or Claude 3.5/3.7, it just ignores rules randomly; when we give it to Gemini 2.5 Pro Preview, it just works. And after prompting multiple times in chat, the other models start going into complete nonsense land. We often have cases where one even starts generating code in Python while we were working in TS; apparently it compressed its context so much it forgot the actual basics? Not Gemini. I haven't been able to mess it up in any practical case yet, which is why, maybe erroneously, I attribute that to the context.
"Context compression" is something a tool like Cursor does, not the model itself. It seems like the tool you're using works better with Gemini.
From my experience, pretty much all coding tools have their quirks.
I generally agree that Gemini is a very strong model, but I don't think we can conclusively say at this point that Google will win because of long context.
It's too much to extrapolate from a single case. E.g. I see Gemini struggling with editing files a bit more than other models do, but I'd say that's just growing pains rather than something fundamental.
Losing what, exactly? I do notice they seem to be losing the hype battle (my perception is that OpenAI acquiring Jony Ive's startup gets more traction than Google's Nobels), but I think with their foundation they can play on a different time horizon, so I'm not sure how much they should care about that.
GPT-4.1 in the API already provides 1 million tokens. Anthropic's enterprise version does too.
I'm not sure if this is a software or a hardware (compute) problem.