I am still 100% willing to concede that Opus 4.8, or even Mythos at the level it is supposed to reach, is not yet at the general reasoning ability of top-5% coders or of any smart human with a few years of domain experience. However, the rate of improvement is so consistent and unrelenting that it seems silly to assume it will just stop short of human level. Even if algorithmic, research, and data-quality improvements suddenly stopped, we would still have years of better GPUs and scaling ahead.
Your statement seems to imply (correctly) that LLMs can program, just not as well as humans. If they're able to program, presumably without "thinking" as you seem to be (implicitly and narrowly) defining it, then why do you think that limits them to always being subpar?
It seems like if they can do it at all, there's no reason they can't eventually be trained to do it better, up to and beyond human performance. It seems strange to suggest that thinking unlocks some specific margin of "better" that can't be overcome.
All of that aside, even if they can't outperform the top human programmers, what if they get close enough that they're still better than most? Isn't a 95th-percentile programmer that can run 24/7 and continuously refine its work still going to come out on top in the end?
I'm more interested in the conclusion that programming doesn't require thinking. And that's where the argument breaks. It seems so obvious, but sometimes the most obvious things are the least true.
>I'm more interested in the conclusion that programming doesn't require thinking.
I suspect it largely has to do with how one defines "thinking". It seems like people like to implicitly define it in such a way as to require a human (or animal), but there are many examples of thinking/intelligence in nature that don't require a brain or even neurons.
I'm genuinely curious: without using the word "think" with all of its ambiguity, can you articulate what it is that we're doing that these models are not capable of? Because it's pretty clear (to me, at least) from the research, particularly a lot of the mechanistic interpretability work coming out of Anthropic, that the models are at least doing something akin to what we think of as thinking, even if it appears foreign to us.
What LLMs can do now, 99.999% of people three years ago would have said was only possible with “thinking”. Claiming that LLMs aren’t thinking is sensible if your definition of thinking is inextricably linked to the chemical processes that occur in human brains, but then it ceases to be a useful definition for evaluating a system’s ability to process information, form connections, and reason via abstraction. It is objectively true that LLMs can do those things.
Yeah, I have to admit to finding it somewhat ironic that some individuals accuse the "pro AI" folks of magical thinking, when it seems that escalating levels of magical thinking are being used by the "anti" crowd to suggest that the models can never achieve something akin to human intelligence (particularly in light of the fact that they have on certain dimensions done exactly that).
It's pretty clear that there are significant differences between their intelligence and human intelligence. But that doesn't mean there isn't some sort of intelligence here.
The thing about AI is that it's gobbling up so much information that at some point you won't be able to tell the difference. Programming in particular documents itself: human communication, context, memes, and culture evolve and often exist outside of textual media, but as soon as a new piece of code is written, it becomes part of the AI's dataset. It doesn't help that the vast majority of our code is pretty damn repetitive, especially once you add in the code that will be written over the next two decades and more.
TL;DR: The better we get at coding, the more code we write, and the better AI gets at coding.