Hacker News | scosman's comments

They also do a good job working over the little differences between APIs. Tool calling sometimes breaks on major providers, and OR will patch it before the provider does. Libraries like LiteLLM do this too, but OR is faster.

Benchmarking 1 or a few samples isn't ever going to yield anything but noise. The actual benchmarks use thousands of tasks.
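A quick way to see why: treat each task as an independent pass/fail trial. The binomial standard error of a measured pass rate is sqrt(p(1-p)/n), so it only gets small once n reaches the hundreds or thousands. A minimal Python sketch, assuming a true pass rate of 0.5 (an illustrative figure, not real benchmark data):

```python
import math

def pass_rate_stderr(p: float, n: int) -> float:
    """Binomial standard error of a pass rate measured over n independent tasks.

    p is the assumed true pass rate (0.5 below is illustrative only).
    """
    return math.sqrt(p * (1 - p) / n)

for n in (1, 5, 1000):
    # Typical spread of the measured score around the true rate at this sample size
    print(n, round(pass_rate_stderr(0.5, n), 3))
# prints:
# 1 0.5
# 5 0.224
# 1000 0.016
```

With five samples, one standard error is about 22 percentage points; with 1,000 tasks it is under 2.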

GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick a tool and not switch constantly, it was the right choice. Not to mention their tooling has always been ahead, and that brings ecosystem benefits.

Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time -- the valuation reflects that, not any given moment in time.


okay what's a point in time where Claude was better? just give me a date

Until GPT 5.4, Claude always had a decent edge in benchmarks (any date before then). The gap was huge during the Sonnet 3.x vs GPT 4.x era.

I would have guessed that was just a bad title here but no, article states it as "opted in by default".

I fixed the title, sorry for the typo!

not your fault, the article uses that language!

even the color matches...

beautiful. But looks like an Aston Martin

yes exactly. Too many people ask AI to one-shot complex tasks, and wonder why it behaves like a junior asked to rush something.

I have my own skill: 5 rounds of research/planning/test-planning, interactive with me in the loop for all important decisions. It starts with the high-level shape, then the details. Planning can take 2-3 days of my time, then the implementation agent can take many hours (Opus 4.7). It splits the implementation across many phases/commits, each with its own code-review/fix loop. A deep code review at the end can take another hour or two. Then it opens a PR, Gemini reviews it, and the agent reads and resolves those issues.

Projects still take days or weeks, but 5x faster than doing it all myself.

Edit: the skill - https://github.com/scosman/vibe-crafting
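The per-phase structure described above (implement, review, fix, repeat, one phase per commit) can be sketched as a bounded loop. This is a hypothetical illustration, not code from the vibe-crafting skill: `review` and `fix` stand in for whatever agent calls do the reviewing and fixing, and the `max_rounds` cap is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: review() returns a list of issues (empty means clean),
# fix() returns a revised version of the work given those issues.
Review = Callable[[str], List[str]]
Fix = Callable[[str, List[str]], str]

def run_phase(work: str, review: Review, fix: Fix, max_rounds: int = 3) -> Tuple[str, bool]:
    """Run a review/fix loop on one phase; return the final work and whether it passed review."""
    for _ in range(max_rounds):
        issues = review(work)
        if not issues:
            return work, True
        work = fix(work, issues)
    # Out of rounds: check whether the last fix happened to clear every issue
    return work, not review(work)

def run_plan(phases: List[str], review: Review, fix: Fix) -> List[Tuple[str, bool]]:
    """Implement phases one at a time, each with its own review/fix loop."""
    return [run_phase(p, review, fix) for p in phases]

# Toy usage: the "reviewer" flags any TODO, the "fixer" replaces it.
toy_review: Review = lambda w: ["unresolved TODO"] if "TODO" in w else []
toy_fix: Fix = lambda w, issues: w.replace("TODO", "done")
print(run_plan(["phase 1 TODO", "phase 2"], toy_review, toy_fix))
# prints: [('phase 1 done', True), ('phase 2', True)]
```

Capping the rounds per phase keeps one stubborn review comment from looping forever; a phase that still fails comes back flagged instead.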


"yes exactly. Too many people ask AI to one-shot complex tasks, and wonder why it behaves like a junior asked to rush something."

Because this version of AI is worth 10 trillion dollars.

While the pragmatic versions from realists you can find all over this thread are ultimately probably less of a speed boost than just having your CEO/local micromanager be conveniently on vacation during critical periods when the work actually gets done.


"Because this version of AI is worth 10 trillion dollars."

I wonder how much the real version of AI is worth. I've got a hunch we're going to find out pretty soon.


Probably still a lot. If not, then all of these people who praise them are just bad developers.

My personal experience with trying to front-load tons of planning and speccing out with LLMs is that at best it's a small improvement on code quality but with considerably more time spent.

As a result I've abandoned the idea of having LLMs generate code except for very small, localized and tightly scoped things. They really can't produce much more than a function or a small module without shitting the bed (last time I vibecoded was with Opus 4.6, Composer 2 and GPT-5.4). I use it almost entirely as another signal in analysis, which naturally makes it fit in better because all the other signals (reading the code, stepping through the code, writing the code myself) are already there so when the LLM points things out the information it actually renders can be taken in much more easily (and seen through more easily when it's false or irrelevant).

I think it's neat that people find fun ways to develop, but I think dressing up vibecoding in a fancy dress and layering SpecLang, sometimes in multiple steps, on top of it, is an exercise in trying to use the tool more instead of trying to use it in its most useful capacity.


I expect you'll be told to try Opus 4.7, and in short, JuSt WaiT FoR ThE NexT MoDel, BRo.

This has been my experience every time I've suggested that there are any sort of inherent ontological/conceptual or computational limits to the sophistication of LLM mimicry.


Even fully planned it’s still no better than a junior dev. You’re leaving out how much back and forth you have the ai do on itself, which you’d have on a junior dev too. In the end does it matter if it’s giving you what you want? Guess not really. But let’s not act like it’s crazy good when you’re still doing a lot of rounds of revisions on something an experienced dev would know to do right the first time.

I think that in general, people need to understand that they need to invest most of their time in the planning phase. High level plan, then spec are the baseline imo

Does the 5x speedup include shipping? Or just the work part?

IMO if you are not shipping out faster then the faster work gains are meaningless.

If you are shipping faster, you’re probably picking up more work and shipping everything too fast leading to burnout.


If you're not shipping faster, it's meaningless, and if you are, it's also bad?

If you're not shipping faster it's meaningless for the company.

And if you are, it's bad for the employee.

Is what the above comment actually said.


yup.

The README has some renders. Some fun ones: 100/100 Acid test without a JS engine, convincing Google SERP even when blocked by bot-detection, makes my homepage look better than in real life.

Sure, but the difference here is the pirate is claiming it's "their data" and asking for donations.

Well, it is their data.

The word "their" is overloaded, it could mean "thing I have the legal right to", or, "thing I have in my possession right now".

The latter condition is clearly true. It's their data.

If you pretend the other definitions of possession don't exist and claim "aktually it's not theirs they don't have rights to it" then that's on you for faking an incomplete understanding of language.


Well, but if it’s the latter definition, then the AI didn’t train on their data, since the companies took possession of that data before doing a training run.

It’s only the former definition that would allow an AI model to have been trained on someone else’s data


> It’s only the former definition that would allow an AI model to have been trained on someone else’s data

There are yet more definitions of "theirs". For example, data whose provenance can be traced back to Anna's Archive.

So the data is legally owned by the book authors, possessed by Anna's Archive, and downloaded for training usage by the AI companies. Every person in that chain could, linguistically speaking, correctly refer to the data as "theirs", or refer to the data of a different entity as "theirs".


I suppose it depends if "their" implies possession or ownership. It would be correct to say they possess this data. It's dicier to say they own it, much like I "possess" the apartment I rent but I do not "own" it.

Regardless, digital file possession and ownership doesn't map cleanly to our language. I technically don't own any Kindle books I buy, I can't share them, yet I clearly have access to an ebook. So I both do and don't currently possess said book.


I’m renting my apartment but I still refer to it as my apartment. Everyone does this, it’s very common usage.

It's their servers sure, but if you download something under a license that doesn't grant you ownership, then it isn't yours.

You are being granted a license to use the data.


Yes, exactly, if you ignore all definitions of "yours" that involve possession then it isn't "yours".

But no one else is obligated to ignore the definitions of words that you're choosing to ignore, so the rest of us will go on saying it's their data.


Guess what, the AI companies training their models aren't going to include themselves in the "rest of us"

The AI companies training their models are going to refer to it as their own data, once it's on their servers.

If you steal my car, no one who knows it's stolen would say it's "yours".

We're not talking abstract language concepts, this is a specific case. The data was taken without license/rights/approval. It's stolen. AA calling it "our data" is disingenuous. Legally it isn't theirs. While you could use "ours"/"theirs" loosely in English, they knew it wasn't true in a legal sense when publishing this.


Taking someone else's car illicitly is theft, because theft means taking with intent to deprive the rightful owner of it. Copying can never be theft, only moving can be theft, because only moving it could deprive the rightful owner of it. An illicit copy is merely copyright infringement or a breach of contract or various other concepts that are not theft despite people sometimes using that word as shorthand. It's YOUR illicit copy, not the rightful owner's illicit copy.

I didn't "steal" your passwords, I just "copied" them. I don't know what you're getting so upset about, you still have your list of passwords, and the fact that my changing all your accounts' passwords rendered that list worthless did nothing to move it.

If someone steals my passwords and then does nothing with them, or just uses them for their private purposes, then there's no problem. The problems only occur if my passwords are used to take control of my accounts or identity, which would deprive me of my accounts or money etc. So your example actually reinforces that the relevant ethical distinction (the harm) is indeed in intending to deprive someone of something they possess/control

I don't think this is the case legally, it might depend on the facts, but usually passwords are stored on your systems, and an attacker would have to not only access your system, but to exfiltrate that data.

It would constitute computer fraud and abuse by most definitions. This is relevant because it is sufficient to prove someone has your passwords in order to convict them, you don't need to prove they used them maliciously. (Provided of course they are a third party with no legitimate reason to have your passwords)


Stealing has a much looser definition than theft; notably, it can include ideas unlike theft. You deprived me of my accounts, but not of my now-obsolete passwords, therefore it's a theft of my accounts, but not theft of my now-obsolete passwords; I suppose you stole both. I'd be upset despite lack of password theft because I'd be the victim of your CFAA violation for example.

You copied his password? *******?

(I really hope that was an intentional reference or this won't make any sense.)


See, you typed hunter2 but all I see is *******.

> theft means taking with intent to deprive the rightful owner of it.

That doesn't sound right my man. If I take your car and return it so you never knew it was taken, wouldn't it still be theft?

What if my intent isn't to sabotage you but to enjoy the car for myself and your deprivation is merely collateral damage, not my intent, is that not theft?


> The data was taken without license/rights/approval. It's stolen.

That's incorrect. A license violation isn't theft. Theft deprives others of their property, that's not what's going on here. Intellectual property is a fictional "ownership" that provides value to society, but it is much newer and different than the actual ownership of property.

No one actually owns a collection of words or ideas or thoughts.


The tricky bit is that while it's impossible to deprive someone of their idea (i.e., commit theft of an idea), it's possible to steal someone's idea (i.e., copy it and use it illicitly), because only the word theft, but not the word steal, has that "deprive others" stipulation.

So with that in mind, circling back to whether possession occurs in such a way to make possessive language appropriate (being able to say "my data" after stealing data but not depriving the author of the data), my opinion is that the copy of the data that the author still controls is the author's data, and the copy of the data that the stealer controls is the stealer's data. It's the author's idea, but both parties separately possess the data (the data is a record of the idea).


Yet the main holders of this position were caught saying "our data". Don't you see the irony?

> If you steal my car, no one who knows it's stolen would say it's "yours".

The chop shop well might.

Or, if I steal your car, and then go on to use it daily for the next 10 years, at some point everyone I know will refer to it as "my" car even if they're all entirely aware it was stolen.

> they knew it wasn't true in a legal sense when publishing this

I'm not sure why you're expecting the operators of a pirate site to use legally rigorous terms to refer to themselves in a blog post. This is an error in your expectations, not their terminology.


I am totally with you in this line of argument.

However, I think there might be a legal framework in which the stolen good might locally be yours within the context of a single transaction or dispute.

E.g.: if I buy a car from you for $X, and you deliver it, that's your car for the purposes of that deal. If it later turns out it was stolen, that changes the facts and is an additive change, not a transformative change, to the original transaction. If we imagine a chain of transactions, you might analyze the dispute as it spreads through the contract chain through a centralized, bird's-eye doctrine, or you might analyze the matter contract by contract, in a sort of distributed algorithm.

In that latter sense, it makes sense to refer to the asset as property of one party independently of whether it will truly be deemed as theirs. Under this frame, for the purpose of that contract, there is an implicit claim of property, and an implicit risk of the asset being stolen. If the car is later found to be stolen, it isn't parsed as the car being property of someone else, much less always having been of someone else, but rather there would be a new fact: of there being a competing claim of ownership over the asset, which might or might not have legal and ethical grounds, and might or might not be successfully defended in a court, resulting in an obligation to return the asset, devaluing the ownership claim to 0.

Mathematically, the subject of the contract is no longer the asset itself but a legal and ethical claim to the asset, which has a subjective value p between 0 and 1; multiplying p by the value of the asset yields the expected value (EV). There is a market for ownership claims across the whole range 0 < p < 1. Theft and criminal charges are not the only sources of uncertainty over ownership: succession disputes, bankruptcies, ongoing litigation over the asset, patents, IP claims, wars, etc.
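The expected-value framing above fits in a few lines. A minimal sketch with hypothetical figures (not from any real case): a $20,000 car whose ownership claim you judge 50% likely to hold has an expected value of $10,000, and a claim that must be surrendered (p = 0) is worth nothing.

```python
def claim_expected_value(p: float, asset_value: float) -> float:
    """EV of an ownership claim: probability p (0 <= p <= 1) that the claim holds, times the asset's value."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * asset_value

# Hypothetical figures for illustration only
print(claim_expected_value(0.5, 20_000))  # prints 10000.0
print(claim_expected_value(1.0, 20_000))  # undisputed title, full value: prints 20000.0
print(claim_expected_value(0.0, 20_000))  # claim that must be surrendered: prints 0.0
```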


It means whatever is convenient. If you are looking to monetize knowledge, you would use it like "your car"; halfway along, your books are just books you've purchased a copy of; at the other end, your car is now mine.

I found an abandoned bicycle 10 years ago. I have since replaced nearly all parts of it. I would give it back if you can prove it is yours, but who owns the bicycle of Theseus is more a matter of opinion.

I refer to it as my bicycle.


"but if you download something under a license that doesn't grant you ownership, then it isn't yours."

Possession is 9/10 of the law - if you have a copy, you have possession, and thus you have SOMETHING and LEGALLY it is considered yours (now whether you legally obtained it is a different story and THAT is where charges stem from.)


Random nit: the original saying was "possession is 9 points of the law", where the points were attributes that strengthened a legal claim, not a percentage. Things like possession, a good lawyer, money, patience and witnesses, each of which made the outcome more likely to go in your favor.

And especially in the case of LLMs, they are trained from text that (probably) comes directly from AA.

Their data, but not their work.

If e.g. OpenAI downloaded it from their servers to train their models, what else is there to say about it?

If capitalism was capable of actually preserving the knowledge of humanity, we wouldn't need things like Anna's Archive.


We won’t be doing it in 2 years. By then my side will have won!

Doesn't work on Chrome or Safari for me.
