Hacker News — vorticalbox's comments

This reminds me of https://dnhkng.github.io/posts/rys/

David looks inside the LLM, finds the thinking layers, cuts out the duplicates, and puts the remaining layers back to back.

This increases the LLM's scores with basically no overhead.

Very interesting read.
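Not David's actual method, but a toy sketch of the idea: near-duplicate layers can be spotted by comparing their flattened weight tensors (the "layers" here are random stand-ins):

```python
import numpy as np

def find_duplicate_layers(layers, threshold=0.99):
    """Return index pairs whose flattened weights are near-identical
    by cosine similarity -- a crude proxy for 'duplicated' layers."""
    flat = [w.ravel() / np.linalg.norm(w) for w in layers]
    pairs = []
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            if float(flat[i] @ flat[j]) >= threshold:
                pairs.append((i, j))
    return pairs

rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4))
layers = [rng.standard_normal((4, 4)) for _ in range(3)]
layers.insert(1, base)
layers.insert(3, base.copy())  # an exact duplicate of layer 1

print(find_duplicate_layers(layers))  # the duplicated pair shows up
```

A real implementation would compare activations rather than raw weights, but the shape of the search is the same.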


Jeff Dean says models hallucinate because their training data is "squishy."

But what's in the context window is sharp, the exact text or video frame right in front of them.

The goal is to bring more of the world into that context.

Compression gives it intuition. Context gives it precision.

Imagine if we could extract the model's reasoning core and plug it anywhere we want.


LLMs "hallucinate" because they are stochastic processes predicting the next word without any guarantee of being correct or truthful. It's literally an unavoidable fact unless we change the modelling approach, which very few people are bothering to attempt right now.

Training data quality does matter but even with "perfect" data and a prompt in the training data it can still happen. LLMs don't actually know anything and they also don't know what they don't know.

https://arxiv.org/abs/2401.11817
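The "stochastic process" point in miniature (the distribution below is made up for illustration): sampling picks by probability mass, with no mechanism that checks factual correctness.

```python
import random

# A toy next-token distribution for "The capital of Australia is ..."
# Note it assigns real probability mass to a wrong continuation.
next_token_probs = {"Canberra": 0.55, "Sydney": 0.40, "Melbourne": 0.05}

random.seed(1)
samples = random.choices(
    list(next_token_probs), weights=next_token_probs.values(), k=1000
)
# The wrong answer "Sydney" is sampled hundreds of times; nothing in the
# sampling step knows or cares which continuation is true.
print(samples.count("Sydney"))
```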


> they also don't know what they don't know

they sort of do tho:

https://transformer-circuits.pub/2025/introspection/index.ht...


I won't quibble even though I likely should. Have to remember this is HN and companies need to shill their work otherwise ... Yes.

I will play along and assume this is sound. 10-40% +/- 10% is along the lines of "sort of" in a completely unreliable, unguaranteed, and unproven way, sure.


That’s not the only issue. They also have the problem that they’re built to always give an affirmative answer and to use authoritative wording, even when confidence is low. If they were trained to answer “I don’t know” instead of guessing, they’d hallucinate a lot less, but nobody seems to want that.

It calls to mind the issue of search engines that refuse to return “0 results found” anymore. Now they all try to give you related but ultimately incorrect results.

To me, that feels like gaslighting. It’s like if you ask someone to buy cheddar cheese at the store and they come back with mozzarella, and instead of admitting that the store was out of cheddar, they try to convince you that you actually really want mozzarella.


> If they were trained to answer “I don’t know”

If they were trained that an answer of "I don't know" was acceptable, the model would be prone to always say "I don't know", because it's a universally acceptable answer.

It's a safe answer even when it does "know".


That just sounds like a very fancy/marketing way of saying "models will hallucinate because you cannot compress all the facts in the world into the model size." (Without even getting into any other things that could cause plausible-but-incorrect output.)

>Imagine if we could extract the model's reasoning core and plug it anywhere we want.

Aren't a lot of the latest model variants doing something very similar? Stuff more domain-relevant knowledge into the model itself on top of a core generally-good reasoning piece, to reduce need to perfectly handle giant context?


I would assume that if you are invited to join this round you will be sent the questions. I would also assume they would fall under an NDA.

Some IDEs already have this. In Zed you can stick it in “ask” mode.

Being able to use it as a rubber duck while it can also read the code works quite well.

There are a few APIs at work I have never worked on and the person that wrote them no longer works with us so AI fills that gap well.


Extra high burns tokens, I find. I run 5.4 on medium for 90% of tasks, and high if I see medium struggling; it's very focused and makes minimal changes.

Yeah but it also then strikes the perfect balance between being meticulous and pragmatic. Also it pushes back much more often than other models in that mode.

Rework burns tokens.

Note mini-high is similar perf/latency to medium, but much cheaper

Not a problem if they're offering unlimited, lol

It allows you to track a browser forever because it is a stable fingerprinting point. This helps a great deal with long-term tracking.

If I understand correctly, it was only stable until you restarted Firefox / your computer.

OK, that changes it a bit, but on the other hand I’ve had my browser open for weeks now and I only restart it when the “update” button turns red lol

Correct. The ordering persists for as long as the original process continues to run.

For AI agents we have ACP [0]; surely their time would be better spent building this sort of abstraction for computer use than simply teaching an AI to use a mouse?

The computer UI is the way it is because that is optimal for humans. If your plan is to replace humans, why not replace the whole stack, OS and all, with something these models already know how to use?

[0] https://zed.dev/blog/acp-registry



Most APIs provide some sort of documentation. If it’s swagger you can just update the application from that.
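A minimal sketch of pulling endpoints out of a Swagger/OpenAPI document (the spec here is a toy inlined dict; in practice you'd load the API's published JSON):

```python
# Walk the "paths" section of an OpenAPI/Swagger document and list
# every (method, path) pair it documents.
spec = {
    "paths": {
        "/users": {"get": {"summary": "List users"},
                   "post": {"summary": "Create a user"}},
        "/users/{id}": {"get": {"summary": "Fetch one user"}},
    }
}

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

endpoints = sorted(
    (method.upper(), path)
    for path, ops in spec["paths"].items()
    for method in ops
    if method in HTTP_METHODS
)
for method, path in endpoints:
    print(method, path)
```

From there a generator (or an LLM) has everything it needs to regenerate the client code.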

It’s never free; you’re shifting costs from paying a company for their API use to the power costs of running it locally.

Sure, but it’ll be orders of magnitude cheaper in a few years. The consumer industry is already moving in this direction, with Apple leading the pack.

You write the tests; then it has a source of truth to know when it’s not working.
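As a tiny illustration (the function and tests are invented for the example), the test written first becomes the source of truth the generated code has to satisfy:

```python
# The test is written first: until slugify() satisfies it, you (or the
# assistant) know the implementation is wrong.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Implementation written (or AI-generated) to make the test pass.
def slugify(text):
    words = "".join(c if c.isalnum() else " " for c in text).split()
    return "-".join(w.lower() for w in words)

test_slugify()
print("tests pass")
```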
