Hacker News

I think this statement is on the same level as "a human cannot explain why they gave the answer they gave because they cannot actually introspect the chemical reactions in their brain." That is true, but a human often has an internal train of thought that preceded their ultimate answer, and it is interesting to know what that train of thought was.

In the same way, it is often quite instructive to know what the reasoning trace was that preceded an LLM's answer, without having to worry about what, mechanically, the LLM "understood" about the tokens, if this is even a meaningful question.



But it's not a reasoning trace. Models could produce one if they were designed to (an actual record of the calls and the tensor states at each step, probably with a helpful lookup table for the tokens), but they specifically haven't been made to do that.


When you put an LLM in reasoning mode, it will approximately have a conversation with itself. This mimics an inner monologue.

That conversation is held in text, not in any internal representation. That text is called the reasoning trace. You can then analyse that trace.
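To make "the trace is just text" concrete, here is a minimal sketch, assuming a model that wraps its self-conversation in `<think>` tags (the convention DeepSeek-R1-style models use; hosted APIs instead return the trace as a separate structured field):

```python
import re

def extract_reasoning_trace(response_text: str) -> tuple[str, str]:
    """Split a raw model response into (reasoning trace, final answer).

    Assumes the model emits its inner monologue between <think>...</think>;
    the trace is ordinary text, so ordinary string tools can analyse it.
    """
    match = re.search(r"<think>(.*?)</think>", response_text, re.DOTALL)
    if match is None:
        return "", response_text.strip()
    trace = match.group(1).strip()
    answer = response_text[match.end():].strip()
    return trace, answer

raw = "<think>The user asked for 2+2. That is 4.</think>The answer is 4."
trace, answer = extract_reasoning_trace(raw)
print(trace)   # The user asked for 2+2. That is 4.
print(answer)  # The answer is 4.
```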


Unless things have changed drastically in the last 4 months (the last time I looked at it) those traces are not stored but reconstructed when asked. Which is still the same problem.


They aren't necessarily "stored" but they are part of the response content. They are referred to as reasoning or thinking blocks. The big 3 model makers all have this in their APIs, typically in an encrypted form.

Reconstruction of reasoning from scratch can happen in some legacy APIs, like the OpenAI chat completions API, which doesn't support passing reasoning blocks around. OpenAI specifically recommends using their newer Responses API to improve both accuracy and latency (by reusing existing reasoning).


For a typical coding agent, there are intermediate tool-call outputs and LLM commentary produced while it works on a task, which are passed back to the LLM as context for follow-up requests. (Hence the term agent: it is an LLM call in a loop.) You can easily see this with e.g. Claude Code, which keeps track of how much space is left in the context and requires "context compaction" as the context gradually fills up over the course of a session.

In this regard, the reasoning trace of an agent is trivially accessible to clients, unlike the reasoning trace of an individual LLM API call; it's a higher level of abstraction. Indeed, I implemented an agent just the other day which took advantage of this. The OP that you originally replied to was discussing an agentic coding process, not an individual LLM API call.
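The loop structure described above can be sketched in a few lines. This is a toy with a stubbed model and tool so it runs standalone; a real agent would make LLM and tool calls where the stubs are, and would ask the model itself to write the compaction summary:

```python
MAX_CONTEXT_CHARS = 200  # stand-in for the model's real token limit

def llm(context: list[str]) -> str:
    """Stub model: just reports how much context it was given."""
    return f"step after {len(context)} messages"

def compact(context: list[str]) -> list[str]:
    """Replace accumulated messages with a short summary once the
    context fills up (the "context compaction" step)."""
    return [f"[summary of {len(context)} earlier messages]"]

context: list[str] = ["user: fix the failing test"]
for _ in range(5):                        # the agent loop
    reply = llm(context)                  # one LLM call per iteration
    context.append(f"assistant: {reply}")
    context.append("tool: (output of running the tests)")
    if sum(len(m) for m in context) > MAX_CONTEXT_CHARS:
        context = compact(context)

# Everything the agent saw and said is ordinary text in `context`:
# that accumulated text is the agent-level reasoning trace.
print(context[-1])
```

The point of the sketch is that the trace lives at the level of the loop, as plain messages any client can read back, independent of whatever happens inside a single LLM call.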


Well, right, I see those reasoning stages in reasoning models with Ollama, and if you ask the model afterwards what its reasoning was, what it says is different from what it said at the time.


I can't speak to your specific setup, but it sounds like you're halfway there if you can access the previous traces? All anyone can ask for is "show me the traces that led up to this point"; the "why did you do this" is a notational convenience for querying that data. If your setup isn't summarizing those traces correctly, that sounds like a specific bug in the context or model quality, but the point is that the traces exist and are queryable in the first place, however you choose to do that.

(I am still primarily talking about agent traces, like the original OP, not the internal reasoning blocks of a particular LLM call, which may or may not be available in context afterwards.)

In particular, asking "why" isn't a category error here, although there's only a meaningful answer if the model has access to the previous traces in its context, which is sometimes true and sometimes not.



