Anthropic Research is going from strength to strength in interpretability. Publicly releasing the code so other labs can benefit from it is also a great move: very values-aligned, and it improves the overall AI safety ecosystem.
Never heard of this before. Is it a real thing? I mean, in the context of psychedelic experiences. I've tried DMT a few times expecting the legendary all-healing trip everyone talks about, but it never worked.
> Of course it knows what it output a token ago...
It doesn't know anything. It has a bunch of weights that were updated by the previous stuff in the token stream. At least our brains, whatever they do, certainly don't function like that.
I don't know anything (or even much) about how our brains function, but the idea of a neuron sending an electrical output when the sum of the strengths of its inputs exceeds some value seems to me like "a bunch of weights" getting repeatedly updated by stimulus.
To you it might be obvious our brains are different from a network of weights being reconfigured as new information comes in; to me it's not so clear how they differ. And I do not feel I know the meaning of the word "know" clearly enough to establish whether something that can emit fluent text about a topic is somehow excluded from "knowing" about it through its means of construction.
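The "fires when the weighted sum of its inputs exceeds some value" picture is roughly the classic threshold (McCulloch-Pitts / perceptron-style) unit. Here is a minimal Python sketch of that idea, assuming a simple Hebbian update ("strengthen inputs that were active when the unit fired") as the "updated by stimulus" part. The function names `threshold_neuron` and `hebbian_update` are hypothetical, and this is an illustration of the toy model only, not a claim about how biological neurons or LLMs actually learn.

```python
from typing import List, Sequence


def threshold_neuron(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> int:
    """Return 1 ("fire") if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0


def hebbian_update(weights: Sequence[float], inputs: Sequence[float], output: int,
                   learning_rate: float = 0.1) -> List[float]:
    """Hypothetical Hebbian rule: weights of inputs active while the unit fired get stronger."""
    return [w + learning_rate * x * output for w, x in zip(weights, inputs)]


if __name__ == "__main__":
    weights = [0.5, 0.5]
    threshold = 0.75
    # Before any "stimulus", one active input alone is not enough to fire the unit.
    print(threshold_neuron([1, 0], weights, threshold))
    # Repeatedly present a stimulus that does fire it, updating weights each time.
    for _ in range(3):
        fired = threshold_neuron([1, 1], weights, threshold)
        weights = hebbian_update(weights, [1, 1], fired)
    # After the updates, the same single input now exceeds the threshold.
    print(threshold_neuron([1, 0], weights, threshold))
```

The point of the toy is just that "a bunch of weights" plus a firing threshold, nudged by repeated input, changes what the unit responds to; whether that amounts to "knowing" is the philosophical question the thread is arguing about.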
How close are you to saying that a repair manual "knows" how to fix your car? I think the conversation here is really around word choice and anthropomorphization.
The problem is, people think word choice influences capabilities: when people redefine "reasoning" or "consciousness" or so on as something only the sacred human soul can do, they're not actually changing what an LLM is capable of doing, and the machine will continue generating "I can't believe it's not Reasoning™" and providing novel insights into mathematics and so forth.
Similarly, the repair manual cannot reason about novel circumstances, or apply logic to fill in gaps. LLMs quite obviously can - even if you have to reword that sentence slightly.
You're making an argument Descartes formalized in the 1600s (and folks have been making long before him). It's a cute philosophical puzzle, but we assume that there's no Descartes' Demon fiddling with our thoughts and that we have a continuous and personal inner life that manifests itself, at least in part, through our conscious experience.
If anything, this confirms it for me. On his about page, there's this:
"Hi there, I am Loïc Baumann, I’m from Paris area, France
I develop, since early 90s, first assembly, then C++ and nowadays mostly .net.
My area of interest are 3D programming, low-latency/highly-scalable/performant solutions and many other things."
Compare that style to what's in this most recent blog post: mildly ungrammatical constructions typical of an ESL writer and a straightforward, plain style, versus breathless, feed-optimized "not x, but y" and triplet/rule-of-three constructions, perfect native-speaker grammar, but an oddly hollow tone. Or look at this post from 2018: https://nockawa.github.io/microservice-or-not-microservice/ It's just radically different (at a concrete syntactic level, no emdashes). I'm sure he has technical chops and it's cool that he worked on DOTS, but I would bet a very large amount of money he wrote the bullet points describing this project and then prompted GPT 5.3 to expand them to a blog post to "save time".
I agree that this triggered my AI writing senses. Points in favor:
- "It’s not an accident — it’s driven by the same physics." The classic "it's not x, it's y", with an em-dash thrown in for good measure
- "Typhon brings these into the component storage model — not as bolted-on workarounds, but as first-class citizens." More "not x, but y", this time with a leading clause joined by an emdash
- "Blittable, unmanaged, fixed-size, stored contiguously per type — that’s the ECS side." Short, punchy list of examples, emdash'd to a stinger, again typical of LLM writing
- "Schema in code, not SQL. Components are C# structs with attributes, not DDL statements. Natural for game developers, unfamiliar territory for database administrators. If your team thinks in SQL, this is a paradigm shift." This whole mini-paragraph is the x/y style, combined with the triplet / rule-of-three, just at the sentence scale. And then of course, the stinger at the end.
Definitive, no, but it certainly has a particular flavor that reads as LLM output to me.
This was definitely covered in my middle school classes (although those were 40 years ago). Standard US public school. We spent a fair amount of time discussing the Antipope, it always sounded like such a cool job name.
We also read Genesis in English classes (from a literary perspective).
If you think training a sparse autoencoder to extract concept vectors that are usable as steering injections into a modern LLM is pretty easy, you should probably go work for Anthropic's mech interp team ;)
It's a very complex joint (which is why it's never been done before, as far as I could find; hopefully it will be patentable), and the tool definition probably wasn't optimal, nor was the CAM tool being used appropriate to the task, hence my working on developing the toolpaths more directly.
"Precision, for LEGO, isn't an engineering choice, it's a brand promise." - The classic "It's not just x, it's y", just minus the "just".
"One philosophy optimizes for cost, the other for perfection." - Again we see the x/y structure; AI writing often features these forms, e.g. comparisons (x vs y), conversions (x into y), negated emphasis (not x, but y), etc.
"When you have multiple parts in an assembly, use statistical analysis for tolerance stack-up rather than worst-case math. Traceability matters. Track your defects so feedback turns precision into reliability." - More x/y followed by a short stinger ("Z matters"), and the closing sentence again follows the "x/y" pattern.
For funsies I tossed the whole thing into a purported AI detector and it said 90+% confidence of AI. I don't trust those types of things very much and suspect they have high false positive rates, but I have read that AI writing generally has measurably lower entropy, so maybe it's plausible, and in this case it aligns with my existing beliefs, so it obviously must be true.
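To make "entropy" concrete: a rough illustration is the Shannon entropy of a text's word-frequency distribution, H = -Σ p(w) log2 p(w), in bits per word. This is a minimal Python sketch with a hypothetical `unigram_entropy` helper; it is a much cruder measure than what AI detectors are generally understood to use (per-token probabilities or perplexity under a language model), and it is not a detector.

```python
import math
from collections import Counter


def unigram_entropy(text: str) -> float:
    """Shannon entropy, in bits per word, of the word-frequency distribution of `text`."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over each distinct word's relative frequency p.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(unigram_entropy("a b c d"))  # four equally likely words: 2 bits per word
    print(unigram_entropy("a a a a"))  # one repeated word: no uncertainty at all
```

Repetitive, formulaic text pushes this number down and varied vocabulary pushes it up, which is the intuition behind "lower entropy", but a single score like this says little about authorship on its own.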