Hacker News | teo_zero's comments

Well said! Those Europeans with their tariffs, and their stupid motto "make Europe great again"!

Oh wait...


Neither is ricotta, actually.

When you have so few bits, does it really make sense to invent a meaning for the bit positions? Just use an index into a "palette" of pre-determined numbers.

As a bonus, any operation can be replaced with a lookup into an n×n table.
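The palette idea can be sketched in a few lines of Python. The 16 values chosen here are purely illustrative, not any real standard:

```python
# Sketch of the "palette" idea: a 4-bit code is just an index into
# 16 pre-chosen values, and any binary op becomes a 16x16 table.
PALETTE = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0,
           -0.0, -0.25, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0]

def quantize(x):
    """Index of the palette entry closest to x."""
    return min(range(16), key=lambda i: abs(PALETTE[i] - x))

# Precompute multiplication as a 16x16 table of result indices.
MUL_LUT = [[quantize(PALETTE[a] * PALETTE[b]) for b in range(16)]
           for a in range(16)]

def mul(a, b):
    """Multiply two 4-bit codes by table lookup; returns a 4-bit code."""
    return MUL_LUT[a][b]
```

With 16 entries per operand, the whole multiplication table is 256 entries, and no arithmetic circuitry is needed at all.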


That's a good idea and it exists: https://www.johndcook.com/blog/2026/04/18/qlora/

It seems quite wasteful to have two zeros when you only have 4 bits in total


OTOH, it seems quite plausible that the most important numbers to represent are:

   +0
   -0
   +1
   -1
   +inf
   -inf

In standard FP32, the infs are represented as a sign bit, all exponent bits set to 1, and all mantissa bits set to 0. The NaNs are represented as a sign bit, all exponent bits set to 1, and a non-zero mantissa. If you used that interpretation with FP4, you'd get the table below, which restricts the representable range to ±3, and that feels less useful to me. If you're using FP4 you're probably optimizing for space and don't want to waste a quarter of your possible combinations on things that aren't actually numbers; you'd likely focus your efforts on writing code that doesn't need to represent inf and NaN.

  Bits s exp m  Value
  -------------------
  0000 0  00 0     +0
  0001 0  00 1   +0.5
  0010 0  01 0     +1
  0011 0  01 1   +1.5
  0100 0  10 0     +2
  0101 0  10 1     +3
  0110 0  11 0     +inf
  0111 0  11 1     NaN
  1000 1  00 0     -0
  1001 1  00 1   -0.5
  1010 1  01 0     -1
  1011 1  01 1   -1.5
  1100 1  10 0     -2
  1101 1  10 1     -3
  1110 1  11 0     -inf
  1111 1  11 1     NaN
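For reference, here is a decoder matching the table above: a hypothetical 1-2-1 sign/exponent/mantissa split with bias 1 and IEEE-style handling of the all-ones exponent.

```python
import math

def decode_fp4(bits):
    """Decode a 4-bit s/e/m pattern (1 sign, 2 exponent, 1 mantissa bit)
    with IEEE-style inf/NaN, matching the table above (exponent bias 1)."""
    s = (bits >> 3) & 1
    e = (bits >> 1) & 3
    m = bits & 1
    sign = -1.0 if s else 1.0
    if e == 0b11:                        # reserved encodings: inf and NaN
        return sign * math.inf if m == 0 else math.nan
    if e == 0:                           # subnormal: no implicit leading 1
        return sign * (m / 2) * 2 ** (1 - 1)
    return sign * (1 + m / 2) * 2 ** (e - 1)
```

As the table shows, a quarter of the 16 codes (the four with e=11) are spent on non-numbers.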

I can see the most important values being:

   ± 0 (infinitesimal)
   ± 10^-2n
   ± 10^-n
   ± 1 (unity)
   ± 10^n
   ± 10^2n
   ± infinity
For FP4, this leaves two values. Maybe one of them should be NaN. What should the other one be?

Why waste a slot on -0?

You need it if you want a total ordering over the extended reals. There's ± infinity (an affine closure, not a projective one with a single point at infinity), so to make that math work you need to give 0 a sign.

Because it means "infinitesimal negative" which is distinct from "infinitesimal positive".
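A quick Python illustration of how the sign of zero carries information even though the two zeros compare equal; atan2 is a classic consumer of it:

```python
import math

# +0.0 and -0.0 compare equal, but the sign bit survives:
assert -0.0 == 0.0
assert math.copysign(1.0, -0.0) == -1.0

# atan2 uses the sign of zero to pick which side of the branch
# cut a point approached the negative real axis from:
assert math.atan2(0.0, -1.0) == math.pi
assert math.atan2(-0.0, -1.0) == -math.pi
```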

That sounds pretty niche. What's a use case where you have less than 8 bits and that distinction is more important than having an extra finite value? I don't think AI is one.

For neural-net gradient descent, automatic differentiation, etc., the widely used ReLU function has information-carrying derivatives at +0 and -0 if those are treated as infinitesimals.

Barely any information. After surviving ReLU, that signed zero is probably getting added to another value, and then oops, the information is gone. It sounds a lot worse than properly spaced values.

sign = most important bit of information

If you were looking at the entire number line, sign would roughly be the most important part.

But you still have all the other numbers carrying sign info. This is only the sign of denormals and that's way less valuable. Outside of particular equations it ends up added to something else and disappearing entirely. It would be way better to cut it and have either half the smallest existing positive value or double the largest existing value as a replacement. Or many other options.


You want to make multiplication cheap, it's not just about compression

Wouldn't multiplication just be an 8-bit lookup table? a*b is just lut[(a << 4) + b]

A 256 element lookup table is much bigger than a simple multiplier

Multiplication at this resolution is already implemented via lookup tables.

For FP4, yes... sometimes... it depends. But newer Nvidia architectures, e.g. Blackwell with NVFP4, do not; they perform micro-block scaling in the core. On older architectures, low quants like FP4 are also often not done natively, and are instead inflated back to BF16, e.g. with BnB (bitsandbytes).

Specifically, you want to choose 16 values, all of which you can multiply an activation value by using circuitry which is as small as possible.

Exactly. And pick them on the e^x curve.

As explained in an article linked at the bottom of TFA, the weights of an LLM have a normal (Gaussian) distribution.

Because of that, when the weights are quantized to a few levels, the best compromise is to place the points encoded by the numeric format according to a Gaussian function, instead of placing them uniformly on a logarithmic scale as the usual floating-point formats do.
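A rough sketch of that placement, using Python's stdlib NormalDist to put 16 levels at equal-probability quantiles of a standard normal. The actual NF4 construction used in QLoRA differs in detail (it pins an exact zero and treats the tails specially), so treat this only as an illustration of the idea:

```python
from statistics import NormalDist

def gaussian_levels(k=16):
    """Place k quantization levels at evenly spaced quantiles of a
    standard normal, so each level covers equal probability mass.
    Midpoints of the k bins are used to avoid the infinite
    quantiles at p=0 and p=1."""
    nd = NormalDist()
    return [nd.inv_cdf((i + 0.5) / k) for i in range(k)]

levels = gaussian_levels()
```

The resulting levels cluster near zero, where most Gaussian-distributed weights fall, and spread out toward the tails.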


Could anyone understand what this sentence means?

> Upon freeing an unreachable AllocationRecord, call filc_free on it.

I think the intention was to say: before freeing an unreachable AR, free the memory pointed to by its visible_bytes and invisible_bytes fields.


A form of self-fulfilling prophecy?

In fact it's not. The name itself mimics cat, not less. It's a filter that adds annotations to its input, such as syntax highlighting, git diffs, and special-character coloring.

Personally I can't find any use for bat: I'm a devoted user of vim for editing, and it already does all of this, so why not use it to view files as well? It's satisfying to have the same interface, colors and shortcuts whether you're editing or viewing!


I use it for previewing files in `fzf` and `lf` (terminal file manager).

The current AI approach to technology is masterfully described as

> to build something enormous, declare it transformative, and hope nobody asks what it actually computes.

And the corollary:

> [such] approach requires billions of dollars and produces systems that cannot explain themselves.


I feel that saying that EML can't generate all the elementary functions because it can't express the solution of the quintic is like saying that NAND gates can't be the basis of modern computing because they can't be used to solve Turing's halting problem.

As is usual with these kinds of "structure theorems" (as they're often called), we need to precisely define what set of things we seek to express.

A function which solves a quintic is reasonably ordinary. We can readily compute it to arbitrary precision using any number of methods, just as we can do with square roots or cosines. Not just the quintic, but any polynomial with rational coefficients can be solved. But the solutions can't be expressed with a finite number of draws from a small repertoire of functions like {+, -, *, /}.

So the question is, does admitting a new function into our "repertoire" allow us to express new things? That's what a structure theorem might tell us.

The blog post is exploring this question: Does a repertoire of just the EML function, which has been shown by the original author to be able to express a great variety of functions (like + or cosine or ...) also allow us to express polynomial roots?


That’s a poor analogy because all polynomials can be solved to arbitrary precision with efficient algorithms.
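For instance, a real root of x^5 - x - 1 = 0 (a standard example of a quintic with no solution in radicals) is easy to compute with Newton's method:

```python
def newton_quintic(x0=1.0, tol=1e-12, max_iter=100):
    """Find a real root of x^5 - x - 1 = 0 by Newton's method.
    This quintic is not solvable in radicals, yet converges
    to any desired precision in a handful of iterations."""
    x = x0
    for _ in range(max_iter):
        f = x**5 - x - 1
        fp = 5 * x**4 - 1
        step = f / fp
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton_quintic()
```

The iteration converges quadratically near the root, so "efficient" here is no exaggeration.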

> If you don't manage the history properly in your SPA, pressing the back button could take the user out of the app entirely.

Yes. And that should be the default behavior: browser buttons should take you through the browser's history. If you keep an in-app state and want the user to navigate through it, you should provide in-app buttons.

Nobody complains that the browser's close button quits the browser instead of the app it's showing, or that the computer's power button shuts down the whole OS and not only the program in the foreground.

Users must be educated. If they have learned that left means "back" and right means "forward", that a star (sometimes a heart) means "remember this for me", and that an underlined checkmark means "download", then understanding the concept of encapsulation shouldn't be too much for them.


> Yes. And that should be the default behavior: browser buttons should take you through the browser's history. If you keep a in-app state and want the user to navigate through it, you should provide in-app buttons.

The Back and Forward buttons on a web browser are the navigation for the web. If you click a link on a static HTML page, it will create a new entry. If you click back, it'll take you back. If you press forward, you will navigate forward.

We should not be creating a secondary set of controls that does the same thing. This is bad UX, bad design, and bad for an accessible web.

> Nobody complains that the browser's close button quits the browser instead of the app it's showing, or that the computer's power button shuts down the whole OS and not only the program in the foreground.

It does close the app it's showing because we have tabs. If you close a tab, it'll close the app that it's showing. If you close the browser, which is made up of many tabs, it closes all of the tabs. Before tabs, if you closed a window, the web page you were on would close as well. It does what is reasonably expected.

If on your web application you have a 'link' to another 'page' where it shows a change in the view, then you'd expect you would be able to press back to go back to what you were just looking at. SPAs that DON'T do that are the ones that are doing a disservice to the user and reasonable navigation expectations.

> Users must be educated. If they have learned that left means "back" and right means "forward", that a star (sometimes a heart) means "remember this for me", and that an underlined checkmark means "download", then understanding the concept of encapsulation shouldn't be too much for them.

They should not have to be 'educated' here. The mental model of using the back and forward buttons to navigate within a webpage is totally fine.


Sorry, I've read and reread TFA but the concept still eludes me. Is it that, since it's easier for a hash function to have higher entropy in the higher bits than in the lower ones, it would be more logical for hash tables to discard the lower bits and keep the higher ones?

Higher entropy in the higher bits is a property of addition and multiplication when the result has the same size as the operands (i.e. the result is taken modulo 2^N, which folds back the values exceeding the size of the result).

When other operations are used, there may be other bits with higher entropy. For example, when full-length multiplication is used (i.e. where the length of the result is the sum of the lengths of the operands), the middle bits have the highest entropy, not the top bits. On CPUs like the Intel/AMD x86-64 CPUs, where fast long multiplication instructions are available, this can be exploited in more performant hash functions and PRNGs.

In hash functions, additions and multiplications are frequently combined with rotations, in order to redistribute the entropy from the top bits to the bottom bits for the following operations.
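A minimal sketch of both points: multiplicative ("Fibonacci") hashing keeps the top bits, where a modular multiply mixes best, and a rotation can move that entropy back down for subsequent operations. The multiplier below is the common 2^64/φ constant.

```python
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # 2^64 / golden ratio, a popular multiplier

def table_index(key, bits):
    """Multiplicative hashing: multiply mod 2^64, then keep the TOP
    bits (where the mixing is strongest) by shifting right, instead
    of masking off the weak low bits."""
    h = (key * GOLDEN) & MASK64
    return h >> (64 - bits)

def rotl64(x, r):
    """Rotate left by r bits: redistributes high-bit entropy downward."""
    return ((x << r) | (x >> (64 - r))) & MASK64
```

Taking `h >> (64 - bits)` instead of `h & (size - 1)` is exactly the "keep the higher ones" choice discussed above.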


Honorary mention: byte swapping instructions (originally added to CPUs for endianness conversion) can also be used to redistribute entropy, but they're slightly slower than rotations on Intel, which is why I think they aren't utilized much.

Yes, that's my point. It's not true that all hash functions have this characteristic, but most fast ones do. (And if you're using a slow-and-high-quality hash function, the distinction doesn't matter, so might as well use top bits.)
