More

kimixa · 2026-05-04T03:21:05 1777864865

I think the sheer number of people below arguing it might not be about nationalism shows this sort of "Obvious" direct work may still be needed.

SideburnsOfDoom · 2026-05-04T10:03:37 1777889017

> I think the sheer number of people below arguing

That says more about "the people below" on HN to me. There's a strong strand of contrarian, pseudo-intellectual sophistry. I.e. it's "clever" to talk yourself out of seeing the obvious.

kimixa · 2026-05-01T03:49:46 1777607386

If this stops the core being able to drop to a lower power state it can be whole multiples of power use on some devices.

Wake ups are death for mobile form factors, even if not really doing much work.

andai · 2026-05-01T17:13:15 1777655595

This is a pretty good argument against the way we do operating systems now, right?

kimixa · 2026-05-01T21:15:04 1777670104

Why? Most modern OSs are "tickless" - where there's no regular scheduling tick and it can sleep pretty much indefinitely if there's no work.

kimixa · 2026-04-27T20:03:10 1777320190

I'd argue the issue is people have figued out that "shit stirring" can make actual meaningful differences to reality, be they foreign or local.

When the limit of effect a flamewar would have is if Star Trek or Star Wars got the top billing, or Vim was recommended to new programmers instead of Emacs, it was a fun novelty.

But now there's real money and power resulting from this shit stirring of course people will use it as a means to an ends. They've optimised professional shit-stirring because it's so valuable now.

kimixa · 2026-04-27T19:55:24 1777319724

This is also something I try to ask for - generally I get the "that's fine" from the hiring manager and HR, but both times I've then had to push back and get it added to the actual supplied contract. And that was very much not easy.

And even then there's normally a "Sufficiently Different Sector" requirement for those personal projects - which makes sense, but it is inevitably worded vague enough that it would likely require going to court for pretty much any project to show it's not directly related. And that would be near prohibitively expensive for me as an individual if the relationship actually became adversarial.

kimixa · 2026-04-27T03:19:23 1777259963

Ha, I worked for a company that until ~2012 still used RCS-backed SCM, absolute hack job on a shared file share that wrapped RCS with a "project file" to allow a tree of specific revisions for a "project". "MKS" it was called. And by the sound of it the "old" '90s version, not the java EE rewrite.

That meant the files has the entire "$Revision: 1.3 $" nonsense and "file changelog" at the top too - though many newer files never bothered to include the tags to actually get RCS to replace them. Inconsistent as hell.

And while the "family" of devices the software was for traces it's origin to the mid '90s, functionally none of the code was older than ~5 years at that time.

Naturally even with only a few tens of engineers it regularly messed up, commits stepped on each other's toes and the entire tree got corrupted regularly. For fun I wrote a script that read it all and imported the entire history into git - you only had to go back a few years before the entire thing was absolute nonsense.

I have no idea why that was still being used then, but I assume it had been in use from the very start of that entire hardware family. Perhaps as it was fundamentally a "hardware" company - which until surprisingly recently seemed to consider "source control" to be "shared folders on remote machines" - "software" source control wasn't considered a priority.

anthk · 2026-04-27T13:28:17 1777296497

RCS->CVS and from that you can convert it to GIT or SVN.

kimixa · 2026-04-27T16:15:58 1777306558

The issue was the rcs files were simply corrupt - no matter what tool you used the older deltas were just bad. Just people didn't notice/care as they were "old" revisions.

And I couldn't find any tool that supported the mks "project" files that linked multiple rcs revisions into a single "commit", so something a little custom was needed anyway. At least for the ancient mks version used.

Quite a bit of effort was put into it during the "official" migration, but they eventually gave up too as even the oldest backup archives they could find had the same issues.

kimixa · 2026-04-26T21:36:35 1777239395

I suspect there are at least as many programmers working as the ASM level today than there ever was - they're a lower proportion, but the total number of programmers has increased dramatically.

I wonder if this sort of trend will continue?

kimixa · 2026-04-25T23:42:17 1777160537

PCIe also had things like "1.1", "2.1" and "3.1" - that fixed issues and added functionality - but there wasn't the same crossover between "feature sets and spec revisions" and "speeds" we see in USB today.

mistyvales · 2026-04-26T00:04:25 1777161865

Manufacturers of mainstream consumer motherboards never used 1.1, 2.1, etc. for PCI-E though. What is 4.0 on the spec sheet will be 4.0 to the buyer. My old 2016 motherboard has a slew of 3.0 labelled USB ports that are now not 3.0, hence the conundrum. It just doesn't make sense why they changed established naming conventions. Is this something that causes me sleepless nights? Not in the least. But it's still an annoyance for consumers and even advanced users as detailed in that latest Geerling video et al.

kimixa · 2026-04-26T06:53:20 1777186400

1.1 was very much commonly used in consumer marketing, to the level where there's many instances today of people referring to pcie1.x speeds as "1.1". And I'm pretty sure I've seen 2.1 in consumer marketing contexts. But you're right I didn't know 3.1 existed until I looked it up :p

But USB 3.0 is pretty much the only "speed" that hasn't changed - it always required the extra connectors for 5Gbps from the start - but no more. What about those ports is now not "3.0"?

kimixa · 2026-04-20T19:32:52 1776713572

Yeah, the banners/popups aren't required by gdpr, they're the "malicious compliance" solution site owners came up with because they don't want to comply with the limitations, and make it as difficult as possible for the user not to let them.

kimixa · 2026-04-19T18:19:03 1776622743

As someone that's worked on GPU drivers for shared memory systems for over 15 years, supporting hardware that was put on the market over 20 years ago, and they've "always" (in my experience) been able to dynamically assign memory pages to the GPU.

The "reserved" memory is more about the guaranteed minimum to allow the thing to actually light up, and sometimes specific hardware blocks had more limited requirements (e.g. the display block might require contiguous physical addresses, or the MMU data/page tables themselves) so we would reserve a chunk to ensure they can actually be allocated with those requirements. But they tended to be a small proportion of the total "GPU Memory used".

Sure, sharing the virtual address space is less well supported, but the total amount of memory the GPU can use is flexible at runtime.

kimixa · 2026-04-18T20:24:23 1776543863

Even the latest CPUs have a 2:1 fp64:fp32 performance ratio - plus the effects of 2x the data size in cache and bandwidth use mean you can often get greater than a 2x difference.

If you're in a numeric heavy use case that's a massive difference. It's not some outdated "Ancient Lore" that causes languages that care about performance to default to fp32 :P

pixelesque · 2026-04-18T21:30:05 1776547805

> Even the latest CPUs have a 2:1 fp64:fp32 performance ratio

Not completely - for basic operations (and ignoring byte size for things like cache hit ratios and memory bandwidth) if you look at (say Agner Fog's optimisation PDFs of instruction latency) the basic SSE/AVX latency for basic add/sub/mult/div (yes, even divides these days), the latency between float and double is almost always the same on the most recent AMD/Intel CPUs (and normally execution ports can do both now).

Where it differs is gather/scatter and some shuffle instructions (larger size to work on), and maths routines like transcendentals - sqrt(), sin(), etc, where the backing algorithms (whether on the processor in some cases or in libm or equivalent) obviously have to do more work (often more iterations of refinement) to calculate the value to greater precision for f64.

omoikane · 2026-04-18T23:50:00 1776556200

> the latency between float and double is almost always the same on the most recent AMD/Intel CPUs

If you are developing for ARM, some systems have hardware support for FP32 but use software emulation for FP64, with noticeable performance difference.

https://gcc.godbolt.org/z/7155YKTrK

kimixa · 2026-04-18T23:09:27 1776553767

> ... if you look at (say Agner Fog's optimisation PDFs of instruction latency) ...

That.... doesn't seem true? At least for most architectures I looked at?

While true the latency for ADDPS and ADDPD are the same latency, using the zen4 example at least, the double variant only calculates 4 fp64 values compared to the single-precision's 8 fp32. Which was my point? If each double precision instruction processes a smaller number of inputs, it needs to be lower latency to keep the same operation rate.

And DIV also has a significntly lower throughput for fp32 vs fp64 on zen4, 5clk/op vs 3, while also processing half the values?

Sure, if you're doing scalar fp32/fp64 instructions it's not much of a difference (though DIV still has a lower throughput) - but then you're already leaving significant peak flops on the table I'm not sure it's a particularly useful comparison. It's just the truism of "if you're not performance limited you don't need to think about performance" - which has always been the case.

So yes, they do at least have a 2:1 difference in throughput on zen4 - even higher for DIV.

pixelesque · 2026-04-19T10:19:05 1776593945

Well, maybe not all admittedly, and I didn't look at AVX2/512, but it looks like `_mm_div_ps` and `_mm_div_pd` are identical for divide, at the 4-wide level for the basics.

Obviously, the wider you go, the more constrained you are on infrastructure and how many ports there are.

My point was more it's very often the expensive transcendentals where the performance difference is felt between f32 and f64.

adgjlsfhk1 · 2026-04-19T00:01:22 1776556882

This depends largely on your operations. There is lots of performance critical code that doesn't vectorize smoothly, and for those operations, 64 bit is just as fast.

kimixa · 2026-04-19T02:15:54 1776564954

Yes, if you're not FP ALU limited (which is likely the case if not vectorized), or data cache/bandwidth/thermally limited from the increased cost of fp64, then it doesn't matter - but as I said that's true for every performance aspect that "doesn't matter".

That doesn't mean that there are no situations where it does matter today - which is what I feel is implied by calling it "Ancient".

adgjlsfhk1 · 2026-04-18T21:07:34 1776546454

> languages that care about performance to default to fp32

What do you mean by this? In C 1.0 is a double.

kimixa · 2026-04-18T23:16:26 1776554186

But the "float" typename is generally fp32 - if we assume the "most generically named type" is the "default". Though this is a bit of an inconsistency with C - the type name "double" surely implies it's double the expected baseline while, as you mentioned, constants and much of libm default to 'double'.

adrian_b · 2026-04-20T12:22:34 1776687754

The C keywords "float" and "double" are based on the tradition established a decade earlier by IBM System/360 of calling FP32 as "single-precision" and FP64 as "double-precision".

This IBM convention has been inherited by the IBM programming languages FORTRAN IV and PL/I and from these 2 languages it has spread everywhere.

The C language has taken several keywords and operators from IBM PL/I, which was one of the three main inspiration sources for C (which were CPL/BCPL, PL/I and ALGOL 68).

So "float" and "double" are really inherited by C from PL/I.

A feature that is specific to C is that it has changed the default format for constants and for intermediate values to double-precision, instead of the single-precision that was the default in earlier programming languages.

This was done with the intention of protecting naive users from making mistakes, because if you compute with FP32 it is very easy to obtain erroneous results, unless you analyze very carefully the propagation of errors. Except in applications where errors matter very little, e.g. graphics and ML/AI, the use of FP32 is more suitable for experts, while bigger formats are recommended for normal users.