The _fltused handling is quite crude: https://github.com/llvm/llvm-project/blob/5cfd02f44a43a2e2a0...

TL;DR: it's going to emit the reference whenever it's targeting MSVC and anything float-typed shows up. You'd need to do something to suppress it, short of avoiding float types entirely.
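
The usual workaround in freestanding code (my suggestion, not something from the linked source) is to just define the symbol yourself; the linker only needs it to resolve:

  /* _fltused is an int in the MSVC CRT; as far as I know nothing
     meaningfully reads the value, only the symbol has to exist. */
  int _fltused = 0;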

Presumably you're compiling for the MSVC ABI. Plugging in your own runtime that doesn't behave exactly like MSVC's isn't going to just work out of the box. The compiler has to know the details of the ABI you're targeting; if you're doing your own thing, the compiler would need to treat that as a separate ABI. I'm not sure there's currently a triple that means MSVC-like-but-freestanding.

The same logic applies on the clang driver side. The ABI expects to link those libraries, so it will.


Possibly -ffreestanding will help
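
Something along these lines, maybe (untested; the flags are real clang options, but the exact combination you need will depend on your setup):

  clang --target=x86_64-pc-windows-msvc -ffreestanding \
        -fno-builtin -nostdlib main.c -o main.exe

With -nostdlib you'd also be on the hook for providing your own entry point instead of the CRT's.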


I'd go so far as to say it's the exact opposite. It's faster and easier to change the hardware than the software.


Counterproof: attempt to modify your graphics card. Then attempt to modify a piece of code. Which one was easier?


You're talking as if hardware and software were disjoint. You design hardware with software in mind (and vice versa); you have to if you want performance rivaling Nvidia's. This co-design, making sure the products are not only usable but actually tailored to maximize resource utilization in real workloads (not just in whatever benchmarks), is where AMD seems to fall short.

Why oversimplify the premise and frame your take as some 'proof'? Just use the term counter-argument/example.


clang does have #pragma clang fp to enable a subset of the fast-math flags within a scope
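
For example (a sketch; reassociate is one of the options the pragma accepts, though the available set varies by clang version):

  float sum(const float *x, int n) {
  #pragma clang fp reassociate(on)
    /* the adds below get the 'reassoc' fast-math flag, so the
       vectorizer is allowed to turn this into a parallel reduction */
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
      s += x[i];
    return s;
  }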


This couldn't have a worse name. "CodeGen" is already used inside clang and LLVM, so searching "llvm CodeGen" will never find this.


Chrome is available on the App Store


And it's Safari-based on iOS.


It's really not. It's barely an abstraction over LLVM IR


I live on El Camino and frequently take the bus. It's a 40-minute walk to the nearest Caltrain station.


The Keurig actually does have tea pods, which produce pretty awful tea.


"He had found a Nutri-Matic machine which had provided him with a plastic cup filled with a liquid that was almost, but not quite, entirely unlike tea."


There isn't really anything fundamental that would make CUDA faster than OpenCL. There aren't any huge semantic differences between them.


The computing model, no, not really anything fundamentally different. It comes down to tooling and profiling under Linux. Also, Nvidia has slightly beefier cores and fewer of them, whereas AMD has more cores (as I've heard). Thus, for me, CUDA is a more complete toolchain, with a proper compiler (nvcc), profilers (nvprof, nvvp), and libraries (cuBLAS, cuDNN, cuFFT).


There is an OpenCL profiler for AMD, and library equivalents for those in clBLAS / clFFT


Is there an equivalent of cuBLAS for OpenCL?


clBLAS
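
For anyone curious, here's roughly what an SGEMM looks like against the clBLAS C API (an untested sketch with error checking dropped; clCreateCommandQueue is the pre-OpenCL-2.0 entry point):

  #include <clBLAS.h>   /* also pulls in the OpenCL headers */
  #include <stdio.h>

  int main(void) {
    /* C = A * B for tiny 2x2 row-major matrices */
    enum { M = 2, N = 2, K = 2 };
    cl_float A[M*K] = {1, 2, 3, 4}, B[K*N] = {5, 6, 7, 8}, C[M*N] = {0};

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    clblasSetup();

    cl_mem bA = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof A, A, &err);
    cl_mem bB = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof B, B, &err);
    cl_mem bC = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                               sizeof C, C, &err);

    cl_event done;
    /* same role as cublasSgemm: C = alpha*A*B + beta*C */
    clblasSgemm(clblasRowMajor, clblasNoTrans, clblasNoTrans,
                M, N, K, 1.0f, bA, 0, K, bB, 0, N, 0.0f, bC, 0, N,
                1, &q, 0, NULL, &done);
    clEnqueueReadBuffer(q, bC, CL_TRUE, 0, sizeof C, C, 1, &done, NULL);
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    clblasTeardown();
    return 0;
  }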


No, unfortunately front ends still need to be aware of some of the ABI details of the target to produce the IR for it.
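
A concrete example (mine, not from the parent): the same C function gets a different IR-level signature depending on the target's calling convention, and it's the frontend that decides that.

  struct pair { int a, b; };
  struct pair make_pair(int a, int b) {
    struct pair p = { a, b };
    return p;
  }
  /* On x86-64 SysV, clang emits roughly `define i64 @make_pair(i32, i32)`,
     coercing the struct into one register. On other ABIs the same code can
     instead become a void function with a hidden sret pointer parameter. */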

