The _fltused handling is quite crude: https://github.com/llvm/llvm-project/blob/5cfd02f44a43a2e2a0...

TL;DR: it's going to emit the reference whenever it's targeting MSVC and anything float-typed shows up. You'd need to do something to suppress it, short of avoiding float types entirely.
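
The usual workaround in freestanding code (my suggestion, not something from the linked source) is to just define the symbol yourself; the linker only needs it to resolve:

  /* _fltused is an int in the MSVC CRT; as far as I know nothing
     meaningfully reads the value, only the symbol has to exist. */
  int _fltused = 0;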

Presumably you're compiling for the MSVC ABI. Plugging in your own runtime that doesn't behave exactly like MSVC's isn't going to just work out of the box. The compiler has to know the details of the ABI you're targeting; if you're doing your own thing, the compiler would need to treat that as a separate ABI. I'm not sure there's currently a triple that means MSVC-like-but-freestanding.

The same logic applies on the clang driver side. The ABI expects to link those libraries, so it will.


Possibly -ffreestanding will help
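
Something along these lines, maybe (untested; the flags are real clang options, but the exact combination you need will depend on your setup):

  clang --target=x86_64-pc-windows-msvc -ffreestanding \
        -fno-builtin -nostdlib main.c -o main.exe

With -nostdlib you'd also be on the hook for providing your own entry point instead of the CRT's.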


I'd go so far as to say it's the exact opposite. It's faster and easier to change the hardware than the software.


Counterproof: attempt to modify your graphics card. Then attempt to modify a piece of code. Which one was easier?


You're talking as if hardware and software were disjoint. You design hardware with software in mind (and vice versa); you have to if you want performance rivaling Nvidia's. This co-design, making sure the products are not only usable but actually tailored to maximize resource utilization in real workloads (not just in whatever benchmarks), is where AMD seems to fall short.

Why oversimplify the premise and frame your take as some 'proof'? Just use the term counter-argument/example.


clang does have #pragma clang fp to enable a subset of the fast-math flags within a scope
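
For example (a sketch; reassociate is one of the options the pragma accepts, though the available set varies by clang version):

  float sum(const float *x, int n) {
  #pragma clang fp reassociate(on)
    /* the adds below get the 'reassoc' fast-math flag, so the
       vectorizer is allowed to turn this into a parallel reduction */
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
      s += x[i];
    return s;
  }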


This couldn't have a worse name. "CodeGen" is already used inside clang and LLVM, so searching "llvm CodeGen" will never find this.


Chrome is available on the App Store


And it's Safari-based on iOS.


It's really not. It's barely an abstraction over LLVM IR


I live on El Camino and frequently take the bus. It's a 40-minute walk to the nearest Caltrain station.


The Keurig actually does have tea pods, which produce pretty awful tea.


"He had found a Nutri-Matic machine which had provided him with a plastic cup filled with a liquid that was almost, but not quite, entirely unlike tea."


There isn't really anything fundamental that would make CUDA faster than OpenCL. There aren't any huge semantic differences between them.


The computing model, no, not really anything fundamentally different. It comes down to tooling and profiling under Linux. Also, Nvidia has slightly beefier cores and fewer of them, whereas AMD has more cores (as I've heard). Thus, for me, CUDA is a more complete toolchain, with a proper compiler (nvcc), profilers (nvprof, nvvp), and libraries (cuBLAS, cuDNN, cuFFT).


There is an OpenCL profiler for AMD, and library equivalents for those in clBLAS / clFFT


Is there an equivalent of cuBLAS for OpenCL?


clBLAS
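
For anyone curious, here's roughly what an SGEMM looks like against the clBLAS C API (an untested sketch with error checking dropped; clCreateCommandQueue is the pre-OpenCL-2.0 entry point):

  #include <clBLAS.h>   /* also pulls in the OpenCL headers */
  #include <stdio.h>

  int main(void) {
    /* C = A * B for tiny 2x2 row-major matrices */
    enum { M = 2, N = 2, K = 2 };
    cl_float A[M*K] = {1, 2, 3, 4}, B[K*N] = {5, 6, 7, 8}, C[M*N] = {0};

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    clblasSetup();

    cl_mem bA = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof A, A, &err);
    cl_mem bB = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof B, B, &err);
    cl_mem bC = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                               sizeof C, C, &err);

    cl_event done;
    /* same role as cublasSgemm: C = alpha*A*B + beta*C */
    clblasSgemm(clblasRowMajor, clblasNoTrans, clblasNoTrans,
                M, N, K, 1.0f, bA, 0, K, bB, 0, N, 0.0f, bC, 0, N,
                1, &q, 0, NULL, &done);
    clEnqueueReadBuffer(q, bC, CL_TRUE, 0, sizeof C, C, 1, &done, NULL);
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    clblasTeardown();
    return 0;
  }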


No, unfortunately front ends still need to be aware of some of the ABI details of the target to produce the IR for it.
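
A concrete example (mine, not from the parent): the same C function gets a different IR-level signature depending on the target's calling convention, and it's the frontend that decides that.

  struct pair { int a, b; };
  struct pair make_pair(int a, int b) {
    struct pair p = { a, b };
    return p;
  }
  /* On x86-64 SysV, clang emits roughly `define i64 @make_pair(i32, i32)`,
     coercing the struct into one register. On other ABIs the same code can
     instead become a void function with a hidden sret pointer parameter. */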

