ONNX doesn't support the same level of quantization as GGML. So basically GGML w... | Hacker News


ianpurton on June 13, 2023 | on: Llama.cpp: Full CUDA GPU Acceleration

ONNX doesn't support the same level of quantization as GGML. So basically GGML will run on hardware with less memory.

regularfry on June 13, 2023

Or alternatively, bigger models with the same memory (just quantised harder).
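
To put rough numbers on both points, here is a minimal back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores per-block scale factors, the KV cache, and activations, so real usage will be somewhat higher. The function names, the 7B parameter count, the 8 GiB budget, and the 16-bit and 4-bit widths are illustrative assumptions, not figures from the thread.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    Assumption: every weight takes exactly `bits_per_weight` bits;
    quantization metadata and runtime buffers are ignored.
    """
    return n_params * bits_per_weight / 8 / 2**30


def max_params(memory_gib: float, bits_per_weight: float) -> float:
    """Largest parameter count whose weights fit in `memory_gib`."""
    return memory_gib * 2**30 * 8 / bits_per_weight


if __name__ == "__main__":
    # Same model, less memory: a hypothetical 7B-parameter model.
    for bits in (16, 8, 4):
        print(f"7B at {bits}-bit: {weight_memory_gib(7e9, bits):.2f} GiB")

    # Same memory, bigger model: a hypothetical 8 GiB budget.
    for bits in (16, 8, 4):
        print(f"8 GiB at {bits}-bit: {max_params(8, bits) / 1e9:.1f}B params")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight memory by 4x, or equivalently lets a model with 4x the parameters fit in the same budget.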
