LLaMA-specific:
https://github.com/qwopqwop200/GPTQ-for-LLaMa
> According to the GPTQ paper, as the size of the model increases, the difference in performance between FP16 and GPTQ decreases.
https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...
https://docs.google.com/document/d/1wZ0g9rHI-6s7ctNlykuK4W5T...
Expect roughly a 4-5x reduction in memory usage for minimal loss of quality. :)
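As a rough sanity check on that factor, here's a back-of-the-envelope sketch that only counts weight storage at 16 vs 4 bits per parameter; it ignores activations, the KV cache, and the per-group scale/zero-point overhead GPTQ adds, which eats a bit into the factor:

    # Approximate weight memory for the LLaMA model sizes at FP16 vs 4-bit.
    LLAMA_SIZES_B = [7, 13, 30, 65]  # parameter counts, in billions

    def weight_gib(params_b: float, bits: int) -> float:
        """Weight memory in GiB: params * bits-per-param, converted to GiB."""
        return params_b * 1e9 * bits / 8 / 2**30

    for n in LLAMA_SIZES_B:
        fp16, int4 = weight_gib(n, 16), weight_gib(n, 4)
        print(f"LLaMA-{n}B: FP16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB ({fp16/int4:.0f}x)")

Pure weight math gives exactly 4x; getting closer to 5x in practice means going below 4 bits (e.g. the 3-bit results discussed in the nolano post above).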