
I've been using llama.cpp with the Python wrappers and the speed increase has been great, but it seemed to be limited to a max of 40 n_gpu_layers. Going to have to update and see what sort of improvement I see.
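For reference, a minimal sketch of what I mean, assuming the llama-cpp-python wrapper (the model path is a placeholder; `n_gpu_layers` is the wrapper's parameter for GPU offload):

```python
# Sketch using the llama-cpp-python wrapper.
# n_gpu_layers sets how many transformer layers are offloaded to the GPU;
# -1 requests offloading all layers instead of stopping at a fixed count.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.gguf",  # placeholder path, adjust to your model
    n_gpu_layers=-1,  # offload every layer rather than capping at 40
)
```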

