Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Only for chat sessions, not for agentic coding. It's just too slow to be practical (10 minutes to answer a simple question about a 2k LoC project - and that's with a 5070 addon card).


This article is about a MoE model with only 4B active parameters, it shouldn't take 10 minutes to answer a question about a small project.

I measured a 4bit quant of this model at 1300t/s prefill and ~60t/s decode on Ryzen 395+.


Doesn't the framework desktop have a Ryzen 395 AI? That's a unified memory architecture like the Macs.


Ah, forgot to add, it's not really "unified" you have to explicitly specify your allocations. You may have a reasonably good 48gb chunk assigned to the GPU, but that DDR5 is 5-10 times slower than GDDR/HBM and the GPU itself isn't stellar.

So, framework laptops are great for chatting but nearly useless in agentic coding.

My Radeon W7900 answers a question ("what is this project") in 2 minutes, it takes my Framework 16 with 5070 addon around 11 minutes without the addon - around 23 (qwen 3.5 27b, claude code)


That's discrete DDR5, it's not as fast as your regular VRAM.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: