Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I tested it on basic long addition problems. It frequently misplaced the decimal signs, used unnecessary reasoning tokens (like restating previously done steps) and overall seemed only marginally more reliable than the base DeepSeek 1.5B.

On my own pet eval, writing a fast Fibonacci algorithm in Scheme, it actually performed much worse. It took a much longer tangent before arriving at fast doubling algorithm, but then completely forgot how to even write S-expressions, proceeding to instead imagine Scheme uses a Python-like syntax while babbling about tail recursion.



> On my own pet eval, writing a fast Fibonacci algorithm in Scheme,

This model was trained on math problems datasets only, it seems. It makes sense that it's not any better at programming.


The original model, aside from its programming mistakes, also misremembered the doubling formula. I hoped to see that solved, which it was, as well as maybe a more general performance boost from recovering some distillation loss.


This model can't code at all.

It does high school math homework, plus maybe some easy physics. And it does them surprisingly well. Outside of that, it fails every test prompt in my set.

It's a pure specialist model.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: