Emerging Reasoning with Reinforcement Learning Is Both Effective and Efficient (hkust-nlp.notion.site)
4 points by greenflag on Jan 25, 2025 | 1 comment


The authors replicate the DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data. They show that long Chain-of-Thought (CoT) and self-reflection can emerge on a 7B model with only 8K MATH examples, and they achieve surprisingly strong results on complex mathematical reasoning. They fully open-source their training code and details to the community to inspire more work on reasoning.



