I think this one is only about 600GB VRAM usage, so it could fit on two Mac Studios with 512GB VRAM each. That would have cost (albeit no longer available) something like less than 20k.
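A rough sanity check of that fit, as a minimal sketch. The ~600GB and 512GB figures come from the comment above; the 10% headroom (for the OS, KV cache, and so on) is purely an assumed placeholder, and `fits_in_memory` is a hypothetical helper, not part of any tool.

```python
def fits_in_memory(model_gb: float, per_machine_gb: float, machines: int,
                   headroom_fraction: float = 0.10) -> bool:
    """Return True if the model fits in the pooled memory after reserving headroom.

    headroom_fraction is an assumed share of memory kept free for everything
    other than the weights; the real number depends on the setup.
    """
    usable_gb = per_machine_gb * machines * (1.0 - headroom_fraction)
    return model_gb <= usable_gb


# Figures from the comment: ~600GB model, machines with 512GB each.
print(fits_in_memory(600, 512, machines=2))  # True: about 921.6GB usable
print(fits_in_memory(600, 512, machines=1))  # False: about 460.8GB usable
```

This only says the weights fit; it says nothing about how fast the model runs once long contexts are involved.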
Yeah, but that's personal use at best, not much agentic anything happening on that hardware. Macs are great for small models at small-medium context lengths, but at > 64k (something very common with agentic usage) they struggle and slow down a lot.
The ~100k hardware is suitable for multi-user, small team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.
You could run it with SSD offload; earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar number of total and active parameters.
Yeah... I would definitely call 2t/s unusable. For simple chats, I'd want at least 15 t/s. For agentic coding (which this model is advertised for), I'd want good prefill performance as well.
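To put those rates in wall-clock terms, here is a back-of-the-envelope sketch. The 2 tok/s and 15 tok/s decode rates are the ones mentioned in this thread; the 1,000-token response, the 64k-token prompt, and the 200 tok/s prefill rate are made-up illustrative assumptions, and `generation_seconds` is a hypothetical helper.

```python
from typing import Optional


def generation_seconds(output_tokens: int, decode_tok_per_s: float,
                       prompt_tokens: int = 0,
                       prefill_tok_per_s: Optional[float] = None) -> float:
    """Estimate seconds for one response: optional prefill time plus decode time."""
    if decode_tok_per_s <= 0:
        raise ValueError("decode_tok_per_s must be positive")
    prefill_seconds = 0.0
    if prompt_tokens > 0:
        if prefill_tok_per_s is None or prefill_tok_per_s <= 0:
            raise ValueError("a positive prefill_tok_per_s is needed when prompt_tokens > 0")
        prefill_seconds = prompt_tokens / prefill_tok_per_s
    return prefill_seconds + output_tokens / decode_tok_per_s


# Decode only, for an assumed 1,000-token response.
print(generation_seconds(1000, 2))             # 500.0 seconds at 2 tok/s
print(round(generation_seconds(1000, 15), 1))  # 66.7 seconds at 15 tok/s

# Adding an assumed 64k-token prompt at a hypothetical 200 tok/s prefill rate.
print(generation_seconds(1000, 15, prompt_tokens=64_000, prefill_tok_per_s=200))
```

With long agentic prompts the prefill term can dominate, which is why decode speed alone is not the whole picture.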
That's just throwing money away. The performance with large context would have been unusable, especially if you need to serve more than a single person.
What makes you think that China ever gave up its communist goals? I personally see everything they do as aiming towards that goal: the one-child policy, the huge number of empty apartments they build, the stuff they produce almost for free, the fishing. Open-sourcing the models fits that culture perfectly too; it's the means of production.
The one-child policy died a long time ago. Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.
There is a reason real estate values in popular cities have skyrocketed, and it’s not due to the locals getting wealthier. It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
The one-child policy did not die; it just morphed into a three-child policy, which is still a form of family planning, and they would probably still fine people for having more than three kids.
True, but as far as I understand that happened because birth rates got too low, so they replaced it with a two-child policy and later with a three-child policy.
> Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.
Yeah, I am sure there are a lot of cases of that. But as far as I know the number of billionaires has started declining in China, and I don't see how that means that they as a country moved away from the goal, it just means there's issues
> There is a reason real estate values in popular cities have skyrocketed, and it’s not due to the locals getting wealthier.
I don't know about that, you could be right. A Google search for real estate prices in China reveals a lot of news articles about how they are going down though.
> It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
Wouldn't be surprised if rich people in China invest in real estate. They don't have free capital flow, so it's not easy to invest abroad and it becomes an obvious choice. Bitcoin is banned in China for that reason too.
But again, as far as I know that does not mean the country moved away from its goal of trying to reach communism one day.
> I don't see how that means that they as a country moved away from the goal, it just means there's issues
They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.
> A Google search for real estate prices in China reveals a lot of news articles about how they are going down though.
They're investing outside China (Vancouver, Toronto, NYC, London, Sydney, Melbourne, etc.) because their assets are safer there (these countries all have strong property protection laws). Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.
However, that's not my point - I did not mean to say that they are going to be successful but rather that it still appears to be a long term goal for them.
> Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.
I don't know about that, without any source of data I guess I just have to take your word for it. I would not be surprised if you were right in this case though.
China is a ruthless capitalist country managed by an authoritarian regime. Planning and lack of respect for the individual or the rule of law are not communist per se.
I don't think that's right, the models and the GPUs are the means of production.
In capitalism the people with the capital get the profit, not the people who do the work. However, workers are said to benefit too through their salary, just less so.
The reason regular-capitalism worked is that all production used to depend on workers bottlenecking the free flow of capital by demanding salaries in exchange for their labor. Now that we've removed that obstacle, capitalism demands workers seize the means of production in order to maintain the status quo. Hence, supercapitalism.
Regular capitalism works, but now that the means of production are not factories, the workers have to become more entrepreneurial. Then they will control their destinies.
You miss the point: we advertise the change as workers becoming part of the owner class and realizing all of the economic gains of their work, thus supercapitalism. Don't use the "s" or "c" words.
Yup, I've mentioned this in another thread: I got gpt 5.4xhigh to improve the throughput of a very complex, atypical CUDA kernel by 20x. This was through a combination of architecture changes and then low-level optimizations; it did the profiling all by itself. I was extremely impressed.
I was using codex cli with 5.4xhigh. So it was able to iteratively improve from simple prompts on my part ("can you give some architectural ideas to improve the performance?" And once it does, I just say "can you implement and benchmark it?").
I think it was a bit like Karpathy's autoresearch, except I was doing manual prompting... Though I feel I could definitely be removed from that equation.
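For what "removing me from that equation" might look like, here is a minimal sketch of a benchmark-driven selection loop: check each candidate for correctness against the baseline, time it, and keep the fastest. Everything in it is an illustrative assumption (plain Python functions stand in for kernel variants, and `benchmark` and `select_best` are hypothetical helpers); it is not what codex does internally.

```python
import time
from typing import Callable, Dict, Tuple

N = 200_000


def baseline() -> int:
    """Reference implementation: sum of squares with an explicit loop."""
    total = 0
    for i in range(N):
        total += i * i
    return total


def generator_sum() -> int:
    """Candidate: same computation using the built-in sum."""
    return sum(i * i for i in range(N))


def closed_form() -> int:
    """Candidate: 'architectural' change, using (N-1) * N * (2N-1) / 6."""
    return (N - 1) * N * (2 * N - 1) // 6


def wrong_but_fast() -> int:
    """Candidate that is fast but incorrect; it must be rejected."""
    return 0


def benchmark(fn: Callable[[], int], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def select_best(baseline_fn: Callable[[], int],
                candidates: Dict[str, Callable[[], int]]) -> Tuple[str, float]:
    """Return (name of fastest correct variant, speedup over the baseline)."""
    expected = baseline_fn()
    baseline_time = benchmark(baseline_fn)
    best_name, best_time = "baseline", baseline_time
    for name, fn in candidates.items():
        if fn() != expected:
            continue  # reject variants that change the result
        elapsed = benchmark(fn)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    # Guard against a zero reading from a coarse timer.
    return best_name, baseline_time / max(best_time, 1e-9)


name, speedup = select_best(baseline, {
    "generator_sum": generator_sum,
    "closed_form": closed_form,
    "wrong_but_fast": wrong_but_fast,
})
print(f"{name}: {speedup:.1f}x faster than baseline")
```

The correctness check before timing is the important design choice: without it, the loop would happily keep a variant that is fast only because it is wrong.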
Hmm, I still have some nonrefundable API credits with OpenAI. Maybe I should try to use them for my kernel.
FWIW, this talk[1] from NVIDIA/Meta from March claims that coding agents can often write correct implementations of CUDA kernels, but that they're usually dog slow, like 100x slower than a kernel optimized by a skilled human.
For me it was able to try out different architectures for perf improvement; then, once it had settled on some good architectures, it did lower-level optimizations on them by profiling the code, etc.
That's great. It seems like a large body of experts are having real problems with this, so maybe you should publish something about your methods, or start a business...
I can't vouch for whether or not it can beat human experts though because I'm no CUDA expert myself. The original CUDA code was human written and I first let codex adapt it to my specific use case. Then I basically let codex generate ideas and try the ideas out itself (I think it's a bit like Karpathy's autoresearch, except I was still doing manual prompting). And that was enough to get me a 20x improvement.
I suspect that when people said AI wrote non-performant CUDA kernels, it was early to mid last year, and it has definitely improved vastly since then. And the agent's ability to iteratively improve really impressed me.
But is arc-agi really that useful though? Nowadays it seems to me that it's just another benchmark that needs to be specifically trained for. Maybe the Chinese models just didn't focus on it as much.
Is it though? Do we still have the expectation that LLMs will eventually be able to solve problems they haven't seen before? Or do we just want the most accurate auto complete at the cheapest price at this point?
It indicates that there's a good chance that they have trained on the test set, making the eval scores useless. Even if you have given up on the dream of generalization entirely, you can't meaningfully compare models which have trained on test to those which have not.
That is true, but the revenue of the artisanal stuff is probably only a very low percentage of the overall market, which would imply a lot of software engineers would have to exit the field. Which is what we here don't want to see.
Seems like the high compute parallel thinking models weren't even needed, both the normal 5.4 and gemini 3.1 pro solved it. Somehow Gemini 3 deepthink couldn't solve it.