Hacker News | jdlshore's comments

For those who aren’t aware, as I wasn’t, this is coming from the “VP of Community” at the Zig Foundation. So the proposal to soft ban LLMs at Zig Day meetups seems like it has a bit more weight than if it was some random community member.

(I’m not a member of the community, so not fully aware of the dynamics.)


I had the pleasure to talk to Loris about this topic just last week and I've liked his take very much.

His point is not coming from a place of LLM demonization. He very much acknowledges their usefulness, especially in a business context, e.g. for implementing yet another standard CRUD application and for shipping all the other "average" (in quality) business features quickly.

His point is a different one entirely: Say Andrew Kelley is attending the Zig Day. Why would you ask an LLM about a Zig programming problem you're struggling with instead of learning from the man himself? There's simply no LLM as knowledgeable about Zig as Andrew and the other people working on it or with it on the daily.

In other words: Zig Days are an opportunity for people to learn from each other and to spend time together (= the "Community" in "VP of Community"), and LLMs are diminishing this opportunity.

Besides, Zig itself is mainly a language for people who care not just that a problem is solved but also about how it's being solved. ("Create software you can love.") While LLMs don't prevent anyone from doing so, they make it much more appealing to just vibe-code everything and not look too closely at the implementation.


Being negative against the utility of AI is sometimes beside the point or a strawman (don’t know about this instance). Only an idiot would argue that cars don’t have utility in cities and countrysides that are already paved over. Or that they are slower. Or that the train can get you from Bob’s to Burger Establishment with less walking. But they may want other things like more walkable cities and less mass extinction in a hundred years’ time.

No one needs to proclaim the utility of The Car before criticizing car culture.

This piece already says that all the clanker maximalism may be correct. shrugs Then it says that this get-together is for people who like programming. Even if the whirlwind of progress comes and takes their profession. Because then it could still be a hobby.

And this is too negative-against-AI for some people in this thread? Programming as a hobby? Okay, fine. Maybe we will have sold off all our RAM in a year’s time and the Government will have outlawed unassisted programming as too dangerous. The piece is too optimistic.


This didn't sound like a "soft ban" to me.

With things happening in general, and with Bun's LLM-aided move away from Zig in particular, there is bound to be some interest in talking about LLMs and how that impacts Zig's future.

I think this was a well measured "hey, let's focus on the thing we are coming together to celebrate and advance: Zig".


Isn't "soft ban" still a bit harsh? I think it was a reasonable take and a good reminder of the purpose of these events.

Only in the sense that community leadership is harsh by nature. ‘We’d rather you didn’t do that’ is the essence of a soft ban and is what they’ve expressed as leaders here. Even a soft ban muffles topics, and therefore a subset of people, which is often seen as ‘harsh’.

> Isn't "soft ban" still a bit harsh?

Not necessarily. The take is reasonable but I'm curious about who could be bold enough to actually talk about or disclose their use of LLMs during these events.


The whole reason for this blog post is because discussion about LLMs does happen already (to the point of being a bit suffocating).

see also https://news.ycombinator.com/item?id=48314145


It's funny (or sad) how I read this as an address about pro-LLM speech when "talking about LLMs" doesn't have to be just that.

I appreciate the clarification.


People are looking at this all wrong. What you have here is a 13 year old business that makes about $100M per year. Nothing to sneeze at, but when compared to successful live-service games and MMOs, it’s peanuts. The cumulative revenue is irrelevant, and again, peanuts compared to successful games. Genshin Impact made $100M per month when it was released. World of Warcraft is a few decades old and still made $680M in 2024 (the most recent numbers I could find).

What makes Star Citizen unique is that it was crowd-funded—it started taking money before any features were available at all, and released to buyers (“backers”) in an unusually incomplete and buggy state. But all that revenue is turned around and plowed right back into development. (Their finances are publicly available.)

The game remains incomplete and buggy to this day, but with lots of stuff to do and clear forward progress, and, from what I hear, it offers a uniquely immersive experience. People like it enough that revenue keeps increasing, with nearly every year setting new records.

So that’s all this is: a moderately successful company with big ambitions, a lot of bugs, and a passionate fan base. Its annual revenue is nothing special. Don’t fall for the sensationalism of the way its modest lifetime revenue numbers are shared.

(Just an interested observer with no horse in this race. I’ve never played the game or spent any money on it.)


“Our systematic study exposes a phenomenon of constraint decay in LLM-based coding agents. While current models excel at unconstrained generation, their performance drops when forced to navigate explicit architectural rules. For end-users, this dichotomy implies that agents are reliable for rapid prototyping but remain unreliable for production-grade backend development.”

One major weakness of this study is that they didn’t fully test frontier models for cost reasons, so the specific performance results should be taken with a grain of salt. But the overall conclusion that models degrade when both behavior and architecture must be correct is interesting, and something to keep an eye on.


I think it's downstream of "you can't optimize for two different objectives".

If you only have functional requirements, then in effect you're doing some form of program synthesis, and RL can optimize that very hard.

If you have a mixture of functional and non-functional requirements, you are basically giving the model an incomplete specification, and it must in some way guess at the user's intent to fill in the blanks. This is also why adding to the prompt examples of the style of code you want (hats off to antirez for this particular tip ;)) is phenomenally powerful.


> ... This is also why adding to the prompt examples of the style of code you want ...

You could take it a step further and put the example code into source code files...and be like, super comprehensive with your examples ... ;)


Well yes, ideally. But real world codebases aren't clean enough to be used as the example ideal. Styles change over time, there are always code migrations and refactors in flight, legacy code exists, etc. Using specific examples of what you expect the LLM (and humans) to do now is necessary.

Would you mind sharing antirez' suggestion?

I am obviously paraphrasing, but the general idea is that trying to synthesize style from a codebase into e.g. a markdown guide generally doesn't work very well. What achieves style transfer is providing the model with a lot of examples of the style, conventions, patterns you want.

To put it in practice: if you point claude/codex to a repository and you ask it to implement feature X using style guide Y, the code will probably work, but you can usually get better results by saying "do it in the style of this file, it was done well there".
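A minimal sketch of that idea, assuming a hypothetical helper called `build_prompt` and made-up prompt wording (this is not antirez's code and not any agent's API): rather than summarizing the style in a guide, the reference source itself is pasted into the prompt.

```python
def build_prompt(task: str, examples: dict[str, str]) -> str:
    """Assemble a prompt that shows reference code instead of describing its style.

    `examples` maps a file name to that file's full source text.
    The function name and the prompt wording are illustrative assumptions.
    """
    parts = [f"Task: {task}", ""]
    for name, source in examples.items():
        parts.append(f"Reference file (match its style and conventions): {name}")
        parts.append(source)
        parts.append("")
    parts.append("Implement the task in the style of the reference files above.")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical example file; in practice you would read it from the repository.
    demo = {"orders_report.py": "def build_report(rows):\n    return sorted(rows)\n"}
    print(build_prompt("add a refunds report", demo))
```

The point of the design is only that the examples travel with the request; which files count as "done well" is still a human judgment.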


Right, more simply put, it's great at being a copycat, exploring similar data points that match your token needs.

It is not great at decision making or judgment calls that don't have a well defined spec or plan in place yet; like unofficial or unapproved tokens if you will. A lot of this stuff simply never has had specs as it has been internal to how companies work and their secret sauce.

The closest thing we have are governance and compliance policies due to legal/business needs requiring it so it's far more well documented than operational ones in how we work. It is more about the how versus the what here I guess is what I'm saying.

But yeah this is why it does great when there are tests, design systems, evals, and other artifacts to mirror. Far more reckless and unpredictable without these things, but still great for exploration and finding the data output you seek.


Doesn't that make sense? It's text prediction. If you give it examples, it can predict. Synthesizing "put semi-colons on new lines" requires it to generate its own examples 'in its head' (so to speak) and remember that. It won't.

It's like when I see people feeding it a whole bunch of "best practices" and expect it to follow them. It won't. But you could ask it questions about the best practices all day long.


Yes, exactly. Any engineer deep in this stuff right now understands that it's a grounded predictive engine sprinkled with RL training, and is discovering what that means in terms of its strengths and weaknesses for company use.

Supposing an unspecified or poorly specified function f(x), and example "f(A)=>B", "given C tell me what f(C) is" lies at the core of creativity.

Idk, calling it "just text prediction" seems unfairly dismissive of this capability.


Saying that it’s dismissive is like saying writing (insert language) is dismissive that you’re just writing assembly.

At the end of the day, it presents a vector field and predicts the next vector. That’s literally the heart of intelligence just like assembly is the heart of execution. When playing table tennis, your brain is literally predicting seconds into the future to get your body into the right position.

But we aren’t discussing intelligence here. We are discussing how best to utilize that intelligence.


You're making my point for me, saying table tennis is "just a proprioceptive predictor" is dismissively reductive (and not a particularly useful framework for understanding table tennis), even if it is strictly speaking accurate. It's the sort of thing someone who has no idea how hard training for table tennis is would say.

Let me put it bluntly. I’m agreeing with you but saying that isn’t what I was talking about and trying to give examples. You’re also agreeing with me.

The “idea” of table tennis and the rules. Those are things we can talk about. It’s those “best practices” I gave in my example. The actual playing of table tennis would be the examples. How to apply those best practices and what good code looks like.


I ran into similar issues as we started to roll out LLM generated financials in our org.. I’m so used to the old SQL workflow of “grab this data from this table, that data from that table, combine it into a final result that looks like xxxx” where the tables were outputs from reports in our ERP but I was having terrible results.

Ended up pointing Claude at a few sample files from our existing reporting, gave it read-only oauth access to the ERP and said “build a new report showing the cash by project as calculated by xxxx - yyyy + zzzz in the style of the existing reports” and it basically one-shot from there.

Kind of crazy and I built a bunch of redundant check-sums because I honestly didn’t think it would be able to replace like 6 workdays of effort for the 2 FTEs who generate that kind of thing manually every month but so far so good..
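The comment does not say how those redundant check-sums were built. One assumed form, sketched in Python, is to recompute the figure independently and flag any project where the generated report disagrees. The terms `xxxx`, `yyyy`, and `zzzz` are kept exactly as elided in the comment; the row field names (`project`, `reported_cash`) and the tolerance are hypothetical.

```python
from decimal import Decimal


def cash_by_project(xxxx: Decimal, yyyy: Decimal, zzzz: Decimal) -> Decimal:
    """Recompute cash for one project from the comment's three elided terms."""
    return xxxx - yyyy + zzzz


def find_mismatches(report_rows, tolerance=Decimal("0.01")):
    """Return the projects whose reported cash differs from the recomputed value.

    Each row is a dict with hypothetical keys:
    'project', 'xxxx', 'yyyy', 'zzzz', 'reported_cash'.
    """
    mismatches = []
    for row in report_rows:
        expected = cash_by_project(row["xxxx"], row["yyyy"], row["zzzz"])
        if abs(expected - row["reported_cash"]) > tolerance:
            mismatches.append(row["project"])
    return mismatches
```

An empty list means the generated report reconciles with the independent calculation. It says nothing about whether the source data itself is right.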


I was recently using Copilot to implement a small feature within a very large codebase. About 75-80% of the time, the code that was added matched the current style (warts and all). Copilot would specifically go off and research "How X is already done in the codebase" all the time.

You basically get this for free, if the coding agent has read the relevant classes that the legacy code it's touching has to match.

Just don't break out a plan without also having it read the code again.


I've noticed something similar with AI assist authored books as well. Early on it does alright, but after some chapters the beginning of each chapter repeats the end of the previous, and obvious LLM tells become more frequent.

The more it has to go on, the more it relies on repetition of what came before. It's also possible that authors start paying much less attention and put less effort into editing later chapters.

Despite the sheer volume on Amazon, LLMs are not at the point of writing well.


Holy crap are you reading books that advertised somehow they were written with LLM assistance? Hard no here in 2026.

Oh no, they were not advertised as such. It's rather painfully obvious in the worst cases.

That may be the same problem seen when prompts try to force "alignment" or "guardrails". There's a performance drop. Seemingly, a big chunk of the potential solution space has been made unreachable.

For example, if you apply "guardrails" to an image generator of about a year ago, all the people start looking alike. Story generators start using only a few standard names.

That was last year. Is it happening with the frontier models?


Hmm, I have some anecdotal evidence this is true. Interactively working out a plan with Opus, on multiple occasions it'd come up with an incompatible solution, I'd add additional context/requirements, and it had a tendency to "anchor" on its original architecture and struggle to adapt. Sometimes it tries to sneak in changes for the original plan anyway.

I think the problem is they take the shortest path to the goal ...which may or may not coincide with what you have planned. Oh, and they generally think instructions are merely suggestions and that what you really want is this totally different thing and not the one in the plan you handed them. Plus, as a stroke of good luck, this other system is a lot easier to implement as well.

I mean, I spend more tokens having them clean up all the places they didn't follow the plan (if I catch it) or implementing what came out of a 'complete and tested' previous plan where they just stop as soon as all the pathetic new tests pass and you discover half of it isn't even there when trying to implement the next thing on top of it.

Though... I have been conducting an experiment, of sorts, where we've been cooking on these fairly complicated projects and I don't ever touch a single line of code, just yell at them a lot, and with suitable amounts of marijuana (they are very frustrating most of the time) it's been going pretty well. It also helps that they need to explain what they're doing to somebody fairly-baked -- maybe not such an HR friendly plan?


Opus does this waaaay too much for my taste. It works fine for vibe-coders but for technical work it is infuriating.

Even the strongest frontier model they used - GPT 5.2 - I would consider barely usable for agentic programming.

I’m not really interested in analysis of the weaknesses of such models because in my experience many weaknesses disappear entirely as models get stronger and reasoning effort is turned up. Especially if you tell them what you want them to do.

Also, it’s not surprising to learn that when more acceptance criteria are added the failure rate increases.


Oldheads remember when GPT 5.2 was at the forefront of agentic programming. December 2025 feels like eons ago, but alack it was an entire half year!

If I'm not using gpt 5.5 high reasoning I'm wasting time.

Well, maybe so, but how did you feel about 5.2 when it was OpenAI's frontier model? That's what I'm getting at – it was the equivalent of your gpt 5.5 high reasoning just six months ago.

They all feel the same to me now, opus, 5.5, whatever

It was a joke. I think you need to mix up models.

Gotcha. Hard to parse tone and intent through text on the internet.

Wait isn't gpt 5.2 good? Or is it not thinking / not codex? 5.2 was what sparked the late 2025 openai agentic programming revolution.

5.2 still had a Codex variant, which this doesn't describe using. It also notably is not using the Codex harness -- it does everything with open source harnesses (which obviously are worse). And while it uses two harnesses with its cheap models, it only uses the worse-performing one of those with GPT 5.2 for cost reasons. (They also don't specify effort/thinking level used for GPT 5.2, but given that it performs worse in their baseline testing than obviously non-SOTA models, I'm guessing it wasn't set to anything high.)

> their performance drops when forced to navigate explicit architectural rules

Even the best models have trouble adhering to stuff as mundane as rules for how to style generated code (indent this much, name things with these patterns, etc.). Even the most die-hard AI-first coder will admit to that kind of stuff being not unheard-of. Yet they still delude themselves into thinking that these models will follow a sufficiently detailed spec to the letter, every time.


> (and will be held accountable)

Is this just a rhetorical flourish? I’m not up on the details, but it seems like Musk just screwed things up and walked away scot-free. What path do you see for him actually being held accountable for the damage he caused?


In 2029, there will be a new AG who I hope will make a firm commitment to prosecute Musk and other Trump officials for their crimes. I won't vote for anyone who doesn't promise to extract some level of accountability, although I could imagine being persuaded by an argument that letting Musk skate will allow us to ensure that someone else gets the sentence they deserve.

I believe modern EVs have battery heating built in for exactly that reason.

Which uses the energy from the battery, so if they did that, you better not go out of town for a couple of weeks or you'll come home to a dead car.

You just keep it plugged in when you are gone, what’s the issue?

This reminds me that in Yakutsk, you put your car in a big sock while it’s parked and the car will occasionally start on its own to keep the block from freezing (they don’t have plugs outside, so no block warmers, no EVs). If you leave your car parked long enough, you’ll run out of gas and your engine will probably be hosed.


Are you sure you aren’t experiencing selection bias? The article only mentions one modeling error (the one you quoted), so “all the errors” must be the ones you’ve noticed elsewhere.


The lack of shipping through the Strait of Hormuz says otherwise. If Iran can successfully blockade the Strait in the face of the full military might of the US, certainly they can cut an undersea cable.


What prevents a consortium of network operators using similar hostage politics on Iran?

What prevents them from pooling together the funds for 2 decades of relentless cyberattacks on Iran, unless twice the same fee is paid in reverse to compensate for just the threat?

(currently unaffected network operators have an incentive to chip in, lest political factions local to or neighboring their cables start imitating Iran)

(unlike conventional warfare, cyberattacks can be highly directed to regime players, elites, etc. so targeting a network operator seems like the dumbest move one could make: conventional warfare can sometimes generate new supporters for the regime, hitting the elites or regime elements much less so)


> What prevents them from pooling together the funds for 2 decades of relentless cyberattacks on Iran, unless twice the same fee is paid in reverse to compensate for just the threat?

I thought it was clear to everyone that this is _exactly_ what the U.S. and Israel have been doing to Iran for literally 20+ years [0] [1]. In addition to economic warfare, other types of espionage, and acts of terrorism - e.g. blowing up a bunch of people's mobile devices and pagers (in before someone accuses the children and civilians harmed in the attack of being terrorists themselves).

Bad actors have been extracting all kinds of concessions and actions out of Iran for many decades now under all kinds of threats, tactics, and attacks.

This war was literally started by the U.S. and Israel blowing up peace talks in Iran.

[0] https://en.wikipedia.org/wiki/Stuxnet [1] https://en.wikipedia.org/wiki/Cyberwarfare_and_Iran


> I thought it was clear to everyone that this is _exactly_ what the U.S. and Israel have been doing to Iran for literally 20+ years [0] [1].

You changed my proposition to a different one by equating

US & Israeli cyber warfare, with

US & Israeli & worldwide telecom sector cyber warfare.

I ask: why risk that step? The worldwide telecom sector is highly networked (by profession, obviously) and is probably already picking up phones and coordinating a common response. Together they stand, divided they fall; none of them look forward to the potential normalization of nation states charging fees unilaterally.

It's an error to swap the big problems they already face for even bigger ones.

The telecom sector might collectively demand public payment of twice the threatened fee (however small the actual demanded fee is) plus a public statement by Iran that they publicly repeal the threat, just to make clear this type of precedent won't be tolerated.


Or they would just pay because telecom companies do not have armies.


this makes no sense, you're saying that if some cable operator in some distant nation agrees to chip in to prevent such normalization worldwide, that Iran will come over to that nation to physically overpower the cable operator who doesn't have an army?


Small businesses are bigger than you think they are. A company with $100 million revenue per year could still be a small business.

You might be assuming small businesses have less than ten people. That’s a category of small business called a “micro-business” or microenterprise, depending on funding model.


Had to look it up, but Instagram had 13 employees when they sold to Facebook for $1 billion (for some reason I remembered them being 9 people). I know multiple game devs who had single-digit (or low double-digit) staff when they were already making many millions in revenue/profit.


Different countries use different definitions of what "small business" or "micro business" is. And people usually use their own local expectations they're used to. I'm not from the US and a company with 100 million revenue is far from a small business to me.

In EU where I'm from the micro/small/medium business sizes are tied to both employee count AND revenue. Micro is below 10 employees and below 2 million € revenue, Small is below 50 employees and below 10 million € revenue, Medium is below 250 employees and 50 million € revenue.

So if you had 100 million revenue you would be a large business even if you had less than ten people.
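The rule as described above can be written down as a short Python sketch. It encodes only the thresholds quoted in this comment, where both the employee limit and the revenue limit must hold. The official EU definition has further detail that is not modeled here, and the function name `eu_size_class` is made up for illustration.

```python
def eu_size_class(employees: int, revenue_eur: float) -> str:
    """Classify a business using the thresholds quoted in the comment.

    A tier applies only if BOTH the employee count and the revenue
    are below that tier's limits; otherwise the next tier is tried.
    """
    tiers = [
        ("micro", 10, 2_000_000),
        ("small", 50, 10_000_000),
        ("medium", 250, 50_000_000),
    ]
    for name, max_employees, max_revenue in tiers:
        if employees < max_employees and revenue_eur < max_revenue:
            return name
    return "large"


if __name__ == "__main__":
    # Nine people with 100 million in revenue fail every revenue limit.
    print(eu_size_class(9, 100_000_000))
```

Because both conditions are required, headcount alone never keeps a high-revenue company in a smaller tier.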


I see where you’re coming from, but something about this framing bothers me.

I think acting honorably has to come from within. It’s something that people need to do regardless of rewards or incentives. Now, how we create a culture that actually does so… that has to come from society. But, imo, if people only act honorably because they’re rewarded for it, and they don’t when no one is looking… that’s not acting honorably at all.


You can both be right. I live in a high trust society (Japan), but was raised elsewhere. When I first came here, there were times I had to suppress my instinct to take opportunistic advantage. That was intrinsic motivation.

Later, I had adapted to the culture around me. Such instincts rarely arise as it had become extrinsic.


My pessimistic take is that the majority of the population will simply never do that. Look at organized religion. One of its key promises is “behave in life and you’ll get everything you could ever dream of in the afterlife”. I don’t think it’s coincidental.


Neither you nor parent have proven your position. Parent made a “conventional wisdom” statement without providing data; you shared three examples, and as the saying goes, the plural of “anecdote” is not “data.”



Now that’s what I’m talking about. Thank you.

