Hacker Newsnew | past | comments | ask | show | jobs | submit | msdz's commentslogin

Such an increase tracks the company's valuation trend, which they constantly, somehow have to justify (let alone break even on costs).

"Writing is nature's way of letting you know how sloppy your thinking is." – Dick Guindon

If your text hasn't undergone that process, it's still sloppy thinking.


> I look at the starts when choosing dependencies, it's a first filter for sure.

Unfortunately I still look at them, too, out of habit: The project or repo's star count _was_ a first filter in the past, and we must keep in mind it no longer is.

> Good reminder that everything gets gamed given the incentives.

Also known as Goodhart's law [1]: "When a measure becomes a target, it ceases to be a good measure".

Essentially, VCs screwed this one up for the rest of us, I think?

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law


> The project or repo's star count _was_ a first filter in the past, and we must keep in mind it no longer is.

Id suggest the first question to ask is "if the project is an AI project or not?" If it is, dont pay attention to the stars - if it's not, use the stars as a first filter. That's the way I analyse projects on Github now.


> The project or repo's star count _was_ a first filter in the past, a

I agree that it has been a first filter, but should it ever have been? A star only says that someone had a passing interest in a project. Not significantly different from a 'like' on a social media post.


This got me thinking, and it might actually even be a comparable amount. Let's estimate 12 years of schooling run at minimum $100,000 per student, at least in the US [1], and then add onto that number whatever else you may do after that, i.e. a bunch more money if paid (college) or "unpaid" (self-taught skills and improvements) education, and then the likely biggest portion for white-collar workers, yet hard-to-quantify, in experience and "value" professional work will equip one with.

Now divide the average SOTA LLM's training cost (or a guess, since these numbers aren't always published as far as I'm aware) by the number of users, or if you wanted to be more strict, the number of people it's proven to be useful for (what else would training be for), and it might not be so far off anymore?

Of course, whether it makes sense to divide and spread out the LLMs' costs across users in order to calculate an "average utility" is debatable.

[1] https://www.publicschoolreview.com/average-spending-student-...


Personally I've heard Odin [1] to do a decent job with this, at least from what I've superficially learned about its stdlib and included modules as an "outsider" (not a regular user). It appears to have things like support for e.g. image file formats built-in, and new things are somewhat liberally getting added to core if they prove practically useful, since there isn't a package manager in the traditional sense. Here's a blog post by the language author literally named "Package Managers are Evil" [2]

(Please do correct me if this is wrong, again, I don't have the experience myself.)

[1] https://pkg.odin-lang.org/

[2] https://www.gingerbill.org/article/2025/09/08/package-manage...


The difference is that the work a contracted tradesperson will do is typically under some sort of guarantee, e.g. typically 2 years on work done in your home (up to 5 for bigger construction etc. type work), at least here in Germany… which you don’t (need to) factor in when DIY-ing.


> that they're unable to [manage and] kill child processes they themselves spawn makes it seem like they have zero clue about what they're doing.

Yeah, at the bare minimum these projects could also use something like portless[1] which literally maps ports to human- (and language model-)readable, named .localhost URLs. Which _should_ heavily alleviate assignment of processes to projects and vice versa, since at that point, hard-to-remember port numbers completely leave the equation. You could even imagine prefixing them if you've got that much going on for the ultimate "overview", like project1-db.localhost, project1-dev.localhost, etc.

[1] https://port1355.dev/


Well, or just use port 0 like we've done for decades, read what port got used, then use that. No more port collisions ever. I thought most people were already aware of that by now, but judging from that project even existing, seems I was wrong.


That’s a little different, right? Using port 0 would imply that clients have not hard coded what port they should connect to and also we don’t mind having duplicate processes occupying other ports which are no longer on active use


Felt an instant urge to nuke your comment if I could. Excellent work.


Interesting article you’ve linked. I’m not sure I agree, but it was a good read and food for thought in any case.

Work is still being done on how to bulletproof input “sanitization”. Research like [1] is what I love to discover, because it’s genuinely promising. If you can formally separate out the “decider” from the “parser” unit (in this case, by running two models), together with a small allowlisted set of tool calls, it might just be possible to get around the injection risks.

[1] Google DeepMind: Defeating Prompt Injections by Design. https://arxiv.org/abs/2503.18813


Sanitization isn’t enough. We need a way to separate code and data (not just to sanitize out instructions from data) that is deterministic. If there’s a “decide whether this input is code or data” model in the mix, you’ve already lost: that model can make a bad call, be influenced or tricked, and then you’re hosed.

At a fundamental level, having two contexts as suggested by some of the research in this area isn’t enough; errors or bad LLM judgement can still leak things back and forth between them. We need something like an SQL driver’s injection prevention: when you use it correctly, code/data confusion cannot occur since the two types of information are processed separately at the protocol level.


The linked article isn't describing a form of input sanitization, it's a complete separation between trusted and untrusted contexts. The trusted model has no access to untrusted input, and the untrusted model has no access to tools.

Simon Willison has a good explainer on CaMeL: https://simonwillison.net/2025/Apr/11/camel/


That’s still only as good as the ability of the trusted model to delineate instructions from data. The untrusted model will inevitably be compromised so as to pass bad data to the trusted model.

I have significant doubt that a P-LLM (as in the camel paper) operating a programming-language-like instruction set with “really good checks” is sufficient to avoid this issue. If it were, the P-LLM could be replaced with a deterministic tool call.


They’d probably get the farthest, but they won’t pursue that because they don’t want to end up leaking the original data from training. It is possible in regular language/text subsets of models to reconstruct massive consecutive parts of the training data [1], so it ought to be possible for their internal code, too.

[1] https://arxiv.org/abs/2601.02671


Copyright for me not for thee? :) That's a good point though. Maybe they could round trip things? E.g., use the model trained only on internal content to generate training data (which you could probably do some kind of screening to remove anything you don't want leaking) and then train a new model off just that?


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: