pu_pe · 2026-05-27T08:50:26 1779871826

Even for that function, I think their benefits are overrated. Someone could easily manipulate that number if it suits their purposes (e.g. a health tech startup trying to peddle some cure). The numbers given to you are an illusion of understanding that does not replace actual research.

Retr0id · 2026-05-27T09:31:39 1779874299

I don't think it's as easy to manipulate as you say.

pu_pe · 2026-05-27T10:00:04 1779876004

Smaller markets are easily swayed by relatively little money. Arbitrage is difficult because there is an inherent threat that you're betting against insider information.

In this specific case it's clear that the market doesn't represent our collective knowledge about hantavirus at all. Twenty days ago the odds of a hantavirus pandemic being declared by the WHO were ~10% and now they're 5%. This is just a reflection of the news cycle; it would be nuts to call it a reliable estimate of the underlying scientific facts.

pu_pe · 2026-05-27T08:39:32 1779871172

The problem with measuring AI productivity is that the people doing the actual job (paralegals, developers, etc.) are doing it for someone else (judges, managers, etc.). More work, or even a speedup, does not actually benefit them. So when you give professionals a tool that speeds them up, they increase their slack and/or focus on other, less productive activities rather than doing more work.

The article captures this too, mentioning a couple of examples of startups where presumably this feedback loop is tighter.

pu_pe · 2026-05-20T21:37:16 1779313036

148 mentions of "rocket". 773 mentions of " AI ".

ggreer · 2026-05-20T22:24:44 1779315884

That's because they use other terms like "Falcon 9/Heavy", "Starship", "Super Heavy", "launch vehicle/system", "booster", "upper/lower stage", and "spacecraft".

pu_pe · 2026-05-18T13:59:13 1779112753

> And most importantly, you don't care at all if the tool you vibe-coded is any good. If you write a tool that converts an image to black & white, you are the kind of person who doesn't know or care what KIND of black&white it is. The fact that there are many algorithms to choose from would never cross your mind.

Why do you believe that? What if I care a lot about black and white and need a specific algorithm? Isn't it much better to do that through a tool I can control rather than a proprietary one?

lionkor · 2026-05-19T08:37:21 1779179841

... because there are open source tools that implement these things properly? Take Darktable, if you try to build an alternative because you "just want to do a bit of color grading" you will produce garbage.

My point is that AI bros are forgetting that the LLM is only as good as the person using it. If you don't sufficiently understand the problem you're solving, your solution will be terrible. If you believe that AI will simply take that work away from you and do it itself, you should ask your favorite LLM to explain how LLMs work.

theshrike79 · 2026-05-20T11:13:46 1779275626

But what if I _only_ need to convert images to black and white, why would I install a full ass Darktable setup for that?

TeMPOraL · 2026-05-19T09:09:06 1779181746

> ... because there are open source tools that implement these things properly? Take Darktable, if you try to build an alternative because you "just want to do a bit of color grading" you will produce garbage.

... only if you don't pay attention. The existence and popularity of Darktable literally means that LLMs have a reference implementation on hand and have probably seen some of it in training too. If for some reason they don't immediately go looking at it, then you can always ask.

> If you don't sufficiently understand the problem you're solving, your solution will be terrible.

Fortunately the LLM is able to help you with first understanding the problem space, too. Just don't expect it today to zero-shot things you don't understand well enough.

lionkor · 2026-05-19T15:09:59 1779203399

I agree with you, and my issue is exactly that seemingly the popular opinion here is that you CAN zero-shot things you don't understand.

pu_pe · 2026-05-18T11:10:40 1779102640

> There are levels in this work. Level 1 is the typing. Syntax, semicolons, the years memorizing pointer arithmetic and which header file the function lives in. Level 2 is the verifying. The harness. The test suite. The reflex of rejecting the ninety attempts that almost work and shipping the one that does. Level 3 is the deciding. What to build at all. Which architecture survives contact with the real world. (...)

> AI lowered the cost of Level 1. It did not touch Levels 2 or 3.

I feel like AI is firmly taking over Level 2 too. But more importantly, is that why no one vibecoded Photoshop? I think it's probably because it took >1 million man-hours to build it, so even at a 100x speedup it's not a very attractive project.
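As a rough back-of-the-envelope check of the numbers above: the 1 million man-hours and the 100x speedup come from the comment, while the 40-hour week and 48 working weeks per year are assumptions added purely for illustration.

```python
# Back-of-the-envelope: effort left after a hypothetical 100x speedup.
ORIGINAL_MAN_HOURS = 1_000_000   # lower bound quoted in the comment
SPEEDUP = 100                    # speedup quoted in the comment
HOURS_PER_YEAR = 40 * 48         # assumption: 40 h/week, 48 weeks/year

remaining_hours = ORIGINAL_MAN_HOURS / SPEEDUP
remaining_person_years = remaining_hours / HOURS_PER_YEAR

print(f"{remaining_hours:,.0f} hours ≈ {remaining_person_years:.1f} person-years")
```

Under those assumptions the project still costs several person-years of supervised work, which is consistent with the "not a very attractive project" point.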

falcor84 · 2026-05-18T11:32:18 1779103938

Agreed. Very firmly in level 2, and going well into level 3 too.

For example, I've had quite a few situations where I asked Claude Code to do some manual work for me, assuming it would make manual edits across several files, but it instead decided to write a script, and even a small test suite for it. It's a small encroachment so far, but I don't see anything stopping it from gradually taking on actual product decisions.

pu_pe · 2026-05-18T07:52:56 1779090776

Mistral's claim to European sovereignty is somewhat weak, because:

- They are partially owned by American companies

- They run their infra on American providers (AWS especially)

- Their models are distillations of, and therefore dependent on, American and Chinese models

pu_pe · 2026-05-18T07:15:34 1779088534

> uses 98% fewer tokens than grep

So are we supposed to believe that grep is so wasteful that models are reading 98% useless garbage every time they call it? Either this claim is not representative, or you're missing something else when you throw away the vast majority of context for the model.

Bibabomas · 2026-05-18T09:20:33 1779096033

The 98% is vs the grep+read loop, not grep output alone. When an agent hits an unfamiliar codebase it typically does "cat file" or reads the whole thing first, at least in my experience. If you're reliably getting agents to do "grep -C N" and stop there I'd genuinely be curious what your setup looks like, because I think the quality of the results is just too low to serve as useful context.

ac29 · 2026-05-18T15:45:55 1779119155

> When an agent hits an unfamiliar codebase it typically does "cat file" or reads the whole thing first, at least in my experience.

Depends on the size of the project and specific files. I have definitely seen agents make smart use of pi's "read" tool, which can take an offset and line limit (or defaults to a max 2000 lines/50KiB if the model doesn't specify). The bash tool also has the same max output, so if a model decides to cat instead of using the read tool it still won't blow out its context window with a single large file read.

But this sort of thing is going to vary with harness, model, project, and whatever the RNG delivers for the day.
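A minimal sketch of the kind of bounded read described above. Only the 2000-line and 50 KiB defaults come from the comment; `read_bounded`, its parameters, and the overflow policy are hypothetical and are not pi's actual implementation.

```python
from itertools import islice

MAX_LINES = 2000          # default line cap mentioned in the comment
MAX_BYTES = 50 * 1024     # default 50 KiB cap mentioned in the comment


def read_bounded(path, offset=0, limit=MAX_LINES, max_bytes=MAX_BYTES):
    """Return up to `limit` lines starting at line `offset` (0-based),
    stopping early once `max_bytes` of UTF-8 output would be exceeded."""
    out, used = [], 0
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in islice(fh, offset, offset + limit):
            size = len(line.encode("utf-8"))
            if used + size > max_bytes:
                break  # hypothetical policy: drop the line that would overflow
            out.append(line)
            used += size
    return "".join(out)
```

With caps like these, a single call can never return more than a fixed amount of text, regardless of how large the file is.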

pas · 2026-05-18T07:50:05 1779090605

I had problems with Claude reading hundreds of kilobytes of outputs because grep found things in node_modules. (ripgrep helps, so it makes sense to add a line about it into some memory file.)
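A sketch of the same idea in Python: a search that prunes dependency directories so matches inside node_modules never reach the model. The function name, the skip list, and the match cap are hypothetical; ripgrep gets a similar effect by honoring .gitignore by default.

```python
import os

SKIP_DIRS = {"node_modules", ".git"}  # assumption: directories to ignore


def search(root, needle, max_matches=200):
    """Yield (path, line_number, line) for lines containing `needle`,
    pruning SKIP_DIRS and stopping after `max_matches` results."""
    found = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for number, line in enumerate(fh, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
                            found += 1
                            if found >= max_matches:
                                return
            except OSError:
                continue  # unreadable file: skip it
```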

boyter · 2026-05-18T07:24:00 1779089040

Grep prints out every matching line. For some searches an LLM might run, it will get a lot of noise, and it might have to make that search because it cannot be more specific. Targeted search can reduce the number of tokens.

I suspect the comparison here is between reading the whole codebase and just getting the bits you need, though.

pu_pe · 2026-05-17T13:14:08 1779023648

Some organizations added a ton of process around software development because it is expensive and risky. They require a ton of approvals and sign-offs, then some management overhead on top to check whether their investment is on the right track. This approval process is bound to change now that development is far cheaper and faster.

Another aspect that is not captured here is that the lawyers and subject matter experts will also be using AI to speed up their parts.

pu_pe · 2026-05-15T11:25:03 1778844303

The problem is that if those GPUs are running on an AWS server (or any other American provider), the sovereignty claim is null and void, even if the server is in the UK.

benjamintnorris · 2026-05-15T12:22:43 1778847763

Civo isn’t a US company and isn’t a subsidiary of one. UK-incorporated since 2018, UK-resident leadership, no US parent, entirely UK founder owned. We also share the concern on the Azure/AWS “sovereign cloud”, but this doesn't apply here. The CLOUD Act reaches data held by US entities regardless of where the servers sit, which is exactly why those offerings are problematic. We’re outside that jurisdiction by corporate structure, not just geography.

StilesCrisis · 2026-05-15T11:37:06 1778845026

Doesn't "We built it on fully UK sovereign cloud infrastructure, so data never leaves UK jurisdiction" cover that?

pu_pe · 2026-05-15T11:47:22 1778845642

In theory it should, but I've seen that language describing Azure "sovereign cloud" servers before. The data might indeed be stored in the UK; the problem is the CLOUD Act, which supersedes that.

pu_pe · 2026-05-15T11:19:40 1778843980

Risks:

Changing parameters on the insulin pump because the LLM said so

Neglecting to seek actual medical advice in the belief that an LLM replaces it

Misunderstanding medical complexity (e.g. a prescription based on medical history not available to the LLM)

brookst · 2026-05-15T19:21:56 1778872916

Are those different from the risks of reading about diabetes on Reddit or using Google?