Hacker News | aomix's comments

Talking the problem to death with the AI before implementation is a nice zone for me. I feel productive, get good results out of the AI, and still largely understand the code. That’s the part of the AI revolution that I feel has made me a better engineer because I argue about design and architecture all day with a robot.

I follow the same process. I have a design in mind for the problem at hand, but I don't reveal it to Codex. I go back and forth a bit to see if its proposals are better than mine. I go back and forth on tradeoffs of various approaches. And then I ask it to compare its proposals with mine. I "win" most of the time but there are many times where it shows me a better or simpler approach, or makes me rethink the solution altogether.

Once this is done, the mechanical coding parts are mostly routine (for Codex).


I really like this pattern and use it often, this 'not showing my cards'. The second I hint towards the LLM what I prefer it will become sycophantic and invent nonsense why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first. But not showing the cards in my hand has been very useful when thinking through a problem with LLMs.


LLMs flip positions when users push back ~70% of the time even when they were right. RLHF optimizes for approval, not correctness

> LLMs flip positions when users push back

Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.


I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.

An example from earlier, Claude strongly suggested a migration that would run a full vacuum on postgres. However, in production this would lock tables which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.

Another example, I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, and with a very easy use case when you have dozens or even a hundred or more accounts on there, that is also efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption, and it accepted that and suggested as a middle ground that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.


It's a good thing except when it's not. The problem is the AI does not understand when to use which approach.

Contrast this with a human. We generally understand when the other person knows what they're doing and we should just listen, and when the other person is asking for an honest opinion and wants a push back if necessary.


Skills help there.

I have a linus-reviewer skill that focuses on architectural integrity, no BS, etc., modeled on Torvalds's code preferences.

And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.

They have different prios, but they both push back on feedback, till you convince them.


Care to share the skill behind the Linus reviewer? I tend to ask it to do that but leave it up to the LLM to decide what that means. Interested to see any specifics you might have included there if it’s ok to share.

Sure.

Would be interested in the experience others may have, took me weeks of iterations to get reviews in a format and utility I liked.

https://gist.github.com/enricopolanski/2bde8619f53307c9bcd5e...


I agree completely. Skills definitely keep it in line and sticking to the script. Thanks for sharing the skills you use, I’ll definitely take a look.

I almost always end with something like: “, but I am not sure, evaluate.” Or other things and avoid ever stating a preference.

I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.

They do still attenuate their latent space on prior conversation turns as authority. That is why I like pure design/review sessions and pure coding sessions, often at the same time. I can often keep the design and review session in the critic and reviewer role without it becoming a sycophant. The coding agent just picks up dispatches and works with very little opinion at all.

Tangentially related but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers it always finds something, however small, to push on. Once it actually was completely wrong and admitted it after I got it to realize that. So maybe you have to prime it to be contrary and not agree with everything you say; putting it in the role of a tough interviewer seems to do this implicitly.

Take a look at hellointerview.com their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.

No affiliation.


It's actually a reasonable way to think about alignment. Sometimes you want the agent to just listen to you and sometimes you want the agent to think critically.

I think about this line a lot. For example, sometimes you'll have a typo in something you want the agent to do. LLMs typically will correct that typo silently and implement the actually intended thing. But if you said, "no, I want the thing I typed," I think everyone's expectation is that it says, "ok done."

I've found that leaving clues in the system prompt / exchange that are open to critique largely mitigate sycophancy with most recent models.

As engineers we're trained to represent our positions strongly. Strong opinions loosely held, etc. When you speak authoritatively to a person, "I think we should do x...", the person understands that that's just your opinion and has the autonomy to push back.

An LLM imo _shouldn't_ have that same kind of autonomy by default and it should be RLHF'ed out.


Interesting thing about sycophancy is it’s asymmetric. If an LLM is used to train an LLM it may not have the same level of aggressiveness that humans do when pushing back on a trainee. Human pushback has specific patterns which we might be able to compensate for due to the asymmetry.

Obviously this is just my experience. Claude code pushes back much harder than Codex.

I have totally opposite experience.

Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it stands up to sycophancy to it being bad. As expected the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical it's a good sign that I'm not missing anything

LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.

Makes sense as priming is at the core of how an LLM is trained.

“Given these words, predict the next word.”


It's not limited to arguing with LLMs but if you want an honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.

Yes, outside of coding too, it’s a good idea to ask open ended questions rather than ask for confirmation, to avoid this sycophantic bias

There's an easy workaround that helps: instead of listing options, just describe the problem constraints and ask it to propose approaches independently.

> I go back and forth a bit to see if its proposals are better than mine

I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
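For anyone wondering what such a generated benchmark might look like, here is a minimal sketch in Python using only the standard library (`timeit` for wall time, `tracemalloc` for peak allocation). The two string-building functions are hypothetical stand-ins for "the approaches", and `measure` and `compare` are names made up for illustration, not anything from the thread.

```python
import timeit
import tracemalloc


def concat_with_plus(n):
    """Approach A (hypothetical): build a string with repeated +=."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def concat_with_join(n):
    """Approach B (hypothetical): build the same string with str.join."""
    return "".join(str(i) for i in range(n))


def measure(func, n, repeats=5):
    """Return (best wall time in seconds, peak traced bytes) for func(n)."""
    # Best of several single runs, to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=1, repeat=repeats))
    # Separate run under tracemalloc so tracing overhead does not skew timing.
    tracemalloc.start()
    func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def compare(candidates, n):
    """Measure every candidate and return {name: (seconds, peak_bytes)}."""
    return {name: measure(func, n) for name, func in candidates.items()}


if __name__ == "__main__":
    results = compare({"plus": concat_with_plus, "join": concat_with_join}, 50_000)
    for name, (seconds, peak) in results.items():
        print(f"{name}: {seconds * 1e3:.2f} ms, peak {peak / 1024:.0f} KiB")
```

The numbers depend on the machine and interpreter, which is rather the point: decide from the measurement instead of from a guess.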


I had the exact experience yesterday.

I have a performance problem and went down the path of optimising part of a pipeline that, when benchmarked, was not the bottleneck, even if it looked plausible to me and the LLM. When I asked it to make a final benchmark for documentation I found most of the work I did improved things by 30%, while another path would have improved them by an order of magnitude more.

Thankfully iteration is now faster than ever and given how fast it creates tests, previous tests created for the aborted optimisation were helpful.


> Turns out AI is terrible at guessing what's faster or allocates less

s/AI/a human being/ would work equally well, lol.

Jokes aside, I do like the approach of letting the AI build something deterministic and make decisions based on that.


Yup, just like people!

I think this approach is more common than the hype for actual work. I do something similar, many back and forth, then settle on something often with now known tradeoffs, written by hand to spot issues as a final guard/ keep consistent naming etc.

I bet you've contributed a lot of training trajectories for those AIs.

Good!

Despite the cynical sibling reply, I also feel like there's real value here. Contrary to the meme, I don't think Claude just tells me I'm brilliant, but really does push back on directions that are unproductive, helps identify when a part is overcomplicated or a dependency has become redundant, etc. Those are important things to have at least a sightline on before getting too deep into the code, even (or maybe especially) in a world where an awful lot of code can be created basically for free.

I'm usually the one spotting redundancies and dead branches in Claude's code, not the other way around. But I think either way, what's important is questioning the process and understanding the way the code is working so that you retain a full mental model.

>> and still largely understand the code [...] ,that, I feel has made me a better engineer

the cynic in me would say that a good engineer should fully understand the code you write.

I'm not suggesting that AI is the problem here - you could vibe code with the AI and have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible etc. The code you produce should, imo, be at the level that you can understand it.

How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?

A.I makes it easier to get yourself into problems early on.


> How can you reason about code you don't fully understand?

We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.


Yes, this is why bugs get often worked around instead of being fixed properly.

One strategy I use in the planning phase is even when I know how I'd implement the solution, I ask the Claude/Codex how they would solve the problem or implement the feature without giving them any clues - and then compare their solutions to my own. Often I am pleasantly surprised by alternative ways of doing things and ideas that we integrate into the final design.

Same. I've been creating "research" documents where I let it do a freeform survey of possible solutions and have it sketch out its own solution. I'll then sketch out a plan based on what I think is good or what I think it missed, and then I'll have it interrogate me for a final PRD document. It then implements the feature in reviewable chunks, and I'll give it feedback or tweak the PRD doc as needed.

Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.


Same here. Step 1 is usually a research doc where I simply describe the task and tell it to research the relevant parts of the codebase. This gets refined to a high-level plan, which gets distilled to a detailed step-by-step implementation plan.

When it comes to the actual implementation I prefer to work through it in small steps, where the AI explains to me exactly what it's about to do and why (and I approve) along the way. This enables me to catch it if it's about to do something I disagree with beforehand. And reduces the time I need to spend reviewing in the end.


I like this, though it does leave me feeling more nervous when I really don't know how I'd solve the problem, still requires trust.

How would you approach this problem if you are let's say token constrained due to per month limits set in your company?

What I've tried to do is make the bot write detailed spec documents, slowly building it over time as I explain the full problem.

It works for the most part, but if you have some non-standard requirement, the agent seems to skip over that part of the spec document when it starts to code. Or it adds needless checks for situations that I said will never happen.


In my book, the single most effective way to spend tokens is having it review code/specs you've written. One advantage to putting the ai in that position is that unreliable competence isn't much of a problem as you can ignore bad suggestions.

I would also recommend explaining the specs and doing a lot of your back and forth with a lower end model and set it to a higher end model only once the conversation history has all the context you feel the higher end model needs.


As the post says, after an agent implements the plan, have another agent review it. Make sure to mention it must ensure the plan is fully executed. It works wonders!

>I argue about design and architecture all day with a robot.

You will outgrow it at some point.



Yes, this is the way I do stuff.

Try and learn at every point.


I think this is OK though. We can still micromanage[0] the code generation part for a useful productivity boost, I think.

[0] At least, in my experience, "micromanaging" the AI is what gives me the best results. Iterating on the initial design, then iterating on the plan, then reviewing the proposed code changes (including tests), then getting an independent code review from another LLM, etc. If you give an LLM too much latitude that's when the really shitty code and ill-considered breaking changes/obliteration of existing functionality starts to creep in.


I feel like there's an overly negative vibe to this response when it just seems like rubber duck debugging - I would assume the user isn't trying to argue like how you might have to argue specs, but is merely trying to clarify their own ideas and learn possible alternatives.

Quite the opposite. It’ll most likely “outgrow” us.

Can't, it ain't nothing BUT us.

You can wait and see, but that's what'll happen. If we stop it stops.


Sure, not "us". But it could possibly outgrow the vast majority of individual "yous".

nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

I was more annoyed than anything that I didn't hit this moment until my 40s.

Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.


What you guys don't understand is that you don't argue with people or robots to teach them. You argue to teach yourself. Until you get out of that mindset, indeed a lot of conversation will seem useless, be it people or robots.

>You argue to teach yourself.

Oh. I am aware. It is not that deep. But who you argue with still matters. There was a point where I abandoned Reddit and HN. I came back to HN because people here also seem to have grown up. Reddit stays mostly the same.

I credit the moderation here for that, I mean allowing people to grow out of the echo chamber.


It does to an extent. One thing I will give AI, because of the nature of LLMs, you are essentially arguing with the median level of the input that trained the model. So, for someone new to the subject, you get access to patterns that will bring them up to a certain level.

Getting past that is the problem we face now.


That may well need more than the models. Someone put it better than me: these LLMs have no taste - nor can they as things are.

>nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

Yes, I thought the same as well because that was the same line of thought that made me write my comment.

>Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.

Yea, they are like a slingshot. You need to let go at some point or else it will drag you back.


It's like that phase people go through where they argue with morons on reddit, and then one day grow up and realize that most of these people are unemployed/underemployed terminally online nobodies who aren't ever going to learn anything, and even if they did it wouldn't impact the world since they were just some below average hobbyist anyway and aren't in charge of anything more important than a box of paperclips.

Ah, if it’s a robot in charge of the paperclips you need to watch out a bit.

Mostly with you, though in recent years I have wondered whether those people are part of what caused the latest boom of political populism. If there is no one there to debate the problematic ideas, problematic ideas will become the rhetoric after all.

That might be true on general-population social media, but the opposite is the case in niche groups, and in particular, this very industry we're in - software - was largely built on terminally online hobbyists.

I also like doing this exact thing. I really don't like using any AI-powered IDEs, but AI is still too useful to skip, so what I do is just open up a Claude or Gemini chat, explain the project, and start talking about implementations, feature additions, and how systems should be structured. Most of the time, as long as you don't let the AI be too biased towards your answers, it'll give actually good answers that help immensely for the project.

From the other end, I've seen this go wrong a couple ways:

When I'm doing it: I can go on way too long trying to consider way too much, when really, putting down some code and reading it and writing it myself would give me a better understanding.

When others are doing it: they can get very entrenched in a certain way of thinking, and are sure it's correct because of their AI conversations. Some context or data point was missing from their conversations with the AI.


This.

This is what I tell people (including non-programmers interested in vibe coding): the results you get are a product of... process. Formal process.

From this naturally emerges the other thing I tell people: domain expertise (or at least familiarity and/or capacity for learning) still determines the outcome.

I don't touch the code. But I do push back on expedience, laziness, inconsistency, and all the other recurring unsolved problems of generated code... and continue to play whack-a-mole in pursuit of process that whacks the moles.


I think that many AIs nowadays have a similar process incorporated in their thinking blocks. You can see there how it discusses implementation details with itself, so such discussion happens even when a human does not participate in the loop.

I agree with this take. But this take also means that actual productive token use is not as high as people currently make it out to be.

AI is an excellent rubber duck and test writer. Maybe I sniff my farts too much but I like my code just the way I want it lol


Yeah, me too. I argue with multiple models at the same time via a markdown doc to coordinate the discussion. I feel like it makes me less anxious about the final output if nothing else.

Yet, so many internet users seem to only understand "hand crafted" vs "vibe coded" as if there wasn't tons of middle grounds and different uses.

Yeah I feel like a rubber ducking with some feedback has been very helpful

The professionalization of rubber ducking. I like it.

I think this is honestly the #1 best use case for AI in development. If you use it right it can be exactly the annoying junior who questions every decision you make that you need.

It started looking a whole lot like OpenBSD’s random number system. Private entropy pool from good system entropy seeds a ChaCha20 stream with random reseeds for forward secrecy in case of compromise. I think Linux is even more paranoid in the early boot environment where even in the presence of a seed file it prefers to get system entropy mixed in before confidently saying it can do crypto activities.
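To make the shape of that design concrete, here is a toy sketch in Python of a keystream generator that is seeded from system entropy, reseeds periodically, and rekeys after every read. It is not OpenBSD's or Linux's code: SHA-256 in counter mode stands in for ChaCha20 (which is not in the Python standard library), the reseed trigger is a fixed byte budget rather than anything random, and `ReseedingStream` and its parameters are hypothetical names chosen for illustration.

```python
import hashlib
import os


class ReseedingStream:
    """Toy keystream generator: seeded from OS entropy, reseeded periodically.

    SHA-256 in counter mode is a stand-in for ChaCha20 (assumption for this
    sketch). Not suitable for real cryptographic use.
    """

    def __init__(self, reseed_after=1024, entropy=os.urandom):
        self._entropy = entropy            # entropy source (assumed: os.urandom)
        self._reseed_after = reseed_after  # output bytes per seed (arbitrary budget)
        self.reseeds = 0                   # number of times fresh entropy was pulled
        self._reseed()

    def _reseed(self):
        # Replace the key with fresh system entropy; the old key is dropped.
        self._key = self._entropy(32)
        self._counter = 0
        self._produced = 0
        self.reseeds += 1

    def _block(self):
        # One 32-byte keystream block: SHA-256(key || counter).
        data = self._key + self._counter.to_bytes(8, "big")
        self._counter += 1
        return hashlib.sha256(data).digest()

    def read(self, n):
        """Return n pseudo-random bytes."""
        out = bytearray()
        while len(out) < n:
            if self._produced >= self._reseed_after:
                self._reseed()
            block = self._block()
            out += block
            self._produced += len(block)
        # Rekey from an unreleased block, so someone who later learns the
        # current key cannot recompute bytes that were already handed out.
        self._key = self._block()
        self._counter = 0
        return bytes(out[:n])
```

Usage would be something like `ReseedingStream().read(16)`. The rekey step at the end of `read` is the part that corresponds to the forward-secrecy goal in the comment above; the periodic reseed is what limits the damage if the state does leak.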


The Wire's eternal wisdom "You start to follow the money, you don't know where the fuck it's gonna take you"


How about the Iran-Contra connections? The crack epidemic ravaging Baltimore in The Wire has everything to do with Jeffrey Epstein. Ever wonder how he bought that island of his?

https://forum.agora-dialogue.com/2025/12/19/epstein-israel-a...

https://youtube.com/shorts/Z3gHFmdYZ_E?si=KPtbGy9j_whzKX_n

https://nsarchive2.gwu.edu/NSAEBB/NSAEBB2/index.html


I haven't had a chance to do embedded work but people damn near fall to their knees and weep when talking about how nice the experience is using embassy. Which makes me want to give it a try.


I find it interesting that the same process that played out in the forums-to-feed transition also took place in video games, from dedicated servers to matchmaking. I joined a random Counter-Strike server in 2005 and ended up becoming close friends with regulars on the server, and I'm still in touch with some of them to this day.


My wife has been on the same Minecraft server for 15 years. We meet up with the other members fairly regularly; a few of them even flew out to Hawaii with us for vacation last year. This year we're going to Canada for vacation, and we'll probably have a group of 7-8 of the Canadian members meet up and go do stuff.


as a long time FPS gamer on PC, the change to the matchmaking style feels like it coincided with the rise of consoles and crossplay. it feels odd playing Battlefield with console players who can’t use the game chat cause they don’t have a keyboard.


I’m not even an athlete but I hike fairly often and walk 3-4 miles a day with my dog. If I get a pair of so-so boots they’ll last less than a year. I got recommended boots by a friend that lasted me 6 years until I finally had to replace them with an identical pair this Christmas. Nothing has been an issue in the two weeks I’ve had them but when companies get gutted by PE their quality goes down sharply. If I have to find a new boot company I’ll be very sad.


I keep believing there’s a web of trust type future for social media but I can’t articulate it.


"I would have written a shorter letter, but did not have the time." is my favorite quote for that


I want Netflix to lose. After living with their binge release schedule for however long now I think we're all worse off for it. So I want less of the industry to use it.


You are not forced to buy their product, or to buy into their schedule.


You can only vote with your feet if you can step somewhere else. We are watching locations for your feet to go shrink in real time.


You don't need the streaming service though, you can just do without or find other methods of obtaining their content. It's not like food, electricity, or water where you may have no actual options or very limited options. Movies and shows are wants, not needs, and people can walk away and fill the time some other way.


Saying everyone should just quit streaming and go touch grass or read a book is not a productive recommendation. It's been tried for decades and fails because people really like TV and Movies. Given that, the discussion here needs to start from the assumption that people will continue to watch TV and movies and suffer meaningful quality of life impacts when they do not.


Once Netflix buys all of these companies, you won't ever be able to watch a WB movie without a $25 Netflix sub per month. (And yeah, when they are done buying all the competition that's what the monthly will be.)


> Once Netflix buys all of these companies, you won't ever be able to watch a WB movie without a $25 Netflix sub per month. (And yeah, when they are done buying all the competition that's what the monthly will be.)

That's kind of a silly argument. "People are better off paying $100+/month for 4+ streaming services than $25/month for one that has everything."

If your argument were that you'd have to pay more than the current combined cost, it'd be a better argument against mergers. Arguing against something because it's a better deal is just strange.


It's not that silly of an argument when you factor in Blu-Ray as the other side of "won't be able to watch a WB movie without". Right now the only Netflix "Exclusives" you can find on Blu-Ray are the ones they source from Sony, Warner Brothers, or Paramount. If they own Warner Brothers one of those Blu-Ray sources goes away.

Instead of a one-time Blu-Ray purchase for ~$25 for a movie to watch as many times as you'd like, it's an ongoing subscription for $25/month. If you only want to watch that one movie in two different calendar months, you've easily doubled your spend.

(Yes, it is still apples-to-oranges because you may watch more than one movie in a month, but the flipside is that the $25/month is a variable catalog fee. The movie you want to watch may be "vaulted" that second month you want to go watch it. With Blu-Ray you control your film catalog, with Netflix some finance team does.)

(Also, yes, easy to forget Blu-Ray in this debate because Blu-Ray is dying/dead, especially in physical retail with Target and Best Buy dropping its sections. You can also substitute a lot of the same arguments here with arguments for Movies Anywhere and/or iTunes Store.)


That's not how most people do streaming. They consume everything on Netflix; when the content gets stale, they cancel, move to P+, consume for a few months, stale, D+, stale, A+, etc.... one at a time.


That's what some people do, the average household (per polling) has 4+ video service subscriptions.


So essentially less than the cost of two tickets to see a movie in theaters today. The horror.


Subscriptions add up + you will see ads and have to pay for "premium" content.


It will be $50 soon enough if this goes through


I never watched the widescreen version of The Wire they put out years ago but now I'm curious again. That show was a bone deep 4:3 product and the show plays with it constantly. Here is an interesting breakdown that made me really appreciate how clever they got while trying to be pretty subdued with the cinematography on The Wire https://vimeo.com/39768998


I watched both; both are good, in different ways. Some scenes that I remember being beautifully composed in 4:3 lost to the transition, while others have improved markedly.

They made a ton of effort on it, recognising it's a different version altogether:

> The new version of The Wire, then, will differ both creatively and technically. In certain cases, such as a scene in season two where longshoreman gather around a body, Simon said he believed the added space would add a vulnerability to the scene that wasn’t possible in 4:3. But he describes other scenes where the added space distracts the eye, and the remaster zooms in on the characters to retain that intimacy.

https://www.techhive.com/article/599415/hbo-remastered-the-w...


David Simon's earlier work, "Homicide" had a lot of interesting switching between film + video and aspect ratios as well. I think it's something that he's been interested in for a long time.


David Simon wrote the source material for Homicide but did not "make" the show, he cut his teeth in TV writing on Homicide.

I have Homicide on DVD, it also has some good extras.


It was recently remastered. I watched in the original 4:3 but I'm happy that some love has been put into restoring the show, albeit in unintended 16:9.


https://vimeo.com/39768998

> This video is not rated. Join vimeo to watch

No thanks. Here's a YouTube mirror: https://www.youtube.com/watch?v=ufs0Rwx8sOk


