Hacker News

If those tools are writing the code then in general I do expect that to be included in the PR! Throughout my whole career I've seen PRs where people noted which code was generated (people have been generating code since long before LLMs). It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself (which in my experience is the case when it's obvious boilerplate or the generated section is small).

Needing to flag nontrivial code as generated was standard practice for my whole career.



> It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

If this is not the case you should not be sending it to public repos for review at all. It is rude and insulting to expect the people maintaining these repos to review code that nobody bothered to read.


Sometimes code generation is a useful tool, and maybe people have read and reviewed the generator.

The difference here is that the generator is a non-deterministic LLM and you can't reason about its output the same way.


As a rule, I commit the input to the code generation tool, i.e., what the GPL refers to as "the preferred form of the work for making modifications to it", generate as part of the build process, and, where possible, try to avoid code generation tools designed around the assumption that their output will be maintained rather than regenerated from modified input.

As for LLM code assistants, I don't really view them as traditional code generation tools in the first place, as in practice they more resemble something in between autocomplete and delegating to a junior programmer.

As for attribution, I view it more or less the same way as "dictated but not read" in written correspondence, i.e., a disclaimer for errors in the code, which may be considered rude in some contexts, and a perfectly acceptable and useful annotation in others.


"Here's what AI came up with and it mostly worked the one time I tested it. Might need improving".

No. I don't want to test and pick through your shitty LLM-generated code. If I wanted the entire code base to be junk, it'd say so in the readme.


Usually, pre-LLM generated code is flagged because people aren't expected to modify it by hand. If you find a bug and track it to the generated code, you are expected to fix the sources and re-generate.

This is not at all the case with LLM-generated code - mostly because you can't regenerate it even if you wanted to, as it's not deterministic.

That said, I do agree that LLM code is different enough from human code (even just in regards to potential copyright worries) that it should be mentioned that LLMs were used to create it.


> If those tools are writing the code then in general I do expect that to be included in the PR!

How about a compiler?


Compilers don't usually write the code that ends up in a PR. But compilers do (and should) generally leave behind some metadata in the end result saying what tools were used, see for example the .comment section in ELF binaries.


Are you checking in compiled artifacts? Then yeah, we should have a chain of where that binary blob came from.


Do you check in binaries into your git history? If so, you should mark a commit as generated, and the commit message (plus repository state) should be enough to recreate it 1:1.

Similarly, if I use e.g. jextract or uniffi to generate Java interfaces from C code and check that in, I'll create tooling to automatically run those, and the commit will be attributed to that tooling.


Compiler versions are usually included in the package manifest. Generally you embed commit info, compiler version, compilation date, and platform in the binaries that compilers produce.


Absolutely. Let's say I have a problem with gRPC and traced it to code generated using the gRPC compiler. I can reproduce it, highlight it and I'm pretty sure the gRPC team would address the issue.

Replace the gRPC compiler with an LLM. Can you reproduce it? (Probably not 100%.) Can anybody fix it, short of throwing more English phrases like "DO NOT", "NEVER", "Under No Circumstances" at it?

Probably not.


>It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

I thought the argument was that AI-users were reviewing and understanding all of the code?


> people have been generating code since long before LLMs

How? LSTM?


There are many techniques. You're most likely to come across things like declarative DSLs and macros; then there are things like JAXB and similar tooling that generates code from data schemas; and some people script around data sources to glue boilerplate together, and so on.

Arguably snippet collections belong to this genre.



For example `rails generate ...` built into the Rails CLI.


See, for example, this blog post from 2014: https://go.dev/blog/generate

The following comment in the blog post

    //go:generate stringer -type=Pill

generates a .._string.go file which contains a '.String()' method.
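For a sense of what that workflow produces, here is the Pill type from the blog post together with a hand-written approximation of the String() method that stringer emits into the generated file (the real generated output uses the same name/index-table trick but differs in details):

```go
package main

import "fmt"

// Input: the annotated type. Running `go generate` executes the
// directive below and writes the String() method to a separate file.
//go:generate stringer -type=Pill
type Pill int

const (
	Placebo Pill = iota
	Aspirin
	Ibuprofen
)

// Hand-written approximation of stringer's output: all names in one
// string, with an index table marking where each name starts and ends.
const _Pill_name = "PlaceboAspirinIbuprofen"

var _Pill_index = [...]uint8{0, 7, 14, 23}

func (p Pill) String() string {
	if p < 0 || int(p) >= len(_Pill_index)-1 {
		return fmt.Sprintf("Pill(%d)", int(p))
	}
	return _Pill_name[_Pill_index[p]:_Pill_index[p+1]]
}

func main() {
	fmt.Println(Aspirin) // prints "Aspirin" via the generated-style method
}
```

Because the generated file is a pure function of the type definition, a bug is fixed by changing the type and rerunning `go generate`, which is exactly the property LLM output lacks.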

I would find it very reasonable to commit that with 'Co-Authored-By: stringer v0.1.0' or such.

Or 'sed s/a/b/g' and 'Co-Authored-By: sed'


Holy shit I’m old.


You assemble all your machine code using a magnetized needle?


I am not against the general use of AI code. Quite simply, my view is that all relevant context for a review should be disclosed in the PR.

AIs and humans are not the same kind of PR authors. As an obvious example: one of the important functions of the PR process is to teach the writer how to code in this project, but LLMs fundamentally don't learn the way humans do, so there's a meaningful difference in context between humans and AIs.

If a human takes the care to really understand and assume authorship of the PR, then it's not really an issue (and if they do, they could easily modify the Claude messages to remove the "generated by Claude" notes manually), but instead it seems that Claude is just hiding relevant context from the reviewer. PRs without relevant context are always frustrating.


What's really tricky with the legal protections area is this: 90% of the value of the S&P 500 is intangible. Meaning if you suck out the book value (10%), the rest is brand, IP, rights, sources & methods, etc. So if a company can't protect that, it's not particularly valuable anymore. Maybe we will see a shift back to tangible assets and book value (25,000 $8MM Vera Rubin machines) and away from intangibles...


I think this is just the beginning, so people are apprehensive, rightfully so, at this stage. I agree with you that AI use should be disclosed, but using the commit message as a billboard for Anthropic? Hell no. Go put an ad on the free tier.


You don't generally commit compiled code to your VCS. If you do need to commit a binary for whatever reason, yeah it makes sense to explain how the binary was generated.


You do usually pin your compiler version, though, or at the very least set a minimum version.


Don't be silly.

I use good ol' C-x M-c M-butterfly.

https://xkcd.com/378/


Sometimes using AI to code feels closer to the butterfly than to emacs, right?



