Are you referring to https://github.com/oriansj/stage0? This stage0 gets you a basic C compiler. An impressive achievement for sure. I don’t see how it really solves anything though.
More generally, each language would need its own path from this basic assembler to a compiler for that language written in C, which doesn’t necessarily exist. While initial versions of the Rust compiler were written in OCaml, more recent versions are self-hosted and rely on the previous version. This goes for projects like LLVM too, since it requires a C++14 compiler to start with.
A better approach is to figure out how to cross compile stage0 and how to make sure the cross compilation step can be trusted. Trying to build a “trusted” path from source is noble but IMO ultimately futile.
This is what I mean when I say this is an unsolved problem. It may be getting better, but there are fundamental unsolved challenges to this exploration and very little guarantee the task can be accomplished in the first place (i.e. no mathematical theory that might present a path to follow rather than just trying to accomplish it through sheer effort). Some small part of the problem might get solved, but that’s very different from saying that any package will be verifiable in this way.
>More generally, each language would need its own path from this basic assembler to a compiler for that language written in C, which doesn’t necessarily exist. While initial versions of the Rust compiler were written in OCaml, more recent versions are self-hosted and rely on the previous version. This goes for projects like LLVM too, since it requires a C++14 compiler to start with.
...and there's a bootstrap path for the bottom of that graph all the way from Guix's (fairly minimal but not quite stage0 yet) bootstrap binaries. LLVM too; since you mentioned it and I was curious, here's the dependency graph of LLVM 9.0.1 on my system, in GraphViz and PDF formats:
(That's from `guix graph -t bag-with-origins llvm@9.0.1` on my laptop.)
It's my understanding that being bootstrappable in this way is a requirement for being included in upstream Guix, so generally everything that's available in Guix can be bootstrapped like this.
Now, what's not done yet is the path from stage0 to building the bootstrap binaries, but that's the eventual goal. The rest of it? It's not futile, it's done. Not all software is in Guix, but enough of it is that I'm typing this from a laptop running almost exclusively software from Guix.
I’m saying stage0 is a massive, incomplete lift. Notice that the Rust chain starts at g++ - can g++ be built from the basic assembler you listed?
Beyond that, as you can tell from your graph, there’s an enormous amount of code that’s part of a build, from a large number of projects. Explode that by a factor of 100000x to get the number of lines of code. So even once you’ve proven the source matches the binary, you are still trusting that the source code is trustworthy in the first place. You can move the trust required around and in some cases reduce it, but ultimately you’re trusting a lot of people and code (heck, you’re trusting the good will of general OSS to validate that the source itself isn’t malicious).
Like I said, I’m supportive of the effort, and guix/nix are great projects and extremely valuable. There’s no theoretical basis, though, to back the design of a solution to the trust issue (the best we have are reputation systems, but even those aren’t backed by any mathematical model).
>I’m saying stage0 is a massive, incomplete lift. Notice that the Rust chain starts at g++ - can g++ be built from the basic assembler you listed?
Not yet, but gcc can be built starting from Mes and a few bootstrap binaries (xz, bash, tar, etc): https://bootstrappable.org/projects/mes.html - and that initial version of gcc is 2.95.3, which does have C++ support.
...and there's work being done towards building Mes with M2-Planet (another C compiler by the same author as stage0): https://www.gnu.org/software/mes/
(Although I think the Guix project has shifted to an approach that starts with Scheme instead these days; they have a Scheme implementation of most of those bootstrap binaries now in the form of Gash and Gash Core Utils: https://guix.gnu.org/en/blog/2020/guix-further-reduces-boots....)
In any case, if you can connect the two, and it feels like that's very close at this point, you've got a bootstrap path from that basic assembler to gcc, and from there to... well, the rest of the distro.
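To make the "connect the two" point a bit more concrete, here's a toy sketch in Python. The edges are my own simplified, hypothetical reading of the chain described in this thread (the node names and steps are illustrative, not Guix's actual package graph); all it does is check what becomes reachable from stage0 once the missing M2-Planet to Mes link exists.

```python
from collections import deque

# Hypothetical, simplified "X is used to build Y" edges, loosely following
# the chain described in this thread. Not Guix's real dependency graph.
BUILDS = {
    "stage0": ["M2-Planet"],
    "M2-Planet": [],                  # the M2-Planet -> Mes link is still in progress
    "Mes": ["gcc-2.95.3"],
    "gcc-2.95.3": ["modern-gcc"],
    "modern-gcc": ["llvm", "rust"],
}

def reachable(seeds, edges):
    """Return every node that can be built starting from the given seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def with_edge(edges, src, dst):
    """Return a copy of edges with one extra src -> dst link added."""
    new = {k: list(v) for k, v in edges.items()}
    new.setdefault(src, []).append(dst)
    return new

# Without the missing link, stage0 only gets you as far as M2-Planet.
today = reachable({"stage0"}, BUILDS)
# With it, everything downstream of Mes becomes reachable from stage0.
connected = reachable({"stage0"}, with_edge(BUILDS, "M2-Planet", "Mes"))
print("rust" in today, "rust" in connected)
```

The point of the sketch is only that one missing edge near the bottom decides whether the whole graph above it hangs off the tiny seed or off a larger set of bootstrap binaries.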
>Beyond that, as you can tell from your graph, there’s an enormous amount of code that’s part of a build, from a large number of projects. Explode that by a factor of 100000x to get the number of lines of code. So even once you’ve proven the source matches the binary, you are still trusting that the source code is trustworthy in the first place. You can move the trust required around and in some cases reduce it, but ultimately you’re trusting a lot of people and code (heck, you’re trusting the good will of general OSS to validate that the source itself isn’t malicious).
Of course, yes, this is a problem. At least once you're able to bootstrap entirely from source you can be pretty confident that the source code you're looking at is what's in the binaries on disk. But yes, that doesn't solve the problem of having a ton of code to review if you need to review it.
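For the "source matches the binary" part, the check itself is simple once builds are reproducible: build locally, then compare against the binary you were handed. A minimal sketch in Python, assuming bit-for-bit reproducible builds; `sha256_of` and `same_artifact` are hypothetical helper names, and this is not how Guix implements it (Guix has `guix challenge` for comparing local builds against substitute servers).

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_artifact(local_build, downloaded):
    """True if the two files are bit-for-bit identical (by SHA-256)."""
    return sha256_of(local_build) == sha256_of(downloaded)
```

A match only tells you the binary corresponds to that source under that toolchain; it says nothing about whether the source itself is benign, which is the review problem above.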