Gccgo did use a thread per goroutine for a while, and it was significantly slower than the gc "green thread" style implementation (used by both compilers today). Goroutines weigh in at about 4k each. OS threads are 64k minimum. From a memory perspective, that allows more than an order of magnitude more goroutines than threads. From an execution standpoint, goroutines managed by the Go runtime allow some nice scheduling optimisations, since the runtime is aware of the communication between goroutines. (some of these optimisations landed in Go 1.1)
Finally, there is no reason goroutines can't be preemptively scheduled. It is just an implementation detail, and work has already begun on this front.
Gccgo also lacked a garbage collector for a while, around the same time it was using threads. Was it even using a thread pool? Did it run goroutines briefly on the current thread before moving them to their own thread? Certainly not.
Thread overhead on Linux is 8k, only slightly more than a goroutine's 4k. It's 24k on Windows. You guys don't even know the basic facts that should have guided the design.
I didn't work on the design. If I made a mistake that's my fault. The people that did design Go and implement the runtime know what they're doing. You didn't respond to my other points, either.
Yes, a Linux process can spawn 1M threads, and testlimit64.exe spawned 250k threads with both a 64k user stack and a 1 MiB user stack... because the stack is just reserved address space if it isn't actually used. Only the kernel-side overhead matters for how many threads you can have.
I'm not sure which point you want addressed... scheduling optimization? It's also a deoptimization in other ways: it hurts predictable latency and fairness.
I agree the designers knew what they were doing: designing a 'modern' language for 32-bit computers. The question is why?
Why 32 bit computers? There are a lot of them out there.
A 32 bit computer is more than enough to run many kinds of embedded or consumer-electronics equipment. How many decades did the little 8 bit and 16 bit microcontrollers of the 1970s stay in use inside equipment? I'm not trying to say that "640 K ought to be enough for anyone", just that it's nice to be able to make native code for more than only what is considered "standard server/desktop" today.
Can't really comment on the thread-capacity issue, but having a systems-oriented language that also targets slightly obsolete hardware (an Intel Atom, say) is a good thing.
An implementation is what you actually have to use; you can't use 'the language design'. When you have to use a special build system to generate Go and C stubs to call out, can't use a standard linker, partial compilation, or shared libraries, get a massively large exe, can't embed Go in other programs (Go has to be 'main'), can't use it with SELinux, suffer huge unpredictable latency spikes, etc., it'll be cold comfort that those problems are 'just implementation details'.
It's too bad they had NIH or thought 32-bit was the future and didn't stop at just creating a language.
I don't understand any of your criticisms. What else would you expect to use to compile Go besides "a special build system"? A not-special build system? It's a compiled language. You can, however, use gccgo to link Go code with C/C++ code.
Both gccgo and cgo already use partial compilation. You can invoke the 6l, 8l, etc. commands directly if you really want to. One of the design goals was build speed. Shared libraries are on the roadmap. See issue https://code.google.com/p/go/issues/detail?id=256.
What the hell does "can't use with SELinux" mean? Are you talking about issue 871 that was fixed in 2010?
You keep repeating over and over that Go is optimized for 32-bit systems. Repeating something doesn't make it true. In fact, exactly the opposite is true: Go uses a lot of address space, which is great on 64-bit and not so good on 32-bit. This was discussed at length years ago; http://lwn.net/Articles/428100/ is a good place to start.
I've seen you repeat over and over in multiple discussion threads that having lots of kernel threads is no big deal. What you don't seem to realize is that in C/C++, thread overhead is somewhere between 1 MB and 2 MB per thread if you want to use pthreads and glibc and avoid random data corruption from stacks colliding with the heap (hint: you do). Linux also doesn't have an O(1) scheduler any more in mainline; it uses the completely fair scheduler (CFS), which has complexity O(log N). There are disadvantages to green threads, but to even start discussing this requires a lot more background than you have.
Again, you can read about any of this on wikipedia, LWN.net, or even the replies that inevitably get made to all of your posts on HN. So learn already.
I assume your 8k figure is a 4k kernel stack plus a 4k userspace stack. First of all, your kernel may not even be compiled with 4k stacks; 8k kernel stacks are very common. But even if we assume you do have a 4k kernel stack, you will never be able to do much that is useful with a 4k userspace stack.
So the absolute minimum for your normal C/C++ programmer, who doesn't want to do things like reimplement pthreads, is 20k. Even that figure seems suspicious to me because it doesn't take into account things like thread-local data in glibc and various other system libraries.
You will quickly realize that if you want to do "luxurious" things like use standard glibc functions, call recursive functions, and so forth, you need more than 4k of userspace stack. For historical reasons, only the main thread has a growable stack; all other threads have a fixed-size stack. If you exceed the stack size on a thread, you get undefined behavior, possibly a security vulnerability or a crash.