About the title, "Blocking I/O: it's not just for pansies".
Whether the title wording is ironic or sincere -- and I'm guessing it's used jokingly -- using it presumes a very specific shared outlook with the reader that is, likely as not, wrong. Certainly wrong in my case.
I'm not claiming the title is homophobic. I claim it's distracting to some portion of readers and serves to introduce the actual content of the linked submission poorly. The submission title does not nearly reflect the attitude or writing in the actual linked post which is, in comparison: specific, technical, and non-abrasive.
Frankly, I agree with you (and I wrote that title). I'd much prefer to have submitted it with the actual article's title, but I didn't think anybody would read/upvote it in that case, so the post would die without HN seeing it. I've noticed a lot of people lately getting all excited about async I/O, and I wanted something from the other side to hit the front page so people would read it, think about it, and comment. It would be interesting to A/B test submission titles, but unfortunately HN doesn't really support that. When we do get dupes, it does seem that the most inflammatory, least informative titles tend to win.
FWIW I automatically read it as definition 2 at http://www.thefreedictionary.com/pansies and the potential of it being a homophobic slur would have never occurred to me.
Important to remember that we're talking about threads and IO in one specific VM case, etc. The same assumptions won't necessarily hold in, for example, a .NET case, as the memory cost of a thread is more significant (in some qualified cases).
Fundamentally, most of these points carry so many caveats, if you try to extrapolate them into general programming technique, that you're much better off simply saying "benchmark your actual cases and try the alternatives".
Also, programming in C, it's very easy to use a thread-per-connection model, especially if synchronization is not necessary. If you want a pool of threads it gets a bit more complex, but if you always spin up a new thread for each connection, it's easy as pie (and in my case it doesn't matter, since the time spent creating a new thread is negligible compared to the time the thread will spend doing useful work).
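The same thread-per-connection shape is just as plain in Java, which is what the linked article benchmarks. A minimal echo-server sketch (all class and handler names here are illustrative, not from the article):

```java
import java.io.*;
import java.net.*;

public class ThreadPerConnection {
    // Minimal thread-per-connection echo server.
    public static String demo() throws Exception {
        ServerSocket server = new ServerSocket(0); // port 0: OS picks a free port
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket s = server.accept();
                    new Thread(() -> handle(s)).start(); // fresh thread per connection
                }
            } catch (IOException e) { /* server closed: exit acceptor */ }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        // Demo client: one round trip, then clean up.
        try (Socket c = new Socket("127.0.0.1", server.getLocalPort())) {
            PrintWriter out = new PrintWriter(c.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
            out.println("hello");
            return in.readLine();
        } finally {
            server.close();
        }
    }

    // The handler is plain blocking code: no callbacks, no state machine.
    static void handle(Socket s) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println("echo: " + line);
        } catch (IOException ignored) {}
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "echo: hello"
    }
}
```

The per-connection logic reads top to bottom like sequential code, which is the whole appeal of the model.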
If you want to keep the easy thread-per-connection model even with 1M connections you might have a look at Erlang - it takes the best of both worlds (threads and async IO) via its processes (so you get a process-per-connection). I read somewhere that the overhead of a process is just about 300 bytes.
Event loops are even easier, and the per-connection overhead is often a single struct. And if you want to do async disk IO, you can do it in another thread. (The IO::AIO module for Perl is a great example of this technique.)
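For comparison, a single-threaded event loop on the JVM can be sketched with java.nio's Selector. This is demo-only: it echoes one line and exits, and the helper client thread exists just to exercise the loop; in a real server the loop runs forever and the per-connection struct would hold more than a buffer.

```java
import java.io.*;
import java.net.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

public class EventLoopEcho {
    static volatile String received;

    public static String demo() throws Exception {
        Selector sel = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(sel, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Demo client on a helper thread (plain blocking I/O on the client side).
        Thread client = new Thread(() -> {
            try (Socket c = new Socket("127.0.0.1", port)) {
                c.getOutputStream().write("ping\n".getBytes(StandardCharsets.UTF_8));
                received = new BufferedReader(
                        new InputStreamReader(c.getInputStream())).readLine();
            } catch (IOException ignored) {}
        });
        client.start();

        // Single-threaded loop; per-connection state is one buffer on the key.
        boolean done = false;
        while (!done) {
            sel.select();
            for (SelectionKey key : sel.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(sel, SelectionKey.OP_READ, ByteBuffer.allocate(64));
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = (ByteBuffer) key.attachment();
                    ch.read(buf);
                    if (hasNewline(buf)) {          // got a full line: echo it
                        buf.flip();
                        while (buf.hasRemaining()) ch.write(buf);
                        done = true;                // demo only: one echo, then stop
                    }
                }
            }
            sel.selectedKeys().clear();
        }
        client.join();
        server.close();
        sel.close();
        return received;
    }

    static boolean hasNewline(ByteBuffer buf) {
        for (int i = 0; i < buf.position(); i++)
            if (buf.get(i) == '\n') return true;
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "ping"
    }
}
```

Note how even this toy version has to track partial reads explicitly (hasNewline), which is the state-machine bookkeeping the blocking version gets for free.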
I've found similar results to those stated in the article. In my case I found that the Tomcat Http11NIO connector was actually slower than the default Http11 connector: in benchmarking my server, the default Http11 had about 20% higher throughput than Http11NIO. From what I understand (and I may be wrong, since I didn't delve too deeply), it is because there is a high cost to calls made through the Java Native Interface (JNI).
1) Recent versions of the 1.6 JDK will use epoll on Linux, so the benchmark should be re-evaluated; poll() is known not to be very scalable. There is an issue, however: NIO only supports level-triggered (not edge-triggered) epoll.
2) This doesn't cover the case of threadpool starvation. I.e., there are multiple connections, some are very fast, some are very slow.
A prime example of this would be a client for a WAN-distributed database or a WAN-distributed file system: most operations are local (5 ms), some operations are remote (80 ms). The remote operations linger longer and longer in a fixed-size threadpool, leaving less and less room in the pool and causing longer wait times for incoming operations. You can even have this without WAN distribution, e.g., 80% of operations require no random disk seeks (they are served from cache) while 20% require them (an operation orders of magnitude slower, even with elevator scheduling).
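The starvation effect above is easy to demonstrate. A small sketch, assuming a fixed pool of two threads and sleeps standing in for the slow "remote" operations (all numbers are illustrative):

```java
import java.util.concurrent.*;

public class PoolStarvation {
    // Two "remote" (slow) operations fill a fixed-size pool, so a "local"
    // (fast) operation queues behind them even though its own work is instant.
    public static long fastTaskWaitMillis() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Runnable slow = () -> sleep(200);   // stand-in for a slow remote call
        pool.submit(slow);
        pool.submit(slow);                  // pool is now saturated

        long queued = System.nanoTime();
        // The fast task does no work; it just reports how long it sat queued.
        Future<Long> fast = pool.submit(() -> (System.nanoTime() - queued) / 1_000_000);
        long waited = fast.get();
        pool.shutdown();
        return waited;  // roughly 200 ms of queueing for a ~0 ms operation
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fast task queued for ~" + fastTaskWaitMillis() + " ms");
    }
}
```

With realistic latency ratios (5 ms vs 80 ms) and sustained load, the slow operations come to dominate pool occupancy, which is exactly the head-of-line blocking described above.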
Depends on what you're doing. I read that article a couple years ago and thought "Take that you libevent C zealots." And while I still firmly believe that the JVM is optimized and mature enough to hold its own against platforms that have traditionally been considered faster, I still think there are cases where event-based I/O will out-perform thread-per-request. For instance, if most of what you're doing is shuttling bytes back and forth (like in a proxy). I also wouldn't be surprised if a thread-per-request app took a bigger hit when virtualized (like EC2) than an event-based app.
If you start a thread for each request in Java, you can easily run out of memory when serving a large number of connections. This is because Java uses kernel threads, and each thread gets a fixed-size stack reserved in native memory (outside the Java heap) when it is created, usually around 1MB.
This means that you will need around 10GB of memory (address space, at the least) when serving 10,000 requests, while an event-based app can serve the same number of requests with minimal memory.
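For what it's worth, the per-thread stack size is tunable: -Xss sets the default, and Thread's four-argument constructor takes a per-thread stack-size hint (which the JVM is free to round or ignore). A quick sketch:

```java
public class StackSize {
    // Thread(ThreadGroup, Runnable, String, long) takes a stack-size hint in
    // bytes. Here we ask for 256 KB instead of the typical ~1 MB default.
    public static int run() throws Exception {
        final int[] result = {0};
        Thread t = new Thread(null, () -> result[0] = 42, "small-stack", 256 * 1024);
        t.start();
        t.join();
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints 42
    }
}
```

Shrinking stacks buys headroom for more threads, at the risk of StackOverflowError in deep call chains, so it only pushes the wall back rather than removing it.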
You also have to take into account that thread creation and context switching are really expensive operations (contrary to what the OP is saying), so a thread-per-request app is adequate only for serving a small number of requests; for large numbers you will need to use the event-based approach.
You might get by with a threadpool-based approach, but take into account that if your protocol is not stateless, then you will need shared state, which means you have to use concurrent data structures, which can complicate the code.
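A sketch of what that shared state might look like, using ConcurrentHashMap (the per-session request counter is a hypothetical example, not from the article):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SessionState {
    // Hypothetical per-session counter shared across pool worker threads.
    public static int demo() throws Exception {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++)
                    counts.merge("session-A", 1, Integer::sum); // atomic update
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        // All 4000 increments survive; a plain HashMap could lose updates
        // (or worse) under the same concurrent access.
        return counts.get("session-A");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 4000
    }
}
```

In a single-threaded event loop, by contrast, the same counter could be a plain map, since only one thread ever touches it.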
As a side note, I really enjoyed attending the SD West conferences in the past. Alas, the event has morphed into something focused on Cloud Computing and lost its platform-independent focus.
I realize there are lots of special cases and exceptions to what I'm about to say, but I've come to two good general rules of thumb when it comes to designing concurrency:
1. prefer processes over threads -- because your architecture can scale horizontally across multiple boxes easier, because you become free to write each piece in a different language, and because mutable shared memory is problematic in very subtle, counter-intuitive ways
2. prefer events over processes or threads -- because you can handle much higher concurrent IO traffic on a single machine, due in part to reduced memory use