
Interesting point of view. The problem is well known in compiler construction ("Proebsting's law", though it puts the figure at 18 years rather than 10).

The issue with benchmarks is surely well known, including to the PyPy authors; I wonder what the biggest application is that they have benchmarked, or that runs on PyPy.

Your point about the JIT compiler interrupting program execution is valid too, but it doesn't have to hold. One could do the code generation in a separate background thread and switch execution over only when necessary. But, as you already said, a latency issue certainly exists. This is one of the cases where interpreters usually have a leg up, and there are promising ways of optimizing interpreters.
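To make the "compile in the background, switch over when ready" idea concrete, here is a minimal sketch, assuming a hypothetical toy stack bytecode (PROGRAM, interpret, compile_program and BackgroundJit are all made-up names, not anything from PyPy). Calls keep going through the interpreter while a worker thread generates code, and the fast path is used only once that job has finished. Note that under CPython's GIL the worker interleaves with the caller rather than running truly in parallel; the point is the control flow, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy bytecode for f(x) = 2 * x + 1; not PyPy's representation.
PROGRAM = [("push", 2), ("arg", None), ("mul", None), ("push", 1), ("add", None)]


def interpret(program, x):
    """Slow path: decode and dispatch every instruction on every call."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "arg":
            stack.append(x)
        else:  # "mul" or "add"
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if op == "mul" else a + b)
    return stack.pop()


def compile_program(program):
    """Code generation: turn the bytecode into Python source once, then let
    the built-in compile()/exec() produce a callable."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(repr(arg))
        elif op == "arg":
            stack.append("x")
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'*' if op == 'mul' else '+'} {b})")
    namespace = {}
    src = f"def compiled(x):\n    return {stack.pop()}\n"
    exec(compile(src, "<jit>", "exec"), namespace)
    return namespace["compiled"]


class BackgroundJit:
    """Interpret until hot, compile on a worker thread, then switch over."""

    def __init__(self, program, hot_threshold=3):
        self.program = program
        self.hot_threshold = hot_threshold
        self.calls = 0
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.future = None
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path: generated code
        self.calls += 1
        if self.future is None and self.calls >= self.hot_threshold:
            # Start code generation without blocking the caller.
            self.future = self.pool.submit(compile_program, self.program)
        if self.future is not None and self.future.done():
            self.compiled = self.future.result()  # switch over
            return self.compiled(x)
        return interpret(self.program, x)  # keep interpreting meanwhile


jit = BackgroundJit(PROGRAM)
results = [jit(i) for i in range(10)]
jit.future.result()  # demo only: wait so the next call takes the fast path
print(results, jit(10), jit.compiled is not None)
# prints: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] 21 True
jit.pool.shutdown()
```

Results are identical on both paths, so the caller never observes which one ran; only latency changes.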



Yes, you could do a background thread, with some caveats:

1. On most current CPUs, this will cause really bad cache/memory thrashing, probably enough to impact the program.

2. This may actually cause a significant slowdown, depending on how long it takes to optimize a given piece of code (i.e., it may be better to spend 100ms paused optimizing than 5000ms in the background). This is, of course, a latency issue.

3. The state of the art for most JITs is still to use one thread. The number of folks doing actual parallel code generation is nil. So, sadly, even if you had 4 cores with 3 idle, you'd still at best get to use one of them for the background thread doing the optimizing. There are parts that are trivial to parallelize if you've structured the JIT "right", but they aren't always the parts that are slow.
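Point 2 can be checked with a back-of-the-envelope cost model. This is a sketch under simplifying assumptions of my own: work is measured in abstract units, interpreted code runs at slow_rate units/ms, optimized code at fast_rate units/ms, and the program keeps interpreting at full speed while the background compile runs. The 100ms and 5000ms figures come from the comment above; the rates and workloads are made up for illustration.

```python
def paused_time_ms(work, pause_ms, fast_rate):
    """Stop, optimize for pause_ms, then run all the work on the fast code."""
    return pause_ms + work / fast_rate


def background_time_ms(work, bg_compile_ms, slow_rate, fast_rate):
    """Keep interpreting while the optimizer runs elsewhere, then switch."""
    done_while_compiling = bg_compile_ms * slow_rate
    if work <= done_while_compiling:
        return work / slow_rate  # program finished before the JIT did
    return bg_compile_ms + (work - done_while_compiling) / fast_rate


# Hypothetical rates: interpreter 1 unit/ms, optimized code 10 units/ms.
for work in (50, 10_000):
    print(work, paused_time_ms(work, 100, 10), background_time_ms(work, 5000, 1, 10))
# prints:
# 50 105.0 50.0
# 10000 1100.0 5500.0
```

Under these assumptions the background approach wins for short runs and loses badly for long ones, which is the trade-off described in point 2.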


Background compilation in a separate thread actually works pretty well. IE9 has been shipping it with Chakra for a while, and Firefox is now getting it (and it improved the benchmarks a lot, especially on ARM).


Good to hear it's gotten better. Admittedly, I wasn't thinking about browser based JITs when I said that :)

I'm actually curious whether you have any stats on how often this runs on genuinely busy machines, where it has to compete for L1 and other cache resources, versus how often it can be offloaded onto an otherwise idle core.

I.e., I expect there to be a significant difference between the use cases for JITs like PyPy, which will probably sit on shared servers that folks are trying to maximize utilization of, and desktops, where I imagine most browsing doesn't use all cores at 100%.


> Admittedly, I wasn't thinking about browser based JITs when I said that :)

Don't HotSpot and JRockit also do background (de)compilation & swapping of generated code?


Yes, but in HotSpot's case I can't remember whether it is actually turned on in both the "server" and "client" VMs.


Aren't server and client now merged via tiered compilation in HotSpot?


No, AFAIK. "Tiered compilation, introduced in Java SE 7, brings client startup speeds to the server VM. ... Tiered compilation is now the default mode for the server VM. "

Again, AFAIK, the server VM still has significantly different tuning from the client VM. In particular, it runs some significantly more complex optimizations that the client VM does not.


ad 1) Hm, this seems to be a good point, but what about the following line of thinking: some thread A interprets a program P while another thread B compiles P to native machine code (P'). Now, if a third thread C started executing P' (taking the data/snapshot from A), then C's caches should build up and remain accurate. Of course, if this happens too often, the caching behavior will be shitty. I have always wondered (given my interest in interpretation) how many I-cache misses are caused by the instruction cache flushes that follow inline-cache updates in native machine code. (If you have some data on that, please let me know.)
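The A/B/C handoff above can be sketched as follows, under my own simplifying assumptions: P is just a summation loop, "compiling" it means publishing a ready-made function (compiled_loop, a hypothetical stand-in for P'), and the snapshot is the loop's live state (i, acc). It only illustrates the control flow and state transfer (similar in spirit to on-stack replacement); it says nothing about actual cache behavior, and under CPython's GIL the threads interleave rather than run in parallel.

```python
import queue
import threading


def compiled_loop(i, acc, n):
    """Stand-in for P': the same loop body, resumed from an (i, acc) snapshot."""
    for j in range(i, n):
        acc += j
    return acc


def run(n):
    ready = threading.Event()  # set by B once P' exists
    handoff = queue.Queue()    # carries A's state snapshot to C
    code, result = {}, {}

    def thread_b():            # B: "compiles" P into P'
        code["p_prime"] = compiled_loop
        ready.set()

    def thread_a():            # A: interprets P one iteration at a time
        i, acc = 0, 0
        while i < n and not ready.is_set():
            acc += i
            i += 1
        handoff.put((i, acc))  # stop at a safe point and publish live state

    def thread_c():            # C: resumes in P' from A's snapshot
        i, acc = handoff.get()
        if i < n:              # A only stops early once P' is ready
            acc = code["p_prime"](i, acc, n)
        result["value"] = acc

    threads = [threading.Thread(target=t) for t in (thread_a, thread_b, thread_c)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["value"]


print(run(100_000))  # 4999950000, however the threads interleave
```

The answer is the same whether A finishes on its own or hands off midway, which is the property such a scheme has to guarantee.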

ad 3) I am well aware of that. However, I remember a talk at PLDI'11 from Univ. of Edinburgh chaps doing parallel trace-based dynamic binary translation (DBT). Obviously, DBT is less work than a high-level, full-blown JIT, but at least it's not nil :)


> I wonder what the biggest application is that they have benchmarked or that runs on PyPy.

speed.pypy.org has benchmark info on Django, Twisted and some other large, non-trivial codebases.


I know these from the site, and have looked at the Django benchmark listed there. I think it's a rather small benchmark, though it does exercise lots of Django internals (that benchmark comes from Unladen Swallow). I don't know about the Twisted ones, though.

What I actually wanted to know is what the biggest application is, i.e., not a benchmark.



