Do not even try to write what you think is effective code before
benchmarking. The VM has many clever optimizations. And even if
you somehow learn all of them, they can still change in the next
release. Instead, write a simple and clean piece of code, make a
benchmark, find the real bottleneck, and rewrite the code in
small parts.
Every now and then I see some strange code at a client I work with, and the justification is "Performance". Somebody thought it might be faster, but they didn't check it back then. And they don't have an automated performance test that continuously validates whether their "dirty but fast" code is still faster than the clean version...
Don't even ask how many hours of developer time I've seen wasted because of "dirty but fast" code, where the code often was not much faster than the clean version. Or it was faster, but speed was not crucial in the area where the code operated.
Also, I guess a lot of the optimizations from old blog posts or the first "Effective Java" will not yield amazing results anymore, as compilers and runtimes are getting better.
I also like this one:
Do not think that programming in C++ or any other lower-level language
is a must for having good performance. Good architecture, benchmarking
and profiling are far more important.
But I don't think it is as clear-cut as the author writes it. For some problems, having total control over your memory layout and over what gets executed when (i.e. C or C++ or ...) can lead to huge benefits. And sometimes, the runtime is just more clever at optimizing the code than you are.
As any follower of @mraleph (http://mrale.ph / https://twitter.com/mraleph) is aware, it's very easy to benchmark the wrong thing (or benchmark more or less nothing), or to write an isolated benchmark which doesn't reflect real-world use.
There's one thing you can do without benchmarking: approximate complexity (O(n)) analysis. You need to be sensible about what N is though: even O(n^3) isn't too bad if it's an operation on a user-displayed list of half a dozen items.
With a bit of thought this even leads to cleaner code: do you really need to pass a large chunk of data around when you only need a particular precalculated quantity?
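As a contrived sketch of that idea (the names here are made up for illustration), compute the quantity once and pass only that number around:

var orders = [{ amount: 10 }, { amount: 25 }, { amount: 7 }];

// Compute the derived quantity once, where the data lives...
var total = orders.reduce(function (sum, o) { return sum + o.amount; }, 0);

// ...and pass just the number to the code that needs it, not the whole array.
function renderSummary(total) {
  return "Total: " + total;
}

console.log(renderSummary(total)); // => "Total: 42"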
But generally benchmarking and applying Amdahl's law is the way to go. First identify which part of the system is slow!
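For a rough sense of Amdahl's law: overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of total time spent in the part you optimize and s is the speedup of that part. If the slow part is only 20% of the runtime, even making it infinitely fast caps the overall gain at 1 / 0.8 = 1.25x, which is why identifying the slow part comes first.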
Things like this make compiler or jit optimizations a little scary to me. It might not matter to a CSS preprocessor, but if you write performance critical code, optimizations become a leaky abstraction. Suddenly you have to understand exactly which circumstances allow your compiler to, for example, use autovectorization and be wary of updating your compiler, lest these criteria change.
I think there is some opportunity here to improve compiler output. I know that GCC can already explain why it doesn't vectorize loops (-ftree-vectorizer-verbose=2), but many other optimizations without such output can still make or break the performance of your program.
I think what we'll end up with is languages that make it easier to be explicit about what you need. We're already seeing this happening with computer graphics, where the likes of Vulkan are in some sense lower level than OpenGL, but make it easier to understand when you're using the fast path or not. I see similar aspects to Rust, and even Haskell's recent introduction of a strict pragma.
I'm pretty sure the halting problem reduces to figuring out code reordering and optimizations like that. We cannot solve it exhaustively, so instead we do the next best thing: approximate and handle the common case. I agree that it's not optimal, but it's the best we can do.
Obviously it's impossible to optimize perfectly every time. I would just like some more diagnostics from the compiler that help me debug issues with wrong or missing optimizations.
gcc has a flag "-Wuninitialized" that warns about possible use of an uninitialized variable. It frequently yields false positives, because it's an approximation; an exact "detect all uninitialized variables" algorithm would solve the halting problem.
Probably (or maybe by setting it to `null`... I'm not sure how engines deal with changing data types). `delete` implies that the underlying object schema has to change. It's a heavy operation.
Yup. Same category of misnomer as, for example, "n-fold times faster" (n > 1), where they mean "n times as fast". It's 2^n vs n: read literally, a 4-fold increase in something (doubling it 4 times) would be a 16-times increase of that something.
> But if V8 thinks that the code is “too tricky”, it keeps it in slow but dynamic form.
The next paragraph shows what caused V8 not to optimize:
> For example, in the first set of changes I’ve used a global variable defined outside of the class, so it had an external state. In the second one, I’ve changed the class structure on the fly. As a result, V8 did not compile it to effective typed native code.
But you're right, saying "it's probably that" is not the same thing as delving into the code.
The short answer is that v8 has several different ways of storing objects and resolving property lookups, and deleting a property causes that object to be handled in the slowest mode (i.e. "dictionary mode", with property data stored in a hash table).
If you want gory details I recommend Vyacheslav Egorov's blog (mrale.ph). Here's an old slide of his that lists all the ways you can force an object into dictionary mode (at least ca. 2011):
Using Object.seal, Object.freeze and Object.defineProperty with writable, enumerable, configurable not set to true does not cause the object's named properties to be converted to dictionary mode. However, they still convert the elements storage (the one containing properties with names "0", "1", etc.) to dictionary mode.
Similarly, accessors don't cause the object's named properties storage to be converted to dictionary mode unless there is a transition clash.
// Run in V8's d8 shell with --allow-natives-syntax; %HasFastProperties is a V8 internal.
var obj = {};
obj.__defineGetter__("foo", function () { return 0; });
print(%HasFastProperties(obj)); // => true

// Defining the same accessor a second time clashes with the map transition
// recorded for the first object, so this one falls back to dictionary mode.
var obj = {};
obj.__defineGetter__("foo", function () { return 0; }); // Transition clash
print(%HasFastProperties(obj)); // => false
Nothing changed with respect to `delete obj.foo`: this always converts the named properties storage to dictionary mode. However `delete arr[index]` doesn't (at least for arrays that were not in slow mode already); the rules for non-array objects depend on the number of holes in the elements part of the object.
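A rough sketch of that distinction (plain JavaScript; whether the fast or slow representation is kept is V8-internal behaviour, as described above):

var obj = { foo: 1, bar: 2 };
delete obj.foo;   // named properties storage drops to dictionary mode

var arr = [10, 20, 30];
delete arr[1];    // leaves a hole ([10, <empty>, 30]); the elements can stay in a fast mode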
I have experience with profiling the `delete` keyword. It slows down post-deletion access to the parent object significantly. Generally, I avoid using it and instead set the value to `null`. Technically, this creates a memory leak because the pointer remains, but depending on the characteristics of your application it may be worth it.
The usual disclaimers of "profile it first" and "YMMV" apply, but it's useful to be aware of potential low hanging fruit when JS performance is critical to your application.
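A minimal sketch of that pattern (the object and property names are made up for illustration): null the slot instead of deleting it, so the object's hidden class stays stable.

var cache = { user: { id: 1 }, token: "abc" };

// delete cache.token;  // would push this object into slow dictionary mode
cache.token = null;     // keeps the fast representation; the key itself sticks around

console.log(cache.token); // => null, which is the memory trade-off mentioned above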
Doesn't V8 have a profiler? I don't use it so forgive the ignorance. Having to bisect your code to find a performance regression seems like a slow way of doing it compared to using a profiler. Could a V8 expert explain?
V8 has a very nice profiler. Functions are the smallest granules it understands, but it's very good for finding slow functions, and (perhaps even more importantly) for finding out which functions v8 has given up on optimizing.
But I'm guessing it didn't help for this particular problem. Deleting properties from an object can hurt performance in v8, but it's not the deletion that's slow, it just makes the later accesses to that object slower. So the line of code that needed changing may not have been anywhere near the code that profiled poorly.
While there's a small bit of truth in that, unfortunately reality is much more nuanced than this -- at least, it is when performance is non-negotiable.
Optimization, when done correctly, is a feedback loop between your assumptions and the profiler. Neither one of these, on their own, is enough -- your assumptions can be wrong, but the profiler has an extremely poor signal to noise ratio. (Note: Profiling wouldn't have caught this issue, since the problem spot was not actually slow, it just made everything else slower)
I work in game development (not primarily HTML5, although I have done HTML5/WebGL projects), and performance is a hard requirement for me. Not making 60fps reliably for the hardware we need to support is a show-stopping bug. The only way to avoid this reliably without needing to rewrite sections is to design with the optimizations you might need to perform in mind. Putting optimization off until the end would be irresponsible.
> PostCSS, libsass and Less are already fast enough for any real-world task. Running benchmarks like these is like listening to audiophiles comparing gold-plated cables for their hobby systems. Don’t use these benchmarks as the main criteria for decision making when you are choosing a tool.