I think the interesting question there (and in really any language comparison) is how idiomatic implementations compare. Unfortunately, there is no clear answer on what is idiomatic, even for a language like Python, which is more opinionated than most on the question.
I think the best way forward on the idiom question is to accept that we can't have a competitive/adversarial benchmark suite where idiom is a factor. You really need one author, or maybe a group of colleagues who trust each other, to write the whole suite and make their own choices about what idiom means to them. Folks who disagree with those choices can either write an article showing how much performance you gain from which changes, or just produce their own whole suite. A very high-level comparison like "this is how languages stack up with these idiomatic choices, and here is a different chart with different choices" would be interesting, even though it would take more work to interpret.