Heatmaps Make Ops Better (honeycomb.io)
79 points by rhema on Nov 10, 2018 | 10 comments


I trade forex. Every week I download that week's trades and put them all in a heatmap. I've been doing this for quite a while now, and with just one look I can see where I do well and where I don't. I then take a screenshot and compare it to screenshots from previous weeks.

This lets me see in a few seconds, just by browsing through the screenshots, whether the changes to my tactics and my bots help, and which approach is more productive/profitable.

To KonSchubert's point, scatter plots help when I have a small number of one- or two-dimensional attributes and the data is pretty 'clean'. When I have multiple attributes (profit, duration of trade, etc.) heatmaps work better (for me).


Very interesting. Can we see an example heatmap of yours from any week?


Super cool! It reminds me of Netflix's Flamescope[0], also from Brendan Gregg's work, at https://medium.com/netflix-techblog/netflix-flamescope-a57ca....

The reason heatmaps are "news" is that not everyone has had formal exposure to data/stats curricula. The concept is easy, straightforward, and nearly obvious, but unless you're in a position to have datasets to try to understand (such as a class where they have datasets and questions posed for you!) you just don't hear about this stuff.

What often happens is that you have a problem you're trying to understand, e.g. "Why does this crash?" or "Why is this slow on Wednesday?", and you use the data you have available to solve it. You don't normally have access to the kinds of data at the resolutions you need for something like heatmaps to come into play. You're ops, not the application engineer, and you just have 50-90-95-99 latency percentile graphs pre-aggregated in minute windows in Nagios or Grafana, and maybe a few more of those for IO, NET, CPU, and thread counts, and you're trying to correlate between these to form a hypothesis.

If it's important enough AND you have bad luck AND it can't be solved with more hardware, THEN you go deeper. You get to start actually trying to decide what new data is worth collecting ad-hoc, during what time intervals, etc. Only then do you even have data with the resolution where it's worth talking about heatmaps.
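A rough sketch of why the pre-aggregated percentiles aren't enough on their own: two one-minute windows can report identical p50/p90/p95/p99 values while having quite different shapes. The latency values, bucket edges, and function names below are made up for illustration, and the nearest-rank percentile is just one common definition.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, to avoid float rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summary(values, ps=(50, 90, 95, 99)):
    """The kind of pre-aggregated summary a dashboard typically stores."""
    return {p: percentile(values, p) for p in ps}


def bucket_counts(values, edges=(0, 25, 75, 250, 1000)):
    """Count values per [edges[i], edges[i+1]) bucket: one heatmap column."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical latencies (ms) for two one-minute windows of 100 requests each.
window_a = [10] * 50 + [50] * 40 + [100] * 5 + [500] * 5
window_b = [10] * 89 + [50] * 1 + [100] * 5 + [500] * 5

if __name__ == "__main__":
    print(summary(window_a), bucket_counts(window_a))
    print(summary(window_b), bucket_counts(window_b))
```

Both windows give the same four percentiles, but in the first one 40 requests sit in the 25-75 ms bucket and in the second only one does. You can't rebuild the bucket counts from the percentiles, so the raw (or at least bucketed) data has to be collected first.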

That's why this is an interesting reintroduction for some people, and new to others. The post helps justify the data that you'd need to collect in order to use heatmaps as much as it advocates heatmaps themselves.


I don't get why you think metrics at low resolution or about resource utilization aren't usefully visualized via heatmaps. They absolutely are!

Let's say your application's aggregate p99 latency goes up. Generate a heatmap showing time (x-axis), percent CPU utilization (y-axis), and number of servers at that utilization (z-axis, color saturation). Oh, turns out one of your 500 application servers has developed much higher CPU utilization than the rest! Better go kill the intern's bitcoin miner and bring latency back to normal.

http://www.brendangregg.com/HeatMaps/utilization.html
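Here's a minimal sketch of that kind of heatmap, assuming you have per-host CPU samples as (time_step, host, cpu_percent) tuples. The data is synthetic, the function names are hypothetical, and the text rendering stands in for a real plotting library.

```python
import random


def synthetic_samples(n_hosts=500, n_steps=20, seed=0):
    """Made-up data: every host idles at 20-40% CPU, except host 0,
    which jumps to 95% halfway through (the hypothetical runaway host)."""
    rng = random.Random(seed)
    samples = []
    for t in range(n_steps):
        for host in range(n_hosts):
            cpu = rng.uniform(20, 40)
            if host == 0 and t >= n_steps // 2:
                cpu = 95.0
            samples.append((t, host, cpu))
    return samples


def utilization_heatmap(samples, n_buckets=10):
    """Return grid[bucket][time_step] = number of hosts whose CPU
    utilization fell in that bucket at that time step."""
    n_steps = max(t for t, _, _ in samples) + 1
    grid = [[0] * n_steps for _ in range(n_buckets)]
    width = 100 / n_buckets
    for t, _host, cpu in samples:
        bucket = min(int(cpu / width), n_buckets - 1)  # 100% lands in the top bucket
        grid[bucket][t] += 1
    return grid


def render(grid, shades=" .:*#"):
    """Text rendering, highest utilization on the top row.
    Darker character = more hosts; any nonzero count stays visible."""
    peak = max(max(row) for row in grid)
    top = len(shades) - 1
    lines = []
    for row in reversed(grid):
        # Ceiling division so a single host never rounds down to blank.
        lines.append("".join(shades[(c * top + peak - 1) // peak] for c in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(utilization_heatmap(synthetic_samples())))
```

The single hot host shows up as a faint row of dots in the 90-100% band starting at the halfway point, while the other 499 hosts stay in the dense band near the bottom. The cell count is the z-axis; a real tool would map it to color saturation instead of characters.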


Your example is a strange one. A line graph showing time (x-axis), percent CPU utilization (y-axis) would very readily show you a single host as an outlier well above the rest.


A line graph like that isn't easy to render or interpret once you have many hosts. Take the example in the blog post I linked: utilization of 300 hosts (5,312 CPUs).

Here's the data visualized as a line graph: http://www.brendangregg.com/HeatMaps/lines-allcpus-image.png

Here it is as a heatmap: http://www.brendangregg.com/HeatMaps/cpu-utilization-heatmap...

Which one conveys more information?


For contrast, I went back and read Brendan Gregg's explanation of heatmaps and found the concise explanation of the mechanics clear and easy to understand:

http://www.brendangregg.com/heatmaps.html

TBH, the Honeycomb overview felt long and difficult to get through. Does anyone argue against heatmaps and demand a case be made justifying "why heatmaps"? This read like it was fighting hard in defense of heatmaps; against whom, I'm not sure :)


Ops teams at tech firms often don't have much of a formal background in stats. I used to see folks giving the "10 ways not to visualise data" talk at least once a year, and still most service dashboards were just rank upon rank of line plots with multiple different scales on each axis...


It seems based on the classic trick of inventing an enemy ready to be defeated...


So, yea, heatmaps are a thing and they have advantages and disadvantages compared to scatter plots. Welcome to plotting 101.



