Hacker News



And the content, for anyone too lazy to click:

Commenter

>what's really amazing is that twitter programmers thought about this edge case and made sure the tweet would not display itself

Twitter Engineer

>We didn't think of this edge case. Someone did this about 7 years ago and the recursive hydration would make a tweet service crash by simply loading the tweet in a browser. It took a principal engineer an entire day of wading through heap dumps to figure out what was happening.


TIL: debugging via memory dumps is a Principal Engineer level skill.

Anyone here actually do this? I read about it in Release It and it sounds like by far the closest thing there is to a superpower when it comes to solving production incidents. I've never actually seen anyone do it, though.

Recently saw a video on this technique from Dotnet Conf. Piqued my curiosity again, and now this. I've really gotta learn this.


The fact that the principal engineer was doing this does not mean it's a principal-engineer-level skill. There are lots of software engineers who can deal with coredumps, which is pretty much the same idea.


I have done it once successfully in 10 years (.NET dev). I'd recommend having some other kind of logging or instrumentation in place so you don't have to. It's still worth learning WinDbg and the SOS extension, though.


In my company, we used to have a plugin for our bug tracker to automatically analyze .NET core dumps with WinDbg (if they were attached to a bug) and extract some useful information. We used to do this relatively often, for a shipped product, not a live service, especially if we found memory leaks.


Would you say something like that is worth setting up?

I noticed EC2 now has an API to get memory dumps. In theory you could automate collecting a memory dump whenever an unhealthy instance is pulled out of a load balancer, run some automated analysis on it, and keep the dump around for further manual analysis.


Not sure how much it cost, but it was definitely helpful. Even just knowing which team needed to take a look first, based on which objects had leaked, often made it worth it.


I remember spending quality time with coredumps and gdb back in 2012/2013, when a prototype supercar dashboard we were building crashed on certain CSS animations.[ß]

The call chain went through GTKWebkit, Wayland and all the way to Pango and Cairo. Getting that part untangled took a long afternoon. Figuring out the root cause was another two full days.

The topmost parts of the stack could be dealt with using breakpoints, but even with pango/cairo libs from a debug build it was painful. The failing function could only be single-stepped; trying to place breakpoints inside it would not work. In the end it was an unhandled divide-by-zero deep inside the rendering library.

ß: story for another time.


How else do you debug C/C++ programs that crash?


By having a crash harness in the program that dumps the call stack and relevant internal context. Coredumps are really an option of last resort.


WTF? If you already have the infrastructure to coredump, they are without a doubt the most convenient way to debug. A stacktrace does not even begin to compare. It is like limiting yourself to printf-debugging in the presence of gdb.

Actually, it exactly is! Now I'm not sure if you were /s or not.


It all depends on how tangled your spaghetti is.

For code that implements basic state and invariant checks (i.e. ships with asserts compiled in), crashes are usually exceedingly rare and limited to one of these checks failing. Debugging them requires a stack trace and, optionally, some context related to the check itself. If the program dumps this info on crash, the fix can typically be made in less time than it takes to retrieve/receive the coredump and start looking at it. If it can't be fixed this way, then it's to the coredump we go.

On the other hand, if the code is prone to segfaulting on a whim, requiring you to dissect its state to trace the cause, then, yeah, that's a coredump case too. But code like that shouldn't be running in production to begin with.

So that's, roughly, what the F is.


Sure, if by miraculous chance you happen to have printf'd exactly the state you required to figure out the assert/crash, "you don't need gdb". You could also find -- by divine inspiration -- what went wrong just by looking at the line number where the assert failed. But it's still WTF-y to argue that therefore, an actual {,post-mortem} debugger is "a last resort tool".


Any chance you remember which video this was? I can't see it in the dotnet conf 2020 playlist.


Analyzing Memory Dumps of .NET Applications.

https://channel9.msdn.com/Events/dotnetConf/2020/Analyzing-M...

I found another one that goes into more detail on how to script WinDbg, with breakpoints that run commands when hit. Sounds pretty powerful.
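For native code the idea looks roughly like this (a hedged sketch; `kernelbase!CreateFileW` is just an example target, and `@rcx` holds the first argument on x64 Windows): the breakpoint carries a command string that dumps the filename being opened and then continues, giving a running log without stopping the process.

```
0:000> bp kernelbase!CreateFileW "du @rcx; g"
```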


I did this in my second year as a professional coder and it took me a while (a week? a week and a half?) to understand what to do and what I was seeing. I would prefer never to have to do it again.


Hydration?


Hydrate: (verb, jargon) To populate with metadata or subobjects.

e.g. The FriendsList service hydrates each friend object with a list of friends you have in common




