
I would like to know whether any site reliability engineer, sysadmin, or anyone in a (dev)ops role ever draws on distributed systems theory. I have seen many of the real-life systems listed in the article thrown in as recommended reading for job interview preparation. I've just never managed to figure out in what situations or scenarios (especially troubleshooting-related ones) knowledge of the design, or the design principles, behind a distributed system really comes in handy for a person in any of the roles I mentioned above. Any enlightenment will be greatly appreciated.

(EDIT: I guess what I'm trying to get at is that it would be nice to read a detailed post about how someone who maintains a big distributed system handled a major troubleshooting or scaling problem, and which distributed systems principle or theory, if any, came in handy in that exercise.)



I don't have time to write a detailed post, but would a book recommendation work? :)

"The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems" [0], coming out next month, gives a good view of how distributed systems design knowledge is useful to someone in a sysadmin role. (I've read a preview via Safari Online and found it an excellent resource.)

[0] http://www.amazon.com/The-Practice-Cloud-System-Administrati...


Thanks! I bookmarked this one a few weeks ago and can't wait for it to come out next month.



