I would like to know if any site reliability engineer, sys-admin, or anyone in a (dev)ops role ever seeks the help of distributed systems theory. I have seen many of the real-life systems listed in the article thrown in as recommended reading in preparation for job interviews. I've just never managed to figure out in what situations or scenarios (especially troubleshooting-related ones) knowledge of the design, or of the design principles, behind a distributed system comes in really handy for a person in any of the roles I mentioned above. Any enlightenment will be greatly appreciated.
(EDIT: I guess what I'm trying to get at is that it would be nice to read a detailed post about how someone who maintains a big distributed system handled a major troubleshooting or scaling problem and which distributed systems principle or theory, if any, came in handy in that exercise.)
I don't have time to write a detailed post, but would a book recommendation work? :)
"The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems", coming out next month, provides a good viewpoint on how distributed systems design knowledge is useful to someone in a sysadmin role. (I've read a preview via Safari Online and found it an excellent resource.)