Worked on anesthesia machines for a few years. Since both hardware and software are involved, the testing was quite extensive.
* Lots of manual testing. While we did unit testing and some automated integration testing, most defects were found using exhaustive manual testing by trained engineers.
* Randomized UI testing. Used UI automation to exercise the UI with various physical configurations of the system. Would often run this overnight on many systems and analyze failures every day.
* Extensive hazard analysis. Basically, we wrote down everything that could possibly go wrong with the system (including things like gamma radiation), estimated the likelihood and harm, and then listed mitigations. The entire system could run safely even if there was full power failure. "Fail safe"
* Detailed software specifications, each of which was linked to manual test cases. Test cases were signed off when executed.
* Animal testing for validation. We went to a vet school and put a bunch of dogs under and brushed their teeth.
* Limited release for production. We would launch the system at one or two hospitals and monitor it for a few weeks before broader release.
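The randomized-UI-testing bullet above can be sketched as a seeded fuzz loop. The action names and the harness are hypothetical (the real system drove actual UI automation), but the key idea is that a per-run seed makes an overnight failure replayable the next morning:

```python
import random

# Hypothetical action vocabulary; a real harness would drive the actual UI.
ACTIONS = ["press_start", "open_vent_menu", "adjust_o2", "silence_alarm"]

def run_session(seed, steps=1000):
    """Generate (and, in a real harness, execute) one random UI session."""
    rng = random.Random(seed)  # per-run seed makes every failure replayable
    log = []
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        log.append(action)
        # apply_action(action)  # placeholder: real UI automation call
    return log

# The same seed reproduces the exact action sequence for next-day triage:
assert run_session(42) == run_session(42)
```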
It can operate completely mechanically. There is an integrated UPS in case mains fails. If the battery also fails, a pneumatic whistle goes off to alert the user. If for some reason something goes really bad, they switch to using an ambu bag, typically hung on the back of the machine.
I like the idea of the pneumatic whistle. A pressure vessel with a spring-loaded valve held closed by a solenoid, maybe? However it works, it's a really neat piece of lateral thinking.
You just have to look at the Therac-25 to see the risks of relying on software interlocks alone. One of the prevailing pieces of feedback in that case was the lack of hardware interlocks. Whether that's possible on anesthesia machines I don't know…but the prevailing wisdom is to include hardware interlocks wherever that's feasible, for any critical life-supporting equipment.
I don't disagree that "hardware interlocks" are a good idea for such equipment. But now that I think about it, I'm annoyed that we can't know whether software was competently written and tested. Is it really that different from hardware in that respect?
How about, we require that the source be open, and if it's too convoluted for the hospital's respected experts to check, then it fails inspection?
Hardware uses many standard parts and materials, and similarly, the software could use a few plain-simple-standard libraries, like libc (but not the floating point functions), zlib, libpng.
The Therac-25 case was just incompetence. The vendor was told about the problem but was in denial, and later supplied a hardware fix which didn't fix the problem. The problem had to be thoroughly investigated and proven by a doctor and an operator over many months. Why couldn't the vendor have investigated more thoroughly themselves, in a week or so? Why weren't they more careful about race conditions? (The problem was triggered by a human able to type into the interface too fast. An actual human.)
The combined utility of hardware and software interlocks is that they're complementary:
Sometimes it's easy to specify an interlock in the language of hardware: Never, under any circumstances, should it be possible to slew an avalanche-control howitzer to point at permanent structures; let's use a steel pipe to block the barrel from traversing beyond safe limits.
Sometimes it's easy to specify an interlock in software: Never, under any circumstances, should a rocket launch unless every desk at mission control has authenticated their assent with the main control system.
When interacting with the real world, real-world interlocks are handy, but they're hardly sufficient to guarantee safety. Nothing is.
Not sure what you mean by "interlocks", but the hardware was quite distributed. Each critical component had its own board and industrial microcontroller. And we had various levels of watchdogs keeping track of system health at all times.
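The "various levels of watchdogs" mentioned above follow a standard pattern. This is a toy sketch under my own assumptions, not the actual firmware: each critical component must "kick" its watchdog within a deadline, or the supervisor declares it unhealthy and drives the system to a safe state.

```python
import time

class Watchdog:
    """Minimal software watchdog: misses a kick, reports unhealthy."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_kick = time.monotonic()

    def kick(self):
        """Called by the monitored component on every healthy cycle."""
        self.last_kick = time.monotonic()

    def healthy(self):
        """Polled by the supervisor; False means the deadline was missed."""
        return (time.monotonic() - self.last_kick) < self.timeout_s

wd = Watchdog(timeout_s=0.05)
assert wd.healthy()      # just constructed, deadline not yet missed
time.sleep(0.1)          # component stalls past its deadline
assert not wd.healthy()  # supervisor would now trigger the fail-safe path
```

On real industrial microcontrollers this is usually a hardware timer that resets the chip outright, so even a wedged CPU can't silently skip the check.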
Interlocks are usually fairly crude safety measures, normally in hardware, to make sure that particular combinations of events cannot happen.
The Therac-25 is a famous comp.risks cautionary tale. Among the many, many design misfeatures (if you haven't come across it, it's worth a read) was the one that killed people:
It was capable of providing two kinds of radiation therapy; electron beam radiation and X-ray radiation. It worked by having an electron beam generator which could be operated at either high power or lower power. Low power was used directly. High power was only used to irradiate a tungsten target which produced X-rays. (I'm simplifying here.)
You can probably guess what went wrong; people were directly exposed to the high power electron beam. Several of them died.
The obvious interlock here (which apparently previous versions had) was to have a mechanical switch which would only enable the high-power beam when the tungsten target was rotated into place. No target, no high power. Simple and relatively foolproof (although it's possible for interlocks to go wrong too).
Really, every engineer should read the report[1] from the Therac-25 investigation. I would hope anybody that is working on anything that could be potentially dangerous has already read it.
The problems with the Therac-25 went a lot further than just the bad design of the target selection, which did have a (badly designed) interlock. It checked that the rotating beam target was in the correct position to match the high/low power setting (and NOT the third "light window" position, which had no beam attenuator at all).
While many design choices contributed to the machine's problems, you could probably say that two big design failures led to the deaths associated with the Therac-25. One was this interlock, which failed if the target wasn't seated in place (there was no locking mechanism, either, just a friction stopper). If the target was turned slightly, the 3 micro-switches would sense the wrong pattern (a bit shift)... which was the pattern for one of the OTHER positions.
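The micro-switch aliasing is worth spelling out. The codes below are made up (the real Therac encoding differed), but the failure mode is the point: a mechanically shifted switch pattern lands on ANOTHER legal code instead of being rejected as invalid.

```python
# Hypothetical 3-bit position codes read from the three micro-switches.
XRAY_TARGET   = 0b110  # attenuating target rotated into the beam
ELECTRON_MODE = 0b011  # low-power direct-beam position
LIGHT_WINDOW  = 0b101  # alignment light, no attenuator at all

LEGAL = {XRAY_TARGET, ELECTRON_MODE, LIGHT_WINDOW}

def sense(code):
    """Return the position the controller believes it sees, or None."""
    return code if code in LEGAL else None

misread = XRAY_TARGET >> 1         # slight turn: 0b110 reads as 0b011
assert misread == ELECTRON_MODE    # aliases to a DIFFERENT legal position
assert sense(misread) is not None  # so no fault is ever raised

# Choosing codes so that no plausible misread (a shift, or a single stuck
# switch) equals another legal code would turn this into a detected fault.
```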
There was also a race condition in the software that would turn on the beam at a power MUCH higher than was ever used clinically. The race was only triggered when the treatment settings were typed in very quickly, which is why the manufacturer denied there was a problem: when they tried to recreate the bug by carefully - that is, very slowly - following the reported conditions, it never failed.
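The class of bug can be re-enacted deterministically. The variable names, units, and interleaving here are illustrative, not the actual Therac-25 code: the beam-setup task latches shared settings at different times with no locking, so a fast operator edit lands in between.

```python
# Shared treatment settings, editable by the operator's UI at any time.
power = {"mode": "xray", "level": 25000}  # hypothetical units

level_read = power["level"]   # step 1: setup task latches the X-ray power

power["mode"] = "electron"    # step 2: operator's fast edit lands mid-setup
power["level"] = 250

mode_read = power["mode"]     # step 3: setup task latches the mode AFTER
                              # the edit, pairing it with the stale level

fired = {"mode": mode_read, "level": level_read}
assert fired == {"mode": "electron", "level": 25000}
# An electron-mode beam configured at X-ray power: exactly the dangerous
# direction. The fix is to latch all settings atomically and re-verify them
# against the hardware state before enabling the beam.
```

Typed slowly, step 2 happens before step 1 and the pair is consistent, which is why the vendor's careful reproduction attempts never failed.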
Therac-25 is an incredibly powerful lesson in what we mean by "Fail Safe", and why it is absolutely necessary to have defense in depth. Fixing the target wouldn't have fixed the race condition power-level bug. Fixing any of the software wouldn't have fixed the bad target design that could be turned out of alignment. Oh, and they had a radiation sensor on the target (which could shut off the machine as another independent layer of defense)... but they mounted it on the turnable target, so the micro-switch problem allowed the sensor to be moved away from the beam path.
The really telling thing, though, is how the previous model acted. It was not software controlled, and was an old-style electromechanical device. It turns out the micro-switch problem existed there as well (among other problems)... and it would blow fuses regularly. Which was yet another layer of safety. It turns out that when they upgraded it to a software-based control system, they got cheap and took out all those "unnecessary" hardware interlocks and "redundant" features. There is a lot of blame to go around, but this is where I put most of the responsibility. You never assume one (or even a few) safety feature will work - the good engineer assumes it will all break at any moment, and makes sure that it will still Fail Safe.
> (although it's possible for interlocks to go wrong too)
If there is one lesson to learn from the Therac-25, this was it. Things break, mistakes happen, and when you're building a device that shoots high-energy x-rays at people, you need to assume that everything did go wrong, and make sure the rest of the device can safely handle that situation.
"The really telling thing, though, is how the previous model acted…it would blow fuses regularly"
Good. When a fuse blows, it shows something is wrong, and needs fixing. Replacing the fuse with a nail or something else that doesn't blow is a sure-fire way to set the thing on fire. Bad enough for a desk-lamp, a little worse for radiotherapy machine.
Sounds like people were irritated by fuses blowing, and decided to simply short-circuit the fuses instead.
The people using the machine at the hospital would replace the (expensive) fuses when they blew. It was the manufacturer that made the later model (the Therac-25) that didn't have the fuses (and other "old" hardware features).
Obviously, something was still very wrong. User error (or other bugs? I'm not sure) in the older hardware and the infamous race condition in the software-controlled Therac-25 were causing the beam to turn on at a shockingly high power. The better design of the older models saved people's lives by simply blowing fuses when the power went too high.
You could, perhaps, blame the poor communication between the hospitals and the manufacturer, because the fuse problem should have caused a bit of a panic among the engineers who designed the machine.