
I think that's because in most applications where bodily harm is a possibility you generally (in my experience) have hardware protections that prevent the software from doing anything stupid. Take an elevator, for instance: even if the software controller is buggy (or hacked) and decides to drop the cabin from the top floor to ground level at full speed, there are hardware protections (safety brakes, limits on the motor itself, etc.) that will take over and make sure nobody gets hurt. So for something to go completely wrong you need both a software and a hardware failure. The main flaw in the Therac-25 was arguably that no such protection was present: the hardware should have been designed to make the bogus configuration impossible to reach through software alone.

I think unfortunately this is going to change with the advent of "AI" and related technologies, such as autonomous driving (we've already had a few cases related to self-driving cars, after all). When the total enumerable set of possible configurations becomes too large to exhaustively "whitelist", we won't be able to have foolproof hardware designs anymore. In these situations software bugs can be absolutely devastating.



> I think unfortunately this is going to change with the advent of "AI" and related technologies, such as autonomous driving

Yes, the potential cost of software bugs is increasing as software does things that no hardware interlock can stop. And worse than that, as a society we largely haven’t realized we need to optimize for worst case (not average) performance of algorithms, because they WILL be attacked. If you’re lucky, they won’t be attacked by sophisticated, well resourced nation-state attackers. But sometimes that will happen.

The rise of complex algorithms to control complex processes is the real difficulty. Facebook’s banning algorithm is an example of something that has been exploited by attackers.

Let’s hope voting software is not the next target where bugs can be exploited. Because changing political decisions can and does produce life-changing effects.


Your point rings true even in this case. There was another Therac (50? 100? It’s been a while since I read about it) machine which had the same bug, but where no one got hurt thanks to hardware safeguards.


In my opinion, one of the most tragic aspects of these horrific incidents is that the predecessors of the Therac-25 actually had independent protective circuits and other measures to ensure safe operations, which the Therac-25 lacked.

Here is a quote from http://sunnyday.mit.edu/papers/therac.pdf:

"In addition, the Therac-25 software has more responsibility for maintaining safety than the software in the previous machines. The Therac-20 has independent protective circuits for monitoring the electron-beam scanning plus mechanical interlocks for policing the machine and ensuring safe operation. The Therac-25 relies more on software for these functions. AECL took advantage of the computer's abilities to control and monitor the hardware and decided not to duplicate all the existing hardware safety mechanisms and interlocks."

So, regarding these important safety aspects, even the Therac-20 was better than the Therac-25!

The linked post also mentions this:

"Preceding models used separate circuits to monitor radiation intensity, and hardware interlocks to ensure that spreading magnets were correctly positioned."

And indeed, the Therac-20 also had the same software error as the Therac-25! However, quoting again from the paper:

"The software error is just a nuisance on the Therac-20 because this machine has independent hardware protective circuits for monitoring the electron beam scanning. The protective circuits do not allow the beam to turn on, so there is no danger of radiation exposure to a patient."


I have a friend with 40 years of programming experience who is building a computer-controlled milling machine in his basement.

When I asked him about the limit switches, it turned out they are read by software only, and the software turns off power to the motor controllers if a limit switch is activated.

I asked why he does not wire the switches to cut power directly to be on the safe side.

His answer: "It's too much bother to add the extra circuits."

We are talking about less than $20 in parts and a day of his time. If the software crashes after telling the controller to start moving the head at a certain speed, there is nothing to stop the machine from wrecking itself.

E.C.P.


While it's not terribly uncommon for small hobby machines to depend on software limits, I can't think of a single instance, in my years of maintaining "real" production CNC machines, of encountering a machine that didn't also include hard limits.

Limit switches typically include two trip points. The first is monitored by the control system; when it is tripped, the control halts execution and stops the machine. The second is wired directly to the servo amplifier so that if, for whatever reason, the control fails to halt the machine when the soft limit is tripped, power is removed and motion is halted. Both limits are fail-safe: if either were to become disconnected, the result would be a limit-exceeded condition.
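
To make the two-trip-point arrangement concrete, here is a rough Python simulation. It is only a sketch: the `Axis` class, its field names, and the limit values are all made up for illustration, and a real hard limit is a wire to the amplifier, not code.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    """Toy model of one machine axis (all names and numbers are hypothetical)."""
    position: float = 0.0
    commanded_velocity: float = 0.0
    soft_limit: float = 100.0           # first trip point, read by the control
    hard_limit: float = 105.0           # second trip point, wired to the amplifier
    control_alive: bool = True          # False simulates a crashed controller
    hard_switch_connected: bool = True  # False simulates a broken wire
    amp_powered: bool = True

    def hard_circuit_closed(self) -> bool:
        # Fail-safe wiring: the circuit is closed only while the switch is
        # connected AND not tripped. A broken wire looks like a tripped limit.
        return self.hard_switch_connected and self.position < self.hard_limit

    def step(self, dt: float = 1.0) -> None:
        # Software path: only works while the control is actually running.
        if self.control_alive and self.position >= self.soft_limit:
            self.commanded_velocity = 0.0
        # Hardware path: independent of the control software.
        if not self.hard_circuit_closed():
            self.amp_powered = False
        if self.amp_powered:
            self.position += self.commanded_velocity * dt


def run(axis: Axis, steps: int) -> Axis:
    for _ in range(steps):
        axis.step()
    return axis
```

With a healthy control the axis stops at the soft limit and the amplifier stays powered. With `control_alive=False` the axis runs on until the hard limit cuts power. With `hard_switch_connected=False` the amplifier loses power before the axis moves at all.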


In a situation like that, I wouldn't blame him. Consider how many of these situations he will come across while building his milling machine. If he had to add a hardware fail-safe for each one, it simply would not scale.

3D printers are like this too. They have mechanical limit switches [0] that are read only by software. So if there is a bug in the software, nothing stops it from driving the hardware past its limits and breaking something. The same goes the other way around: if the switch is broken, the same thing might happen.

[0] https://i.ebayimg.com/images/g/EYAAAOSwbopZguz4/s-l300.jpg


Most 3D printers don't have massive printing heads. If they drive into the end-stops, the motors will likely just skip steps and be stuck. They are not designed to apply much force.

I'm much more worried about the heating element. Its temperature is usually controlled by the same CPU that also does motion control and G-code parsing. If anything locks up the CPU, the heat might not be turned off in time, and (because you also want fast startup) there is enough power available to melt something. At the very least you would get nasty fumes from overheated plastics, and maybe even from Teflon tape, which is often part of the print head. At worst it could start a fire.


As a 3D printing enthusiast, I can confirm your fears. It's all in software, and while there are good control systems, nothing's perfect. I had the hotbed fail, and it was smoking when I found it.


Note that (depending on how the 3D printer is wired) a defective switch will result in an "endstop hit" condition. A dislodged switch, however, will happily keep reporting that it's not being hit.


Probably not that big a deal. An operator/program can easily damage a CNC mill by jogging a substantial tool (or the spindle itself) into solid material, which is a much more likely scenario.


...but it's not "much bother" to run the control loop through the complexity of software? This seems to be the equivalent of overengineering in software, where something straightforward is instead performed through many layers of abstraction and indirection.

The straightforward way of implementing this with a bidirectional motor is to wire normally-closed limit switches in series with their appropriate direction signals, such that when the switch is actuated it prevents the motor from going in that direction, but it can still move away from the switch.
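
As a rough illustration of that wiring, here is the logic as a Python truth function. It is a sketch only: `drive_enabled` and its parameters are hypothetical names, and in the real circuit this is just a switch in series with each direction line, with no software involved.

```python
def drive_enabled(direction: str,
                  plus_switch_closed: bool,
                  minus_switch_closed: bool) -> bool:
    """Return True if the direction signal reaches the motor driver.

    Each limit switch is normally closed and sits in series with the
    direction signal that moves TOWARD it. Actuating the switch opens the
    circuit and blocks only that direction.
    """
    if direction == "+":
        return plus_switch_closed
    if direction == "-":
        return minus_switch_closed
    return False  # no direction commanded, nothing to pass through


# Carriage is resting on the "+" end switch, so that switch is open:
print(drive_enabled("+", plus_switch_closed=False, minus_switch_closed=True))
print(drive_enabled("-", plus_switch_closed=False, minus_switch_closed=True))
```

The first call shows motion further into the switch being blocked, the second shows the motor still free to back away. A broken wire on either switch reads as "open", so it also blocks motion in that direction.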


Try that with the stepper motors used in 3D printers. Try it with an off-the-shelf driver. Make it so the limit switch only stops motion in one direction.


So that's where the YouTube videos of CNC milling machine failures come from! TFA noted that major causes of the disaster were the culture and the failure to independently unit test the machine.


This seems like a good time to mention that God gave us hardware interrupt inputs and watchdog timers.
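
For anyone who hasn't used one, this is roughly how a watchdog timer behaves, as a tick-based Python simulation. It is a sketch under made-up assumptions: the `Watchdog` class, the `simulate` helper, and the tick counts are illustrative, and a real watchdog is a hardware counter that resets the chip or cuts an output without any help from the main program.

```python
from typing import Optional


class Watchdog:
    """Counts down every tick; must be kicked before it reaches zero."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.tripped = False

    def kick(self) -> None:
        # Called by healthy firmware from its main loop.
        if not self.tripped:
            self.remaining = self.timeout_ticks

    def tick(self) -> None:
        # Driven by an independent clock, not by the firmware.
        if self.tripped:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.tripped = True


def simulate(total_ticks: int, hang_at: Optional[int],
             timeout_ticks: int = 3) -> Optional[int]:
    """Return the tick at which the heater was cut, or None if never."""
    wd = Watchdog(timeout_ticks)
    for t in range(total_ticks):
        firmware_running = hang_at is None or t < hang_at
        if firmware_running:
            wd.kick()
        wd.tick()
        if wd.tripped:
            return t  # the watchdog forces the heater off here
    return None
```

With `hang_at=None` the firmware keeps kicking and the heater is never cut. With `hang_at=5` the kicks stop and the watchdog trips a few ticks later, which is the point: the heater goes off even though the code that was supposed to turn it off is no longer running.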


> It’s been a while since I read about it

It's in the article you're commenting on.



