Well, it depends ... ;-) The important thing is to not change information. "Sani...

yetanotherjosh · on Sept 9, 2016

I hear what you're saying and it represents an ideal. But there are circumstances where information really has to be removed. Perhaps because the user is no longer present and it was collected under circumstances that had more liberal validation. Or because you're handing information across a boundary of implementation ownership and can't trust the receiver to handle potentially dangerous information correctly. I agree that sanitization (in the sense of stripping bits out of a user data payload according to some security rules) shouldn't be the first tool in the toolbox, but I would really hesitate to say it's always the wrong thing to do.

Edit: here's a good example. I don't know if they still do this, but when I worked at Yahoo!, they used a modified version of PHP that applied a comprehensive sanitization process to all user inputs. As a frontend coder at Y!, all the information you pulled from request parameters, headers, etc, ran through this validation at the PHP level before your app code got to it. You can then literally splat this information into an html page raw, without any further treatment, and not expose the Y! property you work for to an XSS or other injection vectors. There were ways to obtain the raw input using explicit accessors when needed, and these workarounds were detectable by code monitoring tools and had to be reviewed and approved by security team(s). Overall this worked really quite well, in my opinion. Y! could hire junior frontend devs without deep knowledge of data encoding, security issues, etc etc and rest easy. I think the principle of safe-by-default, even if it means destruction of user input in some cases via aggressive sanitization, is a good principle to apply to a frontend framework.

zAy0LfpBZLC8mAC · on Sept 9, 2016

re edit:

No, that's just a terrible idea. It might work quite well in the sense that it prevents server security holes. But it makes for terrible usability, and potentially even security problems for the user. The user expects that their input is reproduced correctly, and if it isn't, that can potentially have catastrophic consequences because it might result in silent corruption.

See also: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...

If you think that it should be possible to have developers who don't understand this problem and its solution, you might as well argue that it should be possible to have developers who don't know any arithmetic or who are illiterate.

If you want to have some help from the computer in avoiding injection attacks, the solution is a type system that ensures that you cannot accidentally use user input as HTML or SQL, for example, or possibly automates coercion when you need to insert pieces of one language into another.