C will never stop you from making mistakes (thephd.github.io)
221 points by ingve on Aug 10, 2020 | 213 comments


Trying to avoid adding warnings is a very silly form of twisted logic.

1. People enable warnings and -Werror because they want high-quality code.
2. The standard can't add warnings because people use -Werror.

This means that not adding warnings is directly against the original reason to use -Werror in the first place! We are now avoiding warning people about dangerous things because they requested to be warned about dangerous things!


The argument also doesn't make much sense because all 3 big compilers are already adding tons of warnings with each new release. Upgrading to a new compiler version and seeing screens full of warnings scroll by when compiling code that was warning-free in the previous compiler version is quite normal. Why does this affect the C committee's decision making, and why is it suddenly a problem?


If I'm following the author's argument correctly, the very influential companies who maintain very large, very old C code bases don't like what you just described -- new warning messages in code that used to not have any. They worry, then, that if the STANDARD actually mandates a warning, that is even more likely to happen, and that makes the very influential companies very sad.

Which sounds like bullshit to me.


It actually makes a little sense: Many companies are required to ship code without warnings (for example in safety-critical systems). Fixing warnings is very expensive and can introduce new bugs in code that has been running fine for decades. If you force new warnings into the compiler the result would be that these companies simply stop using newer versions of the compiler.


Part of the normal "warning hygiene" process is deciding when a new warning in old code should be fixed and when it's better to suppress it.

One of my favourite warnings in this regard is gcc's misleading-indentation warning. The warning makes sense for new code written by a human, but if the code is machine generated, or decades old without showing any signs of problems caused by a "misleading indentation", then it is indeed much less risky to simply suppress that particular warning in that particular source file or library.
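
For reference, this is the kind of thing that warning flags (a toy example of my own, compiled with something like gcc -Wall):

    void fire(void);
    void log_event(void);

    void step(int ready)
    {
        if (ready)
            fire();
            log_event();  /* flagged: indented as if guarded by the if, but it isn't */
    }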


The problem here is that if someone dies while using your medical device and you have a "warning hygiene" process then there's a sound legal argument that you knew there were problems in the device that you chose to instead paper over and ignore. It doesn't matter that it makes sense to software engineers. What you have to consider is how a "jury of your peers" will react to you calmly and rationally explaining that a newer version of the compiler added some new warnings, but you decided that because the code has been running just fine for decades that it's totally okay to not address those warnings. It raises questions about how seriously you're actually taking software quality.

Take your "misleading indentation" warning. If you choose to ignore that warning, you're setting yourself up because I can make a great argument that you don't care that the indentation is misleading. And in fact that you're ignoring the hazards of following misleading indentation which is that another person reading your code could misread it and introduce a defect. And that in fact your policy is to allow some defects, including a possible defect which has killed the plaintiff.


Any organisation developing safety-critical code will already be following rules strict enough that they have an established deviation approval and documentation procedure.

And frankly, the idea that people writing software where defects could kill people would prefer not to be shown new defects because fixing them is an inconvenience is a pretty insulting view of that industry's professionalism and ethics.


And then I might ask you "so why did you apply for deviation on this warning? Warnings are bad, right? So shouldn't you fix a compiler warning if at all possible?"

And then you might reply "Well no because this particular warning would require us to change some code that's really hard to change correctly so instead of spending the time and expense eliminating a potential defect we just left it in."


Warnings are not defects. You might be surprised how conservative (parts of?) that industry are.


> If you choose to ignore that warning, you're setting yourself up because I can make a great argument that you don't care that the indentation is misleading.

Aka. the no brown M&Ms (not the rapper!) policy

https://www.npr.org/sections/therecord/2012/02/14/146880432/...


There is a long distance traveled here between fine tuning warnings and medically critical software.

The fact is that building such medically critical software (which I have done) has many more considerations than warning levels of c compilers.

Additionally, the C language and what the standard says are two different things.

Finally, the vast area of standard specified undefined and implementation defined areas substantially impact writing correct software.


That hypothetical engineer would then argue with the counterfactual universe (our universe) where warnings were never added in the first place because people were afraid of adding warnings. I.e. punishing people for having this process sets the wrong incentives.


That misleading-indentation warning could also be suppressed by running the code through a beautifier before compilation (the pre-processor could be replaced so that this happens transparently on the fly).


This also papers over a potential defect, but also introduces a potentially semantic-breaking process into your compilation pipe.

Say you've proved memory safety on your source. What you're compiling is no longer that source you have a proof about.


> This also papers over a potential defect

That's what suppressing warnings tend to do, yes.

> but also introduces a potentially semantic-breaking process into your compilation pipe.

Only if the beautifier is broken.

> What you're compiling is no longer that source you have a proof about.

It sure is, again, unless the beautifier is broken.


So it turns out Materialistic doesn't even show responses to my comments unless I go back to the thread itself...

The point I was making was in context of a discussion focused on mission-critical system. In that context, you can't just add a beautifier to your compilation pipeline with the argument that "the only way things will go wrong is if the beautifier is broken".


Yeah, and my snarky tone is probably not quite fair given that fact.

Some of the "very influential companies" we're talking about are in aerospace, the automotive industry . . . the LoC numbers are truly immense, the standards compliance rules are incredibly strict, and everything moves slowly. I've never worked in an industry like that.

A more charitable view of the situation would include some representative of an industry like that on the standards committee feeling their heart skip a beat because they realize that what is being suggested would cost millions to implement.


If you are developing a safety-critical system you will not be upgrading your compiler without a very good reason to do so. In the safety-critical systems I've worked on, even the compiler options are set in stone. Changing either is a huge amount of paperwork and will probably require a re-certification of the entire system.


Wait, wait, guys. Listen...

  -W2019q2
Cool, huh?


Not if the contract in question requires those issues being addressed.

The acceptance criteria are controlled by the purchaser. The purchaser can and should audit for this attempt to slide in out-of-spec code.


“When a measure becomes a target, it ceases to be a good measure.”


There was this marvelous Coverity article (that I can’t find now) that amplified on what you are saying. The title of the article was something like “There is no such thing as the C language”. Customers would demand support for their idiosyncratic code.


  struct Meow* p_cat = (struct Meow*)malloc(sizeof(struct Meow));
  struct Bark* p_dog = p_cat;
> Most compilers warn, but this is standards-conforming ISO C code that is required to not be rejected

Bollocks. That is a constraint violation, ISO C requires a diagnostic for it, and ISO C allows that diagnostic to be an error. The constraint is in the section "Simple assignment", which contains "One of the following shall hold:" followed by a list detailing when assignments are valid. Pointers to different structure types on the LHS vs the RHS are nowhere in that list.


> Bollocks.

Only because it is misquoted, the sentence ends:

> this is standards-conforming ISO C code that is required to not be rejected unless you crank up the -Werror -Wall -Wpedantic etc. etc. etc.


It's not misquoted. The author claims that the code "is required to not be rejected", which OP demonstrated to be false.


Seems you're right, all I can see is that a diagnostic should be produced, not that the code should be accepted. The impression I get is that when they say diagnostic, they really mean: "this is an error, but carry on if you like." That informs the perspective that adding warnings is a breaking change, as there is no formal distinction between error and warning diagnostics. Perhaps there's an informal "standard" between compilers for which should be which somewhere else?


For GCC and clang, while -Werror turns all diagnostics into errors, even those for valid code, -pedantic-errors will only turn the standard-mandated diagnostics into errors. This is the only reliable way I know of to determine whether the compiler considers the code to violate ISO C's rules. (I say "the compiler considers" because there are corner cases where GCC and clang disagree in their interpretation of the standard.) If your warning does not get upgraded to an error with -pedantic-errors, the warning is about something the compiler authors consider relevant to warn about but the standard allows.
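
As a concrete sketch (struct names borrowed from the article's example; the flag behaviour is how I remember GCC and clang treating it, so worth double-checking):

    #include <stdlib.h>

    struct Meow { int purr; };
    struct Bark { int woof; };

    int main(void)
    {
        struct Meow *p_cat = malloc(sizeof *p_cat);
        struct Bark *p_dog = p_cat;  /* constraint violation: a warning by default,
                                        upgraded to an error by -pedantic-errors;
                                        something like -Wmisleading-indentation, by
                                        contrast, is not standard-mandated and stays
                                        a warning */
        (void)p_dog;
        free(p_cat);
        return 0;
    }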


A diagnostic is not a rejection.

The full quote explains his thought: "Yes, two entirely unrelated pointer types can be set to one another in standards conforming C. Most compilers warn, but this is standards-conforming ISO C code that is required to not be rejected unless you crank up the -Werror -Wall -Wpedantic etc. etc. etc."

Unless you make warnings errors (-Werror), it probably will warn, but will not reject the code (fail to compile).


It specifically makes the false claim that ISO C requires it to not be rejected, right there in what you quoted. ISO C requires no such thing. ISO C requires compilers to diagnose it and allows them to choose whether to accept or to reject it. If compilers want to make it a hard error, they're perfectly free to do so. The fact that they don't is their choice, not something ISO C imposes on them.


What's even stupider is that I'd be willing to bet it's also UB. If it is, it means an ISO-C-compliant compiler is allowed to generate code that does absolutely anything; which makes the worries over adding a warning kind of ridiculous.


I wonder if this might be a case where the assignment isn't undefined behavior, but an attempt to dereference the pointer and access the wrong struct type's members would be.


The Bark Meow pointer conversion isn’t UB itself, provided you convert the pointer back to the correct type before dereferencing it.
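
Something like this, if I'm reading the pointer-conversion rules right (alignment permitting; the struct names are just the article's):

    #include <stdlib.h>

    struct Meow { int purr; };
    struct Bark { double woof; };

    int main(void)
    {
        struct Meow *p_cat = malloc(sizeof *p_cat);

        struct Bark *p_dog = (struct Bark *)p_cat;  /* the conversion itself is OK */
        struct Meow *back  = (struct Meow *)p_dog;  /* converting back compares equal
                                                       to the original pointer */
        back->purr = 1;        /* fine: accessed through the correct type */
        /* p_dog->woof = 1.0;     this dereference would be the UB part */

        free(p_cat);
        return 0;
    }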


I can't be bothered to dig out the reference, but I know that C explicitly allows such assignment when `p_dog` points to the first member of `p_cat`, i.e., when

    struct Meow { struct Bark dog; };


Nope, C doesn't allow such assignment even then. You're thinking of casts. For casts, the C standard requires that a pointer to a struct can convert to a pointer to its initial member and vice versa, this is specified under "Structure and union specifiers". You need to actually spell it out with a cast though, C doesn't allow that to be done as an implicit conversion.
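
A sketch of the distinction as I read it (the explicit cast is the form "Structure and union specifiers" blesses; the struct names are just the article's):

    #include <stdlib.h>

    struct Bark { int woof; };
    struct Meow { struct Bark dog; int purr; };  /* struct Bark is the first member */

    int main(void)
    {
        struct Meow *p_cat = malloc(sizeof *p_cat);

        struct Bark *ok = (struct Bark *)p_cat;  /* explicit cast: points at the
                                                    first member, as required */
        /* struct Bark *bad = p_cat;                implicit conversion: still a
                                                    constraint violation */
        ok->woof = 1;
        free(p_cat);
        return 0;
    }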


It's been a while since I called myself a C expert, but that's not an assignment - it's an initialization.

Those are not the same things in C; I think if you go look at your C standard for the constraints on initialization, they are different from and weaker than those for assignment.


You're correct that initialization is not assignment, but the assignment section is still the relevant place to look: the initialization section says "the same type constraints and conversions as for simple assignment apply". But that isn't obvious and I should have spelled that out.


But that's in the semantics section, not the constraints section, right?


Heh, true. It's pretty clearly meant to mean that the constraints for simple assignment apply as constraints for initialization too, but if you were to argue that the constraints of simple assignment are imported as semantics for initialization, and therefore violations render the behaviour undefined (not requiring any diagnostic), I wouldn't be able to point to anything that shows that to be wrong other than common sense, and common sense can be wrong. The same applies to the return statement, which says "is converted as if by assignment" outside of its "Constraints" section as well.


Correct, the warning is about initialization: "warning: initialization of 'struct Bark *' from incompatible pointer type 'struct Meow *'"

https://godbolt.org/z/zK45Ys


I'd also like to chime in that both clang and gcc will complain that I didn't explicitly cast it.


The question here is whether it is a warning or an error. "Complain" could mean either. Which did you mean?


Assuming that you have called gcc as `gcc -Wall`, it will warn, but not error. What the author of the article wants, I believe, is the equivalent of some combination of `-Wall -Werror` and `-pedantic -pedantic-errors` by default, and that's certainly unreasonable.

gcc, however, isn't C, just a popular compiler for it, and clang might error on this particular one, but my memory is fuzzy. Someone else might want to chime in here.


You don’t need any warning flags for these diagnostics because they are constraint violations, so a bare `cc` invocation will complain about them.


A bare cc (or gcc or clang) invocation is not meant to be standard-conforming and will both reject standard-conforming code and leave out standard-mandated diagnostics. To accept all standard-conforming code, you need the -std=c[...] option e.g. to prevent it from picking up non-standard keywords. Without -std=c[...], GCC would flag a syntax error for the perfectly valid void asm(void) {}, because it detects asm as a keyword contrary to what the standard requires. Without -pedantic, no diagnostic will be issued for e.g. void f(int i) { int x[i]; } with cc -std=c90, even though C90 requires a diagnostic for an array where the length is not a constant. For GCC, this is documented at <https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Standards.html....


This is dangerous of course because dogs are larger than cats, but I think he is correct. While every compiler I tried has this warning enabled for assignments or comparisons by default, I don't think the standard explicitly forbids it. Could be wrong though.


> I don't think the standard explicitly forbids it.

I already wrote exactly where the standard explicitly forbids it in the message you replied to.


I looked it up...

> both operands are pointers to qualified or unqualified versions of compatible types

Not sure, but I still would say he was correct. The compiler might not be able to check if the types are compatible, so the warning as a standard behavior is anticipatory. Sensible, yes.


"Compatible type" is defined in the standard as well. It doesn't mean "whatever the compiler thinks is compatible", it's what the section "Compatible type and composite type" defines as compatible. Different struct types are not compatible.


The standard does not.

It's common to get a data payload of struct A, but cast its pointer to struct B, which is a subset of struct A (i.e. its first elements).

Actually, in the Win32 API this is done a lot. Often you're even given a void* to some arbitrary data and you need to figure out what should be there. Sometimes it's not even a pointer but two integers packed together, as in the case of WM_MOUSEMOVE messages.


A* -> void* -> A* is legal

A* -> void* -> B* is undefined

A* -> B* is undefined

If you want to get the first element, you want &(A->first), not a cast. The cast isn't guaranteed to be defined, and isn't guaranteed to be the same.


And this is how backwards compatibility comes to kill innovation. It's a reasonable stance to keep supporting your long established users but it comes at the cost of ceding the future to the competition (cough Rust cough)


IMHO that's how it should be for a programming language.

If a programming language evolves to the point that previous programs written in that language no longer compile then it's no longer the same language.

So let's keep C as C and, as you point out, new ideas and concepts that would break things can be implemented in new languages.

C does evolve but it's also taken the pragmatic approach not to break the huge existing code base it has. In many cases these programs have been running for decades, they work.


>If a programming language evolves to the point that previous programs written in that language no longer compile then it's no longer the same language.

I don't really buy that you cannot ever introduce breaking changes. That's a recipe for disaster and IMHO short-sighted.

>In many cases these programs have been running for decades, they work.

And in many other cases, they have been full of security holes that take an incredible amount of work to discover.

Something like Rust's editions seems to solve this problem very well, and is indeed what they are intended for.

You can still compile old code, you just won't have full access to new features if you choose to do so. This provides a way to keep legacy systems alive while providing an upgrade path for them at the same time.


Well, C wasn't designed to evolve and was also quite under-specified through most of its history; a hacker's delight, a tool to get things done quickly, but not a recipe for long-term maintainability. Given that, I think C did rather well.


>>If a programming language evolves to the point that previous programs written in that language no longer compile then it's no longer the same language.

>I don't really buy that you cannot ever introduce breaking changes. That's a recipe for disaster and IMHO short-sighted.

It's also empirically true in my experience. See Python2/Python3, Perl/Raku, C++98/C++11/C++20. In the latter case, I'm not even sure there are any significant breaking changes, but the feature sets are so different that I find that I pretty much always have to specify which version of C++ I'm talking about.


Scala has a tool called scalafix (https://scalacenter.github.io/scalafix/) which allows people to write migration scripts (and other things) that work based on analysis of existing code. This has been used for library upgrades, and will play a big part in the Scala 2 -> Scala 3 migration coming at the end of the year. So it's not impossible to evolve a language and keep code bases current with minimal work for the end user.

For such a tool to be effective it must be possible to statically analyse code to a reasonable degree, and C is certainly not a language that enables easy static analysis.


On the other hand it's absolutely pathetic that C still doesn't have actual arrays. To head off comments: no, syntactic sugar on top of pointers doesn't count at all. So don't even try to make that argument.


Well, precisely : if people want modern/innovating/fast evolving languages, they can use Rust, Go, Elixir, etc.

I actually started using C for my side projects two years ago precisely because I want very long-term backward compatibility (that is, being able to leave a program for years without maintaining it, then make a small edit in it and build it with minimum pain). C is perfect for that, and I agree with the sentiment that backward compatibility is its most important feature.


> I actually started using C for my side projects two years ago precisely because I want very long-term backward compatibility (that is, being able to leave a program for years without maintaining it, then make a small edit in it and build it with minimum pain).

Rust actually takes a very strong stance on maintaining backwards compatibility, and Go's stance is arguably even stronger in most cases. You're implying incorrectly that these languages are just breaking things left and right for no reason other than "innovation!", which isn't true.

The overwhelming majority of Rust and Go code from years ago will compile without problems today. Any code from post-1.0 that doesn't compile today was (inadvertently) relying on buggy, incorrect behaviors that have since been fixed... and even then, not all incorrect behaviors get fixed because compatibility is considered so important.

C is fine for certain applications, but no one should choose it for side projects based on some notion of backwards compatibility, in my opinion. If some company is building a business application that needs "backwards compatibility" in the sense that it can run on all sorts of arcane microarchitectures and operating systems, then sure... C is still a really painful* choice, but it might be the right choice then, or if there's an existing C code base, then it probably doesn't make business sense to rewrite it any time soon.

* yes, having no protection from footguns, no real standard library, no built-in concept of asynchronous code, and very little of anything useful is definitely painful. If C is the only valid choice for a project, then it's the only valid choice, and that's what you have to do. The number of projects where you simply can't use something other than C is diminishing by the day.


Rust and Go haven't been around long enough to make those statements about them.


That, and while "backwards compatibility" has been cited as important in these languages, I can attest to Rust having issues with what counts as "idiomatic" Rust:

https://timidger.github.io/posts/i-cant-keep-up-with-idiomat...

there are other similar complaints around the net.

Meanwhile, the C and C++ code I wrote over 2 decades ago only finally became "out of date" as of 5 years ago. It took a while, not 2-3 years.

Which is absolutely attributable to the language being new, not a design fault.

C is already "settled"


I'm learning Rust now, I don't have anything particularly against it other than the obvious, big static binaries and such, and maybe those will be fixed eventually.

I've been able to compile and run C and C++ code from 20 years ago a truly amazing number of times. It's really surprising at how easy it is to work with well written code even if it's decades old.

Will Rust and Go age that way? Maybe. Too soon to tell.


>I've been able to compile and run C and C++ code from 20 years ago a truly amazing number of times. It's really surprising at how easy it is to work with well written code even if it's decades old.

Not if you have used any third party libraries. Dependency management is a nightmare in these languages.


Not if the project is written well. Yes, dependency hell is a thing, but there are ways to deal with it and make good code. Autotools will straight up tell you version x.x.x of library y is required, and as long as that's available, the problem is solved. Dependency hell is a thing in other languages too -- try to compile a really old Java project sometime.

I spend most of my time working deep in the internals of some things that are 10+ years old running even older versions of some highly (and often badly) modified linux kernels. The well written C/C++ projects definitely stand out.


Conan and vcpkg have pretty much settled that problem.


It seems idiomatic C++ is recently changing much faster than before, as C++ feels the need to keep pace with competing languages. But you don't have to keep up with what's idiomatic.


It can cause friction though, if new programmers work on old code or new and seasoned programmers work on the same project.

Of those who do like Common Lisp, many like it because the standard hasn't changed in nearly thirty years (while simultaneously offering features added to the C++ standard just very recently, e.g. a filesystem path abstraction).


I think it's the first time I've seen someone call Go a modern/innovating/fast-evolving language. What attracted me to this language in the first place is the fact that backward compatibility is seen as dogma in the community. It's one of its strongest selling points IMO. I have production code written in 2014 that still compiles and works perfectly without any warning from any (popular) linter.


golang is not modern nor fast evolving.


I pray daily that more of my fellow programmers may find the means of freeing themselves from the curse of compatibility. -- Edsger Dijkstra - 1972, Turing award lecture


The “trade-off”, not “curse”, imho.

But what’s the point he is making? Is there a link to that lecture?


> And this is how backwards compatibility comes to kill innovation.

With all of the novel and esoteric programming languages that exist, I wonder why there hasn't been an "I can't believe it's not C/C++" language that breaks these things, but doesn't take itself seriously enough to diverge completely from the ISO language standards. (For bonus points, with standardized gcc extensions, and something like embedded asm but for compiled languages (like ISO standard C).)


>(cough Rust cough)

Or Zig? What other potential C replacements are there?


D's betterC mode is very good. It's slightly above C in abstraction (i.e. you still have generics) but you get all the metaprogramming too (no more macros!)


Nothing like a good ole:

    #define MULTIPLY(a,b) a*b

To ruin your month.


Is the problem the use of a macro, which isn't as robust as a template, or is there a problem with the macro? All I can figure is there could be an accidental dereference instead of multiply.


The problem is that macro expansion is (potentially) not what you think it is:

MULTIPLY(2+3,4+5) expands into

2+3*4+5 (and not into: (2+3)*(4+5)).

To have the latter, you should define:

#define MULTIPLY(a,b) ((a)*(b))

https://stackoverflow.com/questions/14041453/why-are-preproc....
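
A throwaway program to make the difference concrete (the _BAD/_GOOD names are my own):

    #include <stdio.h>

    #define MULTIPLY_BAD(a,b)  a*b
    #define MULTIPLY_GOOD(a,b) ((a)*(b))

    int main(void)
    {
        printf("%d\n", MULTIPLY_BAD(2+3, 4+5));   /* expands to 2+3*4+5, prints 19 */
        printf("%d\n", MULTIPLY_GOOD(2+3, 4+5));  /* expands to ((2+3)*(4+5)), prints 45 */
        return 0;
    }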


But it's so nice and composable! /s

    #define DEREFERENCE(b) MULTIPLY(=, b)

    int thing DEREFERENCE(ptr);


You sir, are a true monster. I salute you.


Rust is the future and always will be.


C is a knife. You expect knives to cut you, so you handle them carefully.

Except that C is sometimes a knife with another knife hidden in the grip, and if you don't handle it just right, the hidden knife will also cut you. (Thinking of libraries/other people's code)


Lol I love this analogy. It's pretty much like Darth Maul's lightsaber. Yeah, it'll cut through anything; even stuff you aren't looking at


Just like with C code, you better be force(memory) sensitive to even think of wielding the space wizard laser sword.


Ah, pointers, not as clumsy as objects; an elegant weapon for a more civilized age.

(I know objects are actually just fancy pointers)


Note that there are materials that resist lightsaber blades: https://starwars.fandom.com/wiki/Lightsaber/Legends#Lightsab...


And in this analogy, I'm the guy who just wants to cut his beef and doesn't want a kitchen built from exotic materials.


You don't want to sear your steak while you cut it?!


And if you put too much pressure on the handle, small spikes will be thrown in all directions.


  int main (int argc, char* argv[]) {
  
      (void)argc;
      (void)argv;

      struct Meow* p_cat = (struct Meow*)malloc(sizeof(struct Meow));
      struct Bark* p_dog = p_cat;
      // :3
 
      return 0;
  }
Why declare main that way if you are going to discard the arguments?

Why cast the malloc? This isn't C++.


I'd like to know, too. Have never seen either in practice.


The "pointless" cast is required for libraries that are written in the common subset of C and C++ so that they can be compiled both in a C or C++ compiler.

Some people prefer to compile C code through a C++ compiler for various reasons, Microsoft has even been recommending this because their C++ compiler isn't quite as terribly outdated as their C compiler:

https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...

(not that I agree with the reasons, but for a library it makes sense to not lock out people who prefer to compile C libraries as C++ when integrated into a C++ project).
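
Concretely, the cast is what keeps the line legal in both languages, because C++ has no implicit void* to T* conversion (a sketch; the struct name is just the article's):

    #include <stdlib.h>

    struct Meow { int purr; };

    int main(void)
    {
        /* Accepted by both C and C++ compilers: */
        struct Meow *a = (struct Meow *)malloc(sizeof(struct Meow));

        /* Fine in C, but a hard error in C++: */
        struct Meow *b = malloc(sizeof(struct Meow));

        free(a);
        free(b);
        return 0;
    }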


Thank you! That's interesting and it makes sense.


This is what you get when a person who is mostly a C++ coder (as per his bio) writes C. I cringe when I see things like

  struct Bark* p_dog = p_cat;
instead of

  struct Bark *p_dog = p_cat;
That weird affectation of C++ programmers putting the asterisk on the type and not on the declarator, where it belongs, makes my eyes bleed. I somewhat understand the reasoning, but I think it's a gross violation of the Law of Least Astonishment.


I never understood the reasoning behind this, is it that "the asterisk is part of the type, so we group it that way"?

That misses the point of the C declaration syntax: you write an expression that when used on its own will recover the basic type. So the asterisk goes with the symbol name, because that's how you dereference a pointer.

Further, it doesn't work if you want to declare more than one pointer like so:

  int* a, b;  /* wrong */
  int *a, *b; /* correct */


I think the issue is that programmers want a type "pointer to int", for example, but C doesn't directly provide that type. It has a type, int, with a modifier (asterisk) that can be applied to a declarator to make it a pointer to that type.

One way to create a pointer type in C would be to declare it using typedef:

    typedef int * int_p;
then one can write:

    int_p p, q, r;
and declare three pointers to int with perfect clarity, whereas using the C++ style, we'd get:

    int* p, *q, *r;
which is very confusing, or,

    int* p;
    int* q;
    int* r;
which is very verbose. I honestly don't know how C++ programmers typically handle this situation.

I have seen some code that strikes a middle ground:

    int * p;
which is a little more clear, but doesn't address the multiple declarator situation.

So why don't C++ programmers use typedef? I don't know, other than I understand Stroustrup doesn't like it (not without reason).

(Edited for formatting and minor clarity corrections.)


Most modern C++ code that I've seen restricts code to declaring a single variable per statement.

It's not really a big deal because things are also always introduced at the latest possible position. Each is also typically given an initializer. I'd consider it suspicious if I were to see C++ that declared 3 uninitialized pointers back to back like this.

And then you get to the codebases where the authors have chosen to embrace auto and type inference... :)


"pointer to int" is a type, because 'pointer to' is not a type qualifier like 'const'. You can call it a modifier but you'll quickly start making exceptions for any type that has a modifier to explain away why it's actually a different type with a different size.

For example, sizeof(x) takes a type. The type argument for sizeof(int) is different than sizeof(int *) and the results are different. sizeof(int *[3]) is different as well. These are all different types where pointers change the type. It's not the same type with a pointer modifier, there is no such thing.
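
For instance (the sizes are implementation-defined; these are what I'd expect on a typical 64-bit target):

    #include <stdio.h>

    int main(void)
    {
        printf("%zu\n", sizeof(int));       /* e.g. 4  */
        printf("%zu\n", sizeof(int *));     /* e.g. 8  */
        printf("%zu\n", sizeof(int *[3]));  /* e.g. 24: array of three pointers to int */
        return 0;
    }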


I handle it by outlawing multiple declaration on a single line. It may be more verbose, but it's much easier to read IMO. This has been a somewhat common rule in my experience.


This is not "C++ style"; it's just a poorly considered style perpetrated by coders who are not familiar with the grammar.

The C++ syntax is the same as C in this regard: a declaration has specifiers, and then one or more declarators.

The exception are function parameters, where you have (at most) one declarator.

> why don't C++ programmers use typedef?

C++ programmers do use typedef. For instance:

  typedef std::map<from_this_type, to_this_type> from_to_map;
C++ programmers probably use typedef a bit less than they used to, because of features like auto.

When a C++ class/struct is declared, its name is introduced into the scope as a type name. Therefore, this C idiom is not required in C++:

  typedef struct foo { int x; } foo;
that cuts down some typedefs. If you used a typedef for a C++ class that isn't just a "POD", you have issues, because the typedef name doesn't serve as an alias in all circumstances.

  typedef class x { x(); } y;

  y::y() // cannot write x constructor this way
  {
  }


It's worth noting that this nice C logic falls apart when you do something like

    f(int &a);
to mean "by reference" instead of what it should be, which is "get the address of a, and that will be an int" which is , of course, nonsense.


I don't follow. The above is not C. It's a C++ extension over C declaration syntax in such a way that the & is part of the declarator just like *.

  // Inexcusable trompe l'oeil:
  int& a, b;

  // OK;
  int &a, &b;
Here, the mistake may be harder to catch, because the expressions a and b are both of type int, either way.

  // Intent: b is an alias of a.
  // Reality: b is a new variable, holding a copy of x.

  int& a = x, b = a;
I think what you mean is that the "declaration follows use" principle falls apart for C++ references.

That is necessarily true because no operator is required at all to use a C++ reference, whereas the explicit & type construction operator is required in the declarator syntax to denote it.

However, it has little to do with the issue that & is part of the declarator and not of the type specifiers.

Declaration follows use also falls apart for function pointers in C, because while int (*pf)(int) can be used as result = (*pf)(arg), it is usually just used as result = pf(arg).

Declaration follows use also falls apart for the -> notation. A pointer ptr is always being used as ptr->memb, but declared as struct foo *ptr which looks nothing like it.

And of course, arrays can be used via pointer syntax, and pointers via array syntax, also breaking declaration follows use.

Declaration follows use is only a weak principle used to help newbies get over some hurdles in C declaration syntax.


> I honestly don't know how C++ programmers typically handle this situation.

The verbose one. A few extra lines rarely matters. I think the number of times this has come up in my code base is very very small, maybe a few dozen extra lines across hundreds of thousands.


A C++ function declaring several uninitialized pointers one after the other is very suspicious anyway. It looks like the programmer is trying to write old-school (pre C99) C code (declaring all variables at the top of the function) in C++.

Typedeffing pointers, especially for the sole purpose of "being less verbose" when declaring uninitialized pointers, is a red flag too.


I'm a C++ guy who prefers `int*`. I also hate multiple declarations on a single line so I just outlaw that entirely so I never have an issue with declaration correctness.


Three cheers for pointing out that this boils down to preference. In years programming C in teams and projects of various sizes and criticality, I've never come across a situation where the asterisk position actually made a difference, other than to incite grumbles from those who prefer it whichever way. I sincerely look forward to a situation where it does make a difference to the maintainability or correctness of code in a way that can't be addressed with other language syntax, or impedes a team's ability to deliver reliable production software against whatever style guide or coding standard they've adopted.


I'm in the same boat, and I'd like to add that I agree that using `int* p, * q, * r;` is not pretty, and could possibly lead to ending up mistakenly with `int* p, q, r;`.

But we're in 2020 and we've learned to avoid declaring stuff without initializing it at the same time, to avoid the stupid mistake of using something uninitialized.

Interestingly enough, testing with clang shows that uninitialized variables get their warning, but uninitialized raw pointers don't.


> I never understood the reasoning behind this, is it that "the asterisk is part of the type, so we group it that way"?

Yes. It just looks better IMO. Now, east-const vs west-const?


    int* a, b;  /* wrong */
    int *a, *b; /* also wrong */


    /* correct */
    int* a;
    int* b;


    int* a, b;  /* wrong */
    int *a, *b; /* also wrong */


    /* correct */
    int* a;
    int* b;

    /* moar correct */
    int* a{nullptr};
    int* b{nullptr};


The choice between ``int* p;'' and ``int *p;'' is not about right and wrong, but about style and emphasis. C emphasized expressions; declarations were often considered little more than a necessary evil. C++, on the other hand, has a heavy emphasis on types.

A ``typical C programmer'' writes ``int *p;'' and explains it ``*p is what is the int'' emphasizing syntax, and may point to the C (and C++) declaration grammar to argue for the correctness of the style. Indeed, the * binds to the name p in the grammar.

A ``typical C++ programmer'' writes ``int* p;'' and explains it ``p is a pointer to an int'' emphasizing type. Indeed the type of p is int*. I clearly prefer that emphasis and see it as important for using the more advanced parts of C++ well.

https://www.stroustrup.com/bs_faq2.html#whitespace


Hacker News has made your comment fairly difficult to understand ;)


This should be mandatory reading for everyone who ever writes code I have too look at or debug.


I would not cringe at this. It's just his preference. They have the exact same meaning. Other than personally preferring one way or the other for readability, there is no difference.


The problem is that if they forget to include the correct header file that declares malloc, then the cast will hide any warnings.

If the cast is not there, it will tell you that there is a type mismatch.
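
A sketch of that failure mode (it mostly bites in C89/C90 mode, where calling an undeclared function implicitly declares it as returning int; the struct name is just the article's):

    /* Note: no #include <stdlib.h>, so in C90 malloc is implicitly "int malloc()". */

    struct Meow { int purr; };

    int main(void)
    {
        /* The cast converts the bogus int result to a pointer, so nothing is
           left for old C modes to complain about: */
        struct Meow *a = (struct Meow *)malloc(sizeof(struct Meow));

        /* Without the cast, assigning an int to a pointer draws a diagnostic,
           which points you at the missing header: */
        struct Meow *b = malloc(sizeof(struct Meow));

        (void)a; (void)b;
        return 0;
    }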


I would assert that usability and clarity are very much a difference.


I've always struggled with this, I usually use the first style. To me, Bark and Bark* are both types, while the name p_dog is just a name regardless of what it's referring to.

I read the first as 'a struct of type Bark pointer named p_dog'.

How do you read the second example in your head?



Pointer to type "struct Bark".


I like to use clang-format so I don't have to even think about this, but my real solution is to use Rust when I can, which is pretty much always these days.


Why is that? I learned C++ first, but spend most of my time now using C. It makes more sense to me that the type should be one thing (struct Bark*) and the identifier a different thing (p_dog).

Also, a take from Stroustrup since I found it. https://www.stroustrup.com/bs_faq2.html#whitespace


The problem is that it doesn't reflect how the C language really works. More specifically, it is misleading in multiple declarations, see the sibling reply at https://news.ycombinator.com/item?id=24109267


One could argue it's wrong to do multiple declarations like that, and the whole argument collapses to mostly a matter of preference.


Even if your coding style prohibits multiple declarations, you still don't escape the fact that in C, the asterisk binds to the variable identifier, not to the type. It crops up again in the function pointer syntax:

    void *(*foo)(int *);
foo is a pointer to a function which accepts a pointer-to-int and returns a pointer-to-void.

( Taken from https://www.cprogramming.com/tutorial/function-pointers.html )


Thanks, this is the best example I've seen.


You can call a feature of a language misguided, but you can't call it wrong; that's just how C does identifiers. Sure, in your "cut out the parts of C that don't fit into my model" you can make it work, but that's not C.



I think the bad part of C is the confusing syntax for types. In a declaration with an initial value, you are assigning the value of "p_dog" and not of "*p_dog". I tend to omit the spaces on both sides of the asterisk, though, and don't declare multiple pointer variables in the same line; I will use separate lines if the type isn't something as simple as "int" or "unsigned char". If I need especially complex types, then I will use typedef.


I find your comment inflammatory and your zealousness amusing, but I'll bite and try to be reasonable by assuming you're willing to entertain another point of view.

The whole "declaration follows usage" is just a bad tradeoff. It makes it easier to parse expressions. That's my understanding of why they did it. It makes it _objectively_ harder to read, because some declarations follow this easy pattern of "name on the right and type on the left", while for some other declarations you have to employ the spiral reading pattern (e.g. for function pointers, arrays, with variable declarations being the easiest one).

You know what's easier that all that? Type on the left, name on the right.

  int a;
  int* a;
  int[] a;
  unsigned int(int) f;
Notice that you can probably easily guess what that last declaration represents, without having to consult a wise old man. I think you simply got used to how C does it, so now the actual sane way is weird to you personally, but you have to recognize that it's one additional thing _everyone_ has to learn because it's counter-intuitive. Hundreds of thousands of developers had to learn some weird spiral reading rule because 1 compiler writer found it easier to reuse a yacc rule.

That's why Java, Go and D changed this nonsense. Java supports both "int[] a" vs "int a[]". They support both to appease everyone, but they went the extra mile to support "int[] a". D changed it to "int[] a" and called it a day and Go introduced this novel [5]int syntax, which is different, but clearly easy to read ("array of five int").

Again, type on the left, name on the right (or in Go's case, name on the left, type on the right - but at least it's not a mixed bag). Once you see it that way (i.e. you give up on declaration follows usage rule), it's not weird, it's not wrong, it's intuitive, and it makes the language easier to learn and use.

I thought this is a matter of preference until I actually wrote a C compiler for fun and have permanently solidified my opinion on this issue.

The asterisk is part of the type, it's not just some random symbol, it's not a type qualifier like 'const' or 'volatile'. It's a type token that builds a distinct type, e.g. "int **" is a type that spells "pointer to pointer to int", it's not an 'int' type with some flags attached to it.

Let's take the concrete example of 'struct Bark* p_dog'.

A typical compiler will tokenize 'struct', 'Bark', '*' and 'p_dog' and will group them as ('struct', 'Bark', '*') to derive the type "pointer to struct Bark" and ('p_dog') to derive the name of the symbol when it adds an entry into its symbol table. In other words, the compiler itself splits that line so that types are on the left, and names on the right.


It wasn't my intention to be inflammatory and I don't consider myself a zealot. My comment has generated a lot of interesting discussion and that's a good thing, and probably worth the down votes.

I wrote C for a long time before I ever even heard of the spiral rule. I do think that the "declaration follows usage" idea worked a lot better before C declarations became so complex. I'm not claiming that the C declaration syntax is wonderful, it isn't. I just think that the C++ style of pretending that int* is a pointer type is misleading, since that's not really what's happening grammatically. Yes, * is a type token, as you say, but it modifies the identifier, not the type specifier. But since the style now is to only declare one variable per declaration, and to always initialize it, then in practice it isn't actually all that confusing.

I have always wanted to write a compiler and I'm sure I would get a new perspective on these matters if I did. Go has done a lot to clean up C's messes. I haven't used it much, but I'd like to know it better.


Agree 100%. It is a tradeoff that detracts from ease of human interpretation.

Because C makes that tradeoff,

  int *a;
is idiomatic C and

  int* a;
isn't.

For C programming to be pleasant, you have to understand and agree with the philosophy at least while you're writing C.

Go developers changed this in a particularly elegant way. Here's what Rob Pike has to say: https://blog.golang.org/declaration-syntax.

I too gained a better understanding of the problem after going through a compiler-writing exercise.


Why put the asterisk on the variable though? It’s part of its type, isn’t it?


Because in the C (and C++) grammar,

    int *a, b;
declares a pointer-to-int `a` and an int `b`, rather than two pointers-to-int. The better solution to this problem is "don't do that", but C (and C++) programmers have a fetish for terseness.


> but C (and C++) programmers have a fetish for terseness.

Indeed, Linus Torvalds has a recent rant about people still adhering to max 80-column width code. It's pointless in the day of massive monitors.


I use a rather large 4K monitor as my daily driver and maintain 80 column width code where possible, to avoid wraparound when I have many windows open.

It's not pointless unless you're using your massive monitor like you did your tiny one thirty years ago. I use mine like a bunch of tiny monitors, not one big one.



  struct Bark * p_dog = p_cat;


This blew my mind when I started using clang-format. There are only two options in all of clang-format's 100+ customization variables: move the star next to the type, or don't touch the star.


Backwards compatibility for code is important, as is progress in language evolution. I have a question regarding "C has no ABI that could be affected by this, C doesn’t even respect qualifiers, how are we breaking things?!"

We have language standard flags for changes like this, like '-std=c2x' or '-std=c89' with GNU's GCC. I understand and accept the concern about avoiding breakage. Furthermore, C is inherently weakly typed, contrary to C++ which is strongly typed. That is something you probably should not change, because those are basic language features. But the option to set the language standard does exist for exactly this situation, to allow changes which will affect users. So why can't it be used here?

That is not a criticism. I'm sure they have their rationale for this and know more than me.

PS: Some changes will break the ABI; in those cases we will likely see a preprocessor variable or something like that, which is more complicated. The GCC people used something like that for some changes to std::string, if I remember correctly.


I was wondering this myself. Those million-line codebases where people are worried about new warnings breaking things should already be using a compiler that doesn't know about the new standard, or one that allows them to set which standard their code is following. I guess I don't understand the problem. Does something like MISRA assume you are using the latest standard? I understand there are regulations involved.


It's also wrong: newer standards do change the signatures of functions; a lot of them got noreturn added in C11.


"We will not make it easier for new programmers to write better C code." - well, that escalated quickly


one should never get in the way of a good rant.

more seriously, what I'd like to see in C (as a long-time programmer in C) is less freedom around undefined behavior. I used to feel like the biggest mistakes made in C were around pointer bugs, but you can be careful and get things like that (mostly) right. Undefined code is a lot harder to see and avoid without a very deep understanding of lots of small details.


I assume you've seen John Regehr's "Proposal for a Friendly Dialect of C"? https://blog.regehr.org/archives/1180

The basic idea is to replace various common UB scenarios with "defined, but unspecified", in order to move the compiler's interpretation of the program closer to the programmer's.


It would be very nice if there was a special C compiler that could just warn about all occurrences of undefined behavior in a program. It doesn't even need to be able to generate code, it could just be a front-end that points out the places where undefined behavior is either being invoked, or could be invoked depending on the input to the program.


Does this piece of code have UB? Could it invoke UB depending on the input to the program?

    void count(int x)
    {
      for (int i = 0; i < x; ++i)
        printf("%d ", i);
    }
The answer to the latter question is of course "yes" - signed integer overflow is UB, so you invoke UB by passing a negative x.

Would you like every loop to be flagged as potential UB? I don't think you'd last a single day programming in that C dialect.


You wouldn't use the compiler to build your programs, you'd just use it to detect UB.

It would be especially useful if you could specify your entry point(s), and let it find all cases where user inputs could cause UB.

I think it would be especially helpful for checking the output of compile-to-c languages.


Take a look at Zig. https://ziglang.org/

It fixes most of the C mistakes, while still giving programmers tight control over the system.


The real problem with n2526 is that it proposes fixing the return types of locale functions, rather than deprecating that hot mess entirely.


Please tell me if I misunderstood but this is what I thought I was reading here:

- The author is someone quite young (undergrad age) who is serving on a C language committee (that I assume is mostly made up of people who are over 40, probably mostly over 50).

- The author not only is donating his own time to C language committee work, but also clearly knows what he's talking about regarding C.

The article came across to me as thinly-disguised frustration/anger that the committee had no interest in making C "safer".

My takeaway was that the article fits in very much with all the articles one sees that are positive about Rust not just being of academic/hobbyist interest but being a serious contender as a replacement in many industry contexts.


> C will never stop you from making mistakes

As making mistakes is human nature, and C will never stop humans from making mistakes, C will keep humans natural forever!


Well, I liked the part about (* vague gesturing towards outside *)


What does it mean to be the 'Project Editor' of C?


My understanding of the role is that it's their job to literally edit the standard, that is, they take the papers that have been accepted, and apply them to the standard's text to produce the next draft of the standard.


Be advised that every new C project written by experienced developers begins with `-Wall -Wextra -Wpedantic -Werror`


-Werror is a bad idea for open source projects or anything that will be compiled by people who do not know how to fix things when their shiny new compiler added a fancy warning.

-Werror=... for specific warnings might be OK in some cases.


They can just remove -Werror when compiling. It's useful enough to keep in place by default if you consider warnings bugs (and you ought to).


So would you agree that it might be a useful thing to use for local development, but not for code you want other people to compile?


That still doesn't turn on many warnings. Clang has -Weverything for this.


-Weverything contains contradictory errors. I think -Wmost is better but it's probably still very noisy.


Actually, "C will never stop you from doing <things>".


Sounds like a bad case of the tail wagging the dog to me.


...and that's a good thing.

Console homebrew. iOS jailbreaking. Android rooting. Those are only some of the freedom-enabling things this and other "insecurity" allows. It's not all bad --- and IMHO it's necessary have these "small cracks", as it keeps the balance of power from going too far in the direction of the increasingly authoritarian corporations.

I always keep this quote in mind: "Freedom is not worth having if it does not include the freedom to make mistakes."


This has nothing to do with any of that. Absolutely nobody is proposing that it shouldn't be possible to write code that reads from and writes to arbitrary registers and memory addresses, even though this obviously makes complete memory safety impossible. (I mean, there are legitimate use cases for sandboxing and VMs and what have you without escape hatches, but there are also legitimate use cases for not-that.)

This is about providing better compiler diagnostics. Such diagnostics can't catch every mistake, as long as we require the aforementioned ability to perform arbitrary operations, but they can catch a lot more mistakes than they're catching now.


Trust me, none of the memory errors in those examples come from the omission of a `const` in a pointer. It's not like adding `const` to a pointer will write-protect that memory so that magically exploits become impossible. All those insecurities that lead to exploits are buffer overruns of some sort, and nothing would change about them with this sort of little language change.


Oh boy. The conclusion sounds pretty scathing.


Yet more worshipping of the fear of change. Simply speaking: if the code is suspicious, it should be treated as such.

I am not tied to some definition of perfection but rather to the practical. Developer tools help write safer code. If that code is running dangerous equipment this is even more important.

The expectations and quality need to be raised. ESPECIALLY in operating systems, device drivers and yes, code that has been running "just fine" for years.


This is a dumb argument. The proposal was to add a new kind of undefined behavior that breaks a bunch of existing code, and this is spun as not helping?!


That's not what the article claims. It claims that it was only adding a warning.


The code already has UB if it does not respect the "const in spirit"-ness of what those functions return. The warnings would be introduced to prevent new code from triggering that foot gun.


sounds like a stealth argument for rust?


If the C standard org was serious about code safety the FIRST thing they would do is PUBLISH THE STANDARD!!!!

Complaining that people don't follow the finer details of the standard while at the same time keeping the standard unavailable to the vast majority of C programmers is a travesty. When compiler writers think that it's OK to put "optimizations" into compilers that remove vital NULL checks, because the spec says that something which makes no sense may be UB, I think to myself: what would they say if they downloaded the latest version of their favorite text editor, only to find that it would format their system disk whenever the user pressed Control? When they then reached out to the maker of the text editor, the developers would answer: "Oh, on page 204 of the documentation that costs money to access, it says that pressing Control is undefined, so we are within our rights to format your system disk". Would they be OK with that and think it was fair? That's how the C standard body is behaving!

NOBODY learns C from the C standard, and that is your fault, so don't complain about people not following it. The fact that you are also fucking it up doesn't help: (https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/)


Please don't use uppercase for emphasis. If you want to emphasize a word or phrase, put asterisks around it and it will get italicized.

https://news.ycombinator.com/newsguidelines.html.


A comment on your blog post. You are writing:

> In C89, undefined behavior is interpreted as, “The C standard doesn’t have requirements for the behavior, so you must define what the behavior is in your implementation, and there are a few permissible options”.

But isn't the sentence

> Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results

in the C89 spec already giving the compiler developers carte blanche to do whatever they want? Anything can be interpreted as "ignoring the situation with unpredictable results".

In your criticism of C99 (where the word "permissible" was changed to "possible") you write:

> In C99 undefined behavior is interpreted as, “The C standard doesn’t have requirements for the behavior, so you can do what ever you want”. [..]. If for instance you have a large codebase like say the Linux Kernel and there is a single instance of undefined behavior somewhere in there the compiler is free to produce a binary that does what ever it wants. It doesn’t have to document what it does, it doesn’t have tell the user, it doesn’t need to do anything.

This exactly fits the C89 spec. The compiler may just ignore that you are relying on undefined behavior and produce a completely unpredictable binary.


If the change makes no difference at all, why do you think the change was made? Changes in specs are made because people lobby for them.


I think the relevant phrase is "so you must define what the behavior is in your implementation".

With simple compilers, you could more or less define what happens with undefined behavior. For example, on platform X, if you dereference NULL, you get a segfault. On platform Y, you get a zero value.

Problem is these behaviors are hard to define once you start thinking about the edge cases. You can dereference NULL with an offset, and once you do that, you might skip over any guard pages and overwrite a function pointer with garbage and then jump to some random location in memory, and once you do that, all bets are off. I mean, really, really off.
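Roughly, "dereference NULL with an offset" looks like the sketch below; the struct layout and guard-page size are just illustrative assumptions:

  struct dev { char buf[1 << 20]; void (*handler)(void); };

  void clobber(struct dev *d) {   /* imagine d == NULL at runtime */
      d->handler = 0;             /* UB: the store targets an address roughly
                                     1 MiB past 0, potentially beyond any guard
                                     page, so "NULL deref == clean segfault"
                                     stops being a usable definition            */
  }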

So that leaves you with three options.

1. Abandon the idea of defining behavior in these cases.

2. Insert bounds checks.

3. Refine the idea of undefined behavior into two separate concepts--"bounded" and "critical" undefined behavior.

C has taken all three options at the same time, in a sense. The main standard takes option #1, the Address Sanitizer gives you #2 (more or less), and annex L gives you #3.

Annex L, option #3 above, is in many senses the sane option--but it requires a fairly large amount of work on the part of the compiler writers, and to my knowledge neither GCC nor Clang have implemented it.

I would also like to add that some of the places where we see undefined behavior were inserted into the standard for real-world cases for unusual platforms. For example, the weird rules for pointer comparisons make sense when you think about segmented memory. Relational comparisons (<, >, <=, >=, but NOT == or !=) on pointers are generally used for traversing arrays, and if you have two pointers into the same array, it would be reasonable to assume that they have the same segment (depending on your memory model)... SO, you omit the segment comparison in this case.
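A small illustration of that relational-comparison rule (the segmented-memory rationale is historical, but the UB is still current):

  int a[8], b[8];

  int ok(void)  { return &a[2] < &a[5]; }   /* well-defined: same array object */
  int bad(void) { return &a[0] < &b[0]; }   /* UB: relational comparison of
                                               pointers into different objects;
                                               a segmented target may compare
                                               only the offsets                 */
  int eq(void)  { return &a[0] != &b[0]; }  /* == and != across objects are fine */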


I think there are valid reasons for a language spec to have undefined behavior. But many UB cases in C would be much better if they were defined as platform-specific. Take integer overflow: it's very well defined on x86, ARM and every other CPU architecture in use, but not only do compilers act as if it's unknowable, even when they clearly know the behavior because they are implementing a known instruction set, they also make it hard to detect overflows, since they optimize away the tests on the grounds that overflow "can't happen".
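A concrete example of the optimized-away overflow test being complained about; GCC and Clang will typically fold the first function to "return 0" at -O2, though the exact behavior is compiler- and flag-dependent:

  #include <limits.h>

  /* Intended as an overflow check, but since signed overflow is UB the
     compiler may assume x + 1 never wraps and fold this to 0. */
  int will_overflow(int x) {
      return x + 1 < x;
  }

  /* A wrap-free way to ask the same question. */
  int will_overflow_safe(int x) {
      return x == INT_MAX;
  }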


Surely you understand why compilers do this, right? This isn't compilers and standard writers being mean because they hate you. This is a language that you choose for speed being optimal for speed. In particular, signed integer overflow being UB was and still is a necessary evil so that compilers don't have to add ugly loop preambles to every loop you write.
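A sketch of the kind of loop usually cited here; the exact codegen depends on the compiler and target:

  void zero(float *a, int n) {
      for (int i = 0; i <= n; ++i)
          a[i] = 0.0f;    /* with UB overflow the compiler may assume i never
                             wraps, treat the trip count as exactly n + 1, and
                             widen/vectorize freely; with defined wrapping it
                             would have to allow for n == INT_MAX making this an
                             infinite loop, which is where the extra loop
                             preamble/versioning comes from                      */
  }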


I’d say that Annex L does get you closest to that goal, in that “bounded undefined behavior” can result in indeterminate values.

> …not only do compilers act as if it's unknowable…

I disagree with this interpretation. The compiler is acting not as if it’s unknowable, but as if it doesn’t happen.


> The fact that you are also fucking it up doesn't help: (https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/)

For the benefit of readers here who may not know, this blog post has been discussed previously on HN [0].

Personally, I don't find the argument made in the blog post convincing [1], but I'm hardly an authority on this kind of thing. I've copied part of my comment on the other thread here for reference.

In any case, I think there's an argument to be made that it's not the standards committee that is at fault.

----

I'm not really convinced the author is correct in claiming that a one-word change opened the floodgates to optimizations on undefined behavior. In particular I think:

> Careful reading will reveal that the word “Permissible” has been exchanged to “Possible”. In my opinion this change has lead C to go in a very problematic direction.

is a red herring. In my opinion, the actual problematic phrase is this:

> ignoring the situation completely with unpredictable results

which didn't change between C89 and C99.

It all comes down to what "ignoring the situation" should mean. Compiler vendors appear to interpret this to mean "ignore situations that invoke undefined behavior". Programmers who dislike optimizations based on undefined behavior appear to interpret this to mean "ignore the violation that leads to undefined behavior and treat it like conforming code". Who's right? It's ambiguous.

----

(Edited after noticing that you're the author of the blog post. Sorry!)

[0]: https://news.ycombinator.com/item?id=22589657

[1]: https://news.ycombinator.com/item?id=22590286


I agree that "ignoring the situation completely with unpredictable results" is also bad, although I might be able to see some situations where it would be difficult to write a spec in a clear way that was more precise.

However the problem isn't just that compilers ignore UB, they actively make use of it! In the example:

  if (p == NULL)
      write_out_error_message_and_exit(0);
  *p = 0;

The NULL check is removed by the compiler because of the undefined behavior of writing to NULL. Instead of ignoring that the code might write to NULL, the compiler does the opposite and assumes that p can't be NULL.


> However the problem isn't just that compilers ignore UB, they actively make use of it!

I think this might depend on how you interpret the standard and/or compiler's actions.

If you believe "ignoring the situation completely" means "ignore precisely those statements that invoke UB and leave everything else intact", then the transformations done by compilers can look like actively taking advantage of UB beyond what might be implied by the standard.

If you think "ignoring the situation completely" allows "inspect the set of program executions, discard those in which UB is invoked, and optimize based on the remaining possible executions", then the mere act of "ignoring UB" can't really be distinguished from "actively making use of UB"; the actions are one and the same.


That code should be fine, the problem arises when the dereference comes first and/or control flow doesn’t leave before the dereference.


It should be fine, but it isn't. The Linux kernel has had problems with this kind of code, because the control flow indicates that the dereference will happen regardless of the if statement. It's order-independent. If the function had a "noreturn" qualifier, or an "else" that could tell the compiler that hitting the if statement means the dereference never happens, then the compiler would not break this code.
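For reference, a sketch of the "noreturn" annotation being mentioned; _Noreturn is standard C11, and the function name is just the one from the example above, not a real API:

  _Noreturn void write_out_error_message_and_exit(int code);

  void use(int *p) {
      if (p == NULL)
          write_out_error_message_and_exit(0);   /* compiler now knows control
                                                    never comes back from here  */
      *p = 0;                                    /* reachable only when p != NULL */
  }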

I don't want to sound like I'm showing off or putting down your comment. Your comment is 100% reasonable. It's an example of how the spec is not reasonable, and it doesn't do what any reasonable person would expect.


No, I meant what I said. The famous kernel “null pointer check removal” bug was in code of the form

  void foo(int *bar) {
      int baz = *bar;
      if (!bar) {
          return;
      }
      /* do stuff with baz */
  }
which you can see provably invokes undefined behavior if bar is NULL. In the case you provided where the dereference comes after a function call the compiler cannot optimize out the check unless it can prove that control flow returns, which you claim it does not (it exits). noreturn and such are there to improve these cases and make them explicit to the compiler, but in the absence of information it must be conservative as there are many ways for a function call to never return, one of which is exit(3). Optimizing this case would be incorrect (i.e. a compiler bug) and this happens to be one of the reasons why function calls often serve as a barrier to optimizations in C.


> If the function had a "noreturn" qualifier, or an "else" that could tell the compiler that hitting the if statement will mean that the dereference never happens, then the compiler would not break this code.

If the compiler can't prove that the function returns, it's not allowed to assume that it does.

I challenge you to cook up an example on godbolt.org that shows the behavior you claim. But take note of the difference between

https://godbolt.org/z/sWYEYc

and

https://godbolt.org/z/3E7hbz


Isn't this situation due to ISO rules, and don't they publish a "draft" that is identical to the actual standard before hitting publish, to get around those rules?

Or maybe you're talking about something I don't understand.


Yes, but this arcane way of working isn't OK when the entire open-source world depends on the language to produce functioning code.


While I also prefer other processes, for many people, ISO standardization is really important, and so to me it seems like they're doing the best they can, given that constraint.


Maybe those ISO rules should change....


You can get the standard here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf

(more or less)


That is an intermediate draft between C18 and C2x with diff markers, so it isn't the best reference to use. My notes have:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/

n1256.pdf - C99 including technical corrigenda up to TC3

n1570.pdf - C11 final draft

n2176.pdf - C18 final draft (password protected; C18 is basically C11 with corrections)

n2478.pdf - latest C2x draft (now typeset by LaTeX instead of troff)



I tested it before posting and also just now, and both times it downloaded the pdf correctly.

Also see your sibling comment for other version numbers that may be more useful.


It's working for me now. Maybe it was just down for a short period. I wasn't getting a response from the server at the time.


The draft versions of the spec (functionally equivalent to the published versions) are freely available. I believe that the opensource compilers explicitly target those draft versions; but even if not, there's nothing in the release version that's not in the draft. You can even get fancy html formatted versions that somebody made - https://port70.net/~nsz/c/


[flagged]


A throwaway joke


Neither will Haskell, Rust, etc. The extra classes of mistakes they can prevent at compile time just aren’t a meaningful volume of mistakes to make a practical difference in the lives of any developers apart from a few niche system engineering use cases.

If you think languages should facilitate type system design patterns that render large classes of application level mistakes impossible, you are just an architectural astronaut falling prey to premature abstraction and unaware that these languages aren’t making your application code safer or more reliable, only more brittle to the inevitable needs to break its core abstractions to solve expanding use cases.


This is a very strong claim. Do you have evidence for it?

(Yes, I'm aware that there've been a bunch of studies that didn't find decreased "bug density" in open source repos using statically typed languages, but there've also been studies that found the opposite, and in any event the methodology behind all of these is dubious. Example saga: https://hillelwayne.com/post/this-is-how-science-happens/)


I disagree that it's a strong claim. My claim is that strict functional programming, or type-system enforcement of safety, has no data supporting that it significantly improves anything (defect rate, speed of development, security, etc.).

The strong claims come from evangelists of those extreme programming paradigms. You should be asking them for proof that consists of more than anecdata.

It’s backwards to say that essentially what is a historically validated null hypothesis with 50 years of development history on its side is “a strong claim” that requires special evidence, while giving a free pass to all the people using little more than blog posts and slick syntax to claim these extreme design patterns are demonstrably better.

If they are so much better, where are all the companies getting free lunches just by switching to these tools? How is it that the entire industry is so irrational that so few companies are willing to switch?

Superior ways of working catch on very fast, just consider the radical adoption of GitHub and no-sql data systems. Why is strict functional programming not seeing that? What mental gymnastics does it require to take as a premise that strict functional programming is “better” yet adoption rates are super low and successes are not proved with data, only anecdotes?


> It’s backwards to say that essentially what is a historically validated null hypothesis with 50 years of development history on its side is “a strong claim” that requires special evidence

I really don't see your point here. Are you suggesting that because people have been able to write software in C for 50 years, type systems aren't useful? Because it's not like those programs are bug-free. It also doesn't take into account how much effort is needed to build and maintain that software.

Also not really sure what you're getting at about strict functional programming. This post was just about type checking.

> Neither will Haskell, Rust, etc.

Rust is really in no way a functional language.


I don’t see your point. What does the bugfreeness of historically used approaches have to do with it? The new kids on the block have to prove they are better in some tangible way. The onus is not on established programming languages to prove anything.

As far as evidence is concerned, you’re not going to solve problems faster, safer, cheaper, more reliably or more extensibly with Rust or Haskell than you are with C or Python, apart from some very niche exceptions.

That's just an empirical observation, not an opinion.


> The onus is not on established programming languages to prove anything.

Why is there no onus on "established" programming languages (whatever that means)? It's not like production software hasn't been shipped in Rust, Haskell, OCaml, etc. Just because C is older it gets a free pass?

> As far as evidence is concerned

I would argue there's plenty of evidence that languages with better type system are valuable. What we don't have is proof, but that's because nobody has figured out how to do the experiment(s).


No, it's because either nobody has figured out how to do the experiments, OR because their hypothesis is invalid and they merely haven't disproven it.


Sorry, I worded that poorly. What I should have said is that we don't have proof/disproof, because nobody has figured out how to do the experiment(s) needed to get there.


This blog post makes the opposite assertion:

http://blog.vmsplice.net/2020/08/why-qemu-should-move-from-c...

Most security bugs in qemu have been C programming bugs, like a NULL pointer dereference.


Of course the language can help the programmer to write better code. High level constructs and easily available libraries that are well tested and widely used helps a lot. However, the biggest problem is not the language, it's that the programmer simply writes faulty code.


So you have two main solutions for this problem: have all programmers never, ever write faulty code, or design and use programming languages that are safer and less error-prone.

One is impossible, while the other is already applied.


In my experience most bugs are due to misunderstanding requirements and simply writing faulty logic. Just a few bugs are related to the language itself.


In higher-level languages you can make a lot of faulty logic inexpressible in your code, which eliminates a ton of bugs.


That's pretty much what I wrote in the first post.


A short summary.

"I tried punching myself and C let me do it. Now I am ranting how C gave me a nose bleed."

If you are using C for projects that are big and abstract, then you are THE problem for picking the wrong tools for your task. I killed a mosquito with a hammer but it left a hole in my wall => hammers are terrible tools.


This isn't even remotely what the post is about. It's about how the C standard is resistant even to things like warnings for provably bad code in the name of backwards compatibility ("more warnings for (maybe) working code = bad"). Did you even read the post?


When you demonstrate that your ECNL (equally crappy new language) merits an investment of time, and is guaranteed not to break in perpetuity against future programmers and changing standards, I'll make an effort to learn it. Mind, it must be as free (to use, modify, permute) and as low-level as C. I must be able to do what I need to do without being second-guessed because it's dangerous. There are some people who know when to be dangerous, and slowing us all down to the speed of the slowest bootcamp grad or the most avaricious exploiter of minimal talent isn't progress. Otherwise I'll stick with the libraries I know, trust, and have written in C. This article is great mudsticking, btw.



