> A lot of the coreutils development effort goes into testing, and new implementations should be able to leverage the existing GNU coreutils test suite, as it just uses whatever utilities are in the $PATH
This makes me really happy.
Separately from this, I've been wondering for a long time if there's a way for standards (and de facto standards) to share test suites for other implementers to re-use. Sort of an npm, but only for test suites. Does such a thing exist? I wrote a TOML parser recently and had to re-derive the test suite from the specs.
I started thinking about this last year and wrote down some thoughts in this draft post: https://ttm.sh/2l3.md
> Could we get the best of both worlds by treating specification and compliance (testing) as a single problem? I call this hypothetical approach specification-driven development, whereby a specification document is intended for both human and machine consumption. In that case, the specification contains a written presentation of concepts, in addition to a machine-readable test suite that follows a certain format to programmatically ensure that the concepts and behavior described in the specification are implemented properly.
I've centered the document on my personal use cases (CLI and sysadmin checks), but I don't see a reason it couldn't be employed for API/ABI checks.
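As a rough illustration of what such a machine-readable test suite could look like, here's a minimal sketch in Python. The case format (`argv`, `stdin`, `stdout`, `exit`) is hypothetical, not taken from any existing specification; the point is just that cases are data, so any implementation on `$PATH` can be checked against them.

```python
import subprocess

# Hypothetical machine-readable test cases: each case pairs a command
# invocation with its expected observable behavior. Field names are
# illustrative, not from any real specification format.
CASES = [
    {"argv": ["echo", "hello"], "stdin": "", "stdout": "hello\n", "exit": 0},
    {"argv": ["true"], "stdin": "", "stdout": "", "exit": 0},
]

def run_case(case):
    """Execute one case against whatever binary is first in $PATH."""
    proc = subprocess.run(
        case["argv"], input=case["stdin"],
        capture_output=True, text=True,
    )
    return proc.stdout == case["stdout"] and proc.returncode == case["exit"]

results = [run_case(c) for c in CASES]
print(all(results))
```

Because the runner only shells out to whatever is on `$PATH`, the same case files would exercise GNU, BSD, or Rust reimplementations alike.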
I've been thinking about that while working with recurrence rule libraries. It's a calendaring standard with millions of edge cases: developing your own library for production purposes is a huge risk, and most of the effort would probably go into tests. If all these projects shared a test suite, they could:
- Fix their own existing bugs
- Help other projects to fix their bugs
- Be explicit about what they support and what they don't
- Help write new libraries (e.g. in a faster language, or for another ecosystem)
Indeed, I'm picturing sort of a canonical repository for test cases (with a common interface). I would love to have a place I can obtain a test suite for a given standard, execute it, and even potentially publish my conformance. Potentially even have a badge to post on GitHub repos indicating conformance to a certain version of the standard.
Public tests are good, but relying only on public tests encourages kludges that make tools pass the tests. A form of overfitting. That's why some compression benchmarks do not publish their test corpora.
But compression tests only have that issue because there are degrees of success for those tests, even for the same compression algorithm. I don't think they're hiding the test vectors because they're worried about implementations purposely failing to process valid inputs to achieve better metrics, but about overly specific heuristics.
For straightforward corner case acceptance tests (which I would assume covers most of the coreutils test suite) there's not really a danger of overfitting unless the developers are literally writing if statements that match a single input from the test and provide the correct output.
That feels like an issue to be solved by the tests. If a program conforms to the public test suite, either the program "works" or the tests aren't covering the specification.
For compression tests it's a little different, since the problem is often underdefined even for a fixed algorithm and different implementations may produce encodings with different efficiencies (space, compression time, decompression time) for different inputs. Compression implementations can overfit on some inputs and produce subpar results on average even if they produce valid outputs for all inputs.
However I doubt this applies to coreutils' tests, which I suspect are more about conformance.
That makes sense if the public tests are "compress these bytes, expect this output" but I'd expect instead to have a lot of specific components with their own individual tests.
Do you think this applies if a corpus contains both affirmative and negative tests? As in, including not just conforming JSON but also a set of JSON that should be rejected as non-conforming? I agree it could be challenging, for instance for compression, where there is a more challenging definition of 'wrong.' I'm just wondering if this idea has legs, and appreciate your thoughts.
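For concreteness, a dual-corpus check is easy to sketch against Python's stdlib parser. The corpus entries below are made-up examples, not drawn from any real conformance suite:

```python
import json

# Toy conformance corpus: valid documents must parse, invalid ones
# must be rejected. The entries are illustrative examples only.
MUST_ACCEPT = ['{"a": 1}', '[]', '"text"', 'null']
MUST_REJECT = ['{a: 1}', '[1, 2,]', "{'a': 1}", '']

def accepts(doc):
    """Return True if the implementation under test parses doc."""
    try:
        json.loads(doc)
        return True
    except json.JSONDecodeError:
        return False

passed = (all(accepts(d) for d in MUST_ACCEPT)
          and not any(accepts(d) for d in MUST_REJECT))
print(passed)
```

Negative cases like these make the "if statement matching a single input" cheat much harder, since an implementation has to actually decide conformance rather than recognize fixtures.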
I do think that's a material risk - however broadly, do you think that such a scheme would make the ecosystem better or worse? If you made a hypothetical JSON implementation in your language of choice, would you use it?
> I do think that's a material risk - however broadly, do you think that such a scheme would make the ecosystem better or worse?
I don't think it would be easy to cheat if:
- the tested implementation is open source: cheating would be too obvious,
- the tests are constantly updated: cheating would be too cumbersome, and
- the tests include randomization: cheating would not always work.
So, satisfying these points would drastically increase trust in both the test corpus and the tested program.
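The randomization point can be sketched as a round-trip property over freshly generated inputs; this toy generator (my own construction, assuming only Python's stdlib `json`) produces different cases on every run, so hardcoded answers to a fixed public corpus would not survive it:

```python
import json
import random
import string

def random_value(depth=0):
    """Generate an arbitrary JSON-serializable value (toy generator)."""
    kinds = ["int", "str", "bool", "null"]
    if depth < 2:  # cap nesting so documents stay small
        kinds += ["list", "dict"]
    kind = random.choice(kinds)
    if kind == "int":
        return random.randint(-10**6, 10**6)
    if kind == "str":
        return "".join(random.choices(string.ascii_letters, k=8))
    if kind == "bool":
        return random.choice([True, False])
    if kind == "null":
        return None
    if kind == "list":
        return [random_value(depth + 1) for _ in range(3)]
    return {f"k{i}": random_value(depth + 1) for i in range(3)}

# Property: serializing then re-parsing must return an equal value.
ok = all(json.loads(json.dumps(v)) == v
         for v in (random_value() for _ in range(100)))
print(ok)
```

This is essentially a hand-rolled fragment of property-based testing; a shared suite could ship such generators alongside its fixed vectors.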
> If you made a hypothetical JSON implementation in your language of choice, would you use it?
On my machine? I use my own hacked kernel on my machine! In production? Only if tests indicate my implementation is as good as the best ones available.