Hacker News | D-Machine's comments

Except you can reject the very (stupid) question / framing, in which case the response is either to close the tab or to answer in a particular response style, neither of which makes the data more informative. This kind of clumsy stuff is just dumb given what we know now, edutainment distraction for the HN crowd.

There was a time when there were no separate names for blue and green in the Japanese language. Some languages right now have concepts of fundamental colors like navy blue and light blue, where English rolls it into a single "blue". Naming colors is highly cultural and changes over time. The idea that colors have boundaries is fascinating from both psychological as well as linguistic perspectives.

The framing seems stupid if you take the naive perspective that your language's way of dividing colors is the only valid one. Exercises like this and discussions that follow help expand perspectives.


Yes, very annoying, we know from extensive work in psychometrics that single-item, binary / forced-choice items produce junk responses that are heavily contaminated with response styles (answer in the most socially-desirable way, select the response closest to the mouse/finger, select the same response as last time, select a random response, etc). Just give people an out ("Disagree with the question / premises", "Prefer not to answer", "Unsure / Can't decide", etc) and make sure you have e.g. a 5-7 point Likert-type scale for multiple items, or up to an 11-point scale for single items.

This kind of site / demo does none of the above, and so can't even be trusted for directional effects (the direction of response may simply be due to the type of people responding, etc).


This is the wrong way to do it, psychometrically, see here: https://news.ycombinator.com/item?id=47929056. You need to provide people gradations, or you get junk responses / abandonment, and your instrument doesn't measure what you think.

Wrong way to do it. We know from psychometrics that forced binaries like this just create junk (people disagree with the question, so they just choose a forced answer based on some heuristic for each such question, like "closest to my mouse / finger" or "most socially desirable" or "same as last time"). So you aren't measuring what you think when you force choice like this.

If you're going to go with linguistic self-report and a single item, you really want something like an 11-point Likert scale. A smart design might get e.g. a person's rating of "blue-ness vs. green-ness" on an 11-point scale, then determine the optimal cutpoint via e.g. clustering, logistic regression, or some other method, to really get something meaningful.
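As a sketch of the cutpoint idea above (using a simple accuracy-maximising threshold search in place of clustering or logistic regression; the ratings and labels below are entirely made up for illustration):

```python
# Hypothetical sketch: given 0-10 "blueness" ratings (0 = pure green,
# 10 = pure blue) plus each rater's own binary label, find the rating
# threshold that best separates "green" from "blue" answers. A plain
# accuracy-maximising search stands in for logistic regression here.

def optimal_cutpoint(ratings, labels):
    """Return (t, accuracy) where t is the threshold maximising the
    accuracy of predicting label == 'blue' when rating >= t."""
    best_t, best_acc = None, -1.0
    for t in range(0, 12):  # candidate cutpoints on the 0-10 scale
        acc = sum(
            (r >= t) == (lab == "blue") for r, lab in zip(ratings, labels)
        ) / len(ratings)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Made-up example data: ratings near the middle get mixed labels.
ratings = [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
labels = ["green"] * 6 + ["blue"] * 6
t, acc = optimal_cutpoint(ratings, labels)
print(t, acc)
```

With real data you would fit this per respondent or pool across respondents, and a proper logistic fit would also give you uncertainty around the cutpoint.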


Is it really junk though? There are several comments in this thread like “people tell me I call stuff blue that they think is green and this quiz confirms that.”

Forced binary choices on single-item, self-report questions produce scientific junk, absolutely. This kind of design / approach encourages not only magnitude errors, but also sign errors (you can't even trust the direction of the observed effect).

IMO, unless you grew up under a rock, you will have experienced different people pointing at the same colour and uttering very different colour labels (pink vs. red, blue vs. green, black vs. deep blue/purple, etc) from the labels you might have applied yourself. Differing vs. shared colour perception isn't exactly a rare topic (it's almost the canonical stoner topic, and common online too), so I'd be a bit surprised if this demo is actually introducing anyone to the concept. Any excitement is surely from other implications people think the demo has.

But unfortunately there are no interesting implications from what this site shows. Yes, it demonstrates the boring fact that it isn't clear how different people assign colour labels to the same physical stimuli (and this falsely assumes that everyone's monitors/screens are the same, too), but if you didn't already know this... I'm not sure what social context you could possibly have grown up in.


Sapir-Whorf and its ilk (the idea that if we don't have the language/concept, we can't perceive the difference/thing) are widely debunked, at least in their strong forms, and don't even pass the smell test (learning new concepts and perceiving new things would be impossible). That kind of thinking is tedious and decades out of date with modern cognitive science, neuroscience, psychology, etc.

But that is wrong. This doesn't test colour perception or vision, it tests verbal classification of colour perception into a forced binary. Everyone could be perceiving the colour qualia 100% identically, but simply choosing different linguistic cutpoints, meaning you can't say this is about vision / perception at all (it may just be about language use).

I think the premise could be stated more clearly. It is a boolean choice: which do you think it is closer to?

Once I figured it out, I tried it 2 more times ... and got different results :) but the new results were consistent.


Agreed, there is no clear premise. That different people looking at the same object will use different colour words is a triviality that anyone over, say, 10 years old knows. If that's the premise of the site, it is boring. People are getting excited because they think this implies something about differences in vision or perception... but it doesn't; that requires much more cleverness to test.

Or suspension of expectations…

Asinine and meaningless. It forces a classification on something that anyone with fully-functioning colour vision will obviously classify as "aquamarine" or "turquoise" or the like.

This has nothing at all to do with colour perception, or, if actual differences in perception are involved, this test fails to distinguish those from individual differences in assignment to linguistic categories.

EDIT: To actually test something like this, you need to make an assumption that cannot easily be tested or supported by evidence.

E.g. say we could all agree that, generally, blue + orange is a more pleasant pairing than blue + green. One might then imagine a series of images using orange plus varying interpolations between blue and green, with the prompt "is this combination of colours more or less aesthetically pleasing than the last?". The average cutpoint could then be interpreted as a subjective judgement of where e.g. teals become "more blue", from an aesthetic / complementary standpoint. But this test does nothing of the sort.
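The interpolation part of that design could be sketched like this (all colours and the single rater's responses below are illustrative assumptions, not real data; a real study would average the flip point across many raters):

```python
# Hypothetical sketch: generate RGB steps between blue and green, pair
# each with the same orange, and take the point where a rater's
# "more pleasing than the last" judgement flips as that rater's
# subjective blue/green cutpoint.

def lerp(a, b, t):
    """Linear interpolation between two RGB triples."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

BLUE, GREEN = (0, 0, 255), (0, 255, 0)
steps = [i / 10 for i in range(11)]
palette = [lerp(BLUE, GREEN, t) for t in steps]  # blue -> teal -> green

# Fake per-step judgements: 1 = "more pleasing than the last" (still
# reads as blue next to orange), 0 = less pleasing (now reads as green).
responses = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# This rater's cutpoint: the first interpolation step that flips to 0.
cutpoint = steps[responses.index(0)]
print(palette[5], cutpoint)
```

Note this only measures an aesthetic proxy for "blueness", which is exactly the hard-to-test assumption mentioned above.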


This was sort of my reading as well: I took "clumping" to mean "bump-shaped".


Right, I think (hope) the OP meant to emphasize not the "search" in the sentence, but the "reputable source". Of course a Google search now is much worse than an AI search.

And it is ultimately the reputable source that matters, and whether the person actually read it and checked that the details matched the summary (be it a human-written abstract, LLM-generated, or otherwise).


Yup. And in general, more heavy-tailed bumps are in fact better models (assuming normality tends to lead to over-confidence). I really think the universality is strictly mathematical, and actually rare in nature.

