Sure, of course. I wasn't asking "are you beating a SOTA benchmark?" I'm floating the idea of an ablation that matches a realistic scenario for the dataset / task. Personally, I'm curious how manifold Muon performs compared to AdamW in a thoroughly explored context. This is the first time I've seen a 3-layer MLP on CIFAR-10.
I probably should have made the 9-layer ResNet part more central to my point.
Here's the top model on DAWNBench - https://github.com/apple/ml-cifar-10-faster/blob/main/fast_c...
It trains for 15 epochs and, like all the others, is a 9-layer ResNet.