Depends on the model size, but if the model is small enough that I can actually do the training on a PCIe board, I do. I partition an A100 into seven MIG instances and train seven models at a time, or just use MPS on a V100 board. The bigger A100 boards can also fit multiple copies of a model that fits on a single V100.
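Roughly what that looks like for me, as a sketch: the MIG UUIDs below are placeholders for whatever `nvidia-smi -L` prints once MIG is enabled and the instances are created, and train.py plus the configs are stand-ins for your own entry point.

    import os
    import subprocess

    # Placeholder MIG instance UUIDs; list the real ones with `nvidia-smi -L`
    # after enabling MIG mode and creating the GPU instances.
    mig_uuids = [
        "MIG-00000000-0000-0000-0000-000000000000",
        "MIG-11111111-1111-1111-1111-111111111111",
        # ... one entry per slice
    ]

    # One hyperparameter config per slice (stand-ins).
    configs = ["lr=3e-4", "lr=1e-3"]

    procs = []
    for uuid, cfg in zip(mig_uuids, configs):
        # Pinning CUDA_VISIBLE_DEVICES to a MIG UUID confines the process to that slice.
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": uuid}
        procs.append(subprocess.Popen(["python", "train.py", "--config", cfg], env=env))

    for p in procs:
        p.wait()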
Also, I tend to do this early on, when I am exploring the hyperparameter space, for which I use more but smaller models.
I find that using big models initially is just a waste of time. You want to try many things as quickly as possible.
I found that training multiple models on the same GPU hits other bottlenecks (mainly memory capacity and bandwidth) fast. I tend to train one model per GPU and just scale the number of machines. Also, if nothing else, we tend to push model sizes until they fill the GPU memory.
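To make "push the models to fit" concrete, here's a rough sketch of how you could eyeball the memory footprint of one training step; the toy model and batch shape are stand-ins, and the number is approximate since the caching allocator inflates it.

    import torch
    from torch import nn

    device = torch.device("cuda:0")
    free_before, total = torch.cuda.mem_get_info(device)

    # Toy stand-in for a real architecture; swap in your own model here.
    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).to(device)
    batch = torch.randn(512, 4096, device=device)  # hypothetical batch shape

    model(batch).mean().backward()  # activations + gradients dominate the footprint
    free_after, _ = torch.cuda.mem_get_info(device)

    used_gib = (free_before - free_after) / 2**30
    print(f"one step used ~{used_gib:.1f} GiB of {total / 2**30:.0f} GiB")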
Memory became less of an issue for me with the V100, and isn't really an issue with the A100, at least when iterating quickly on new models, while the sizes are still relatively small.
I completely agree with your conclusion here.