This newsletter made possible by MosaicML. What Language Model to Train if You Have One Million GPU Hours? The authors ran a variety of experiments on 1B to 13B parameter models to figure out how best to train their final huge model. We’ll go through the experiments one by one.
2022-10-30 arXiv roundup: 1 million GPU hours, Scaling laws, Science-of-deep-learning papers