2023-6-25 arXiv roundup: Learning from textbooks, Eliminating transformer outliers, Zero++
dblalock.substack.com
This newsletter is brought to you by MosaicML.

Textbooks Are All You Need

They got near-SotA code generation with a tiny 1.3B-param model by curating an awesome training corpus. This turns out to be possible because most code datasets are terrible. More precisely, they're not self-contained, they lack meaningful comments, they're mostly configuration or boilerplate, and they lack broad coverage.
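To make those failure modes concrete, here is a toy heuristic scorer for a single Python snippet. It is a minimal sketch under stated assumptions and is not the paper's filtering method. It checks three of the properties listed above: whether the code defines any logic (rather than being pure config), how documented it is (comments plus docstrings), and whether it is self-contained (no names used without being defined or imported). All function names, the equal weighting, and the `* 4` documentation scaling are hypothetical choices for illustration. Broad coverage is a property of a whole corpus, so a per-snippet check cannot capture it.

```python
import ast
import builtins
import io
import tokenize


def count_comment_lines(source: str) -> int:
    """Number of distinct lines that contain a '#' comment token."""
    lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            lines.add(tok.start[0])
    return len(lines)


def count_docstrings(tree: ast.Module) -> int:
    """Number of module/function/class nodes that carry a docstring."""
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    nodes = [tree] + [n for n in ast.walk(tree) if isinstance(n, defs)]
    return sum(1 for n in nodes if ast.get_docstring(n) is not None)


def undefined_names(tree: ast.Module) -> set:
    """Names that are read but never defined, imported, or builtin (rough check)."""
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (used if isinstance(node.ctx, ast.Load) else defined).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return used - defined


def textbook_score(source: str) -> float:
    """Hypothetical score in [0, 1]; 0.0 if the snippet does not even parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0
    top = tree.body
    if not top:
        return 0.0
    # 1. Does the snippet define any logic, or is it only configuration?
    has_logic = any(
        isinstance(s, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        for s in top
    )
    # 2. Fraction of top-level statements that just bind a literal (config-like).
    literals = (ast.Constant, ast.Dict, ast.List, ast.Tuple, ast.Set)
    literal_assigns = sum(
        1 for s in top
        if isinstance(s, (ast.Assign, ast.AnnAssign)) and isinstance(s.value, literals)
    )
    boilerplate_frac = literal_assigns / len(top)
    # 3. Documentation density; the "* 4" saturates at 25% documented lines (arbitrary).
    n_lines = sum(1 for line in source.splitlines() if line.strip())
    documented = count_comment_lines(source) + count_docstrings(tree)
    doc_frac = min(1.0, documented / max(1, n_lines) * 4)
    # 4. Self-containedness: every name that is read is also defined somewhere.
    self_contained = 0.0 if undefined_names(tree) else 1.0
    return (float(has_logic) + (1.0 - boilerplate_frac) + doc_frac + self_contained) / 4


GOOD = '''
def mean(xs):
    """Return the arithmetic mean of a non-empty list."""
    # sum divided by count
    return sum(xs) / len(xs)
'''

CONFIG = '''
BATCH_SIZE = 32
LR = 3e-4
PATHS = ["/data/a", "/data/b"]
'''

if __name__ == "__main__":
    print(textbook_score(GOOD))    # 1.0
    print(textbook_score(CONFIG))  # 0.25
```

The config file still scores 0.25 because it is trivially self-contained; a real filter would need a much richer notion of quality than these four hand-picked signals.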