2021-9-12 arXiv roundup

⭐ Bag of Tricks for Optimizing Transformer Efficiency: Really nice paper full of practical improvements.

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling: They mostly just tie MLM, but they have a couple plots in the appendix where predicting the first letter of masked words does way better. Needs more detailed reading.