2021-9-12 arXiv roundup
⭐ Bag of Tricks for Optimizing Transformer Efficiency
Really nice paper full of practical improvements.
Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
Their alternative objectives mostly just tie MLM, but a couple of plots in the appendix show predicting the first letter of masked words doing way better. Needs a more detailed read.
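To make the first-letter objective concrete, here is a minimal sketch of how the inputs and labels could be built. This is my own illustration, not the paper's code: the function name `first_letter_targets`, the 15% masking rate, the lowercasing, and the use of `None` for unmasked positions are all assumptions.

```python
import random

MASK = "[MASK]"   # assumed mask token, as in BERT-style vocabularies
IGNORE = None     # assumed label for positions that contribute no loss


def first_letter_targets(tokens, mask_prob=0.15, seed=0):
    """Mask tokens at random and label each masked slot with its first letter.

    Hypothetical helper: assumes every token is a non-empty string.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            # The target is a small character class, not a full-vocabulary token id.
            labels.append(tok[0].lower())
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels
```

The point of the sketch is just that the output layer shrinks from a full vocabulary softmax to a few dozen character classes.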
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
To get better at going from keywords -> caption, they go keywords -> Google Images results -> captions from a pretrained captioning network -> final caption. Another interesting datapoint in the theme of leaning on external pretrained models to get better results. In this case, the “model” is Google Image Search.
C-MinHash: Rigorously Reducing K Permutations to Two
I don't think most deep learning people care about MinHash, but a lot of people do. Plus, this is an example of a pretty strong algorithms paper for anyone curious what those look like.
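For anyone who hasn't seen MinHash, here is a rough sketch of the classic K-permutation version next to a two-permutation variant in the spirit of the title: one initial permutation, then one second permutation reused under K circular shifts. The function names, the shift direction, and the indexing are my assumptions rather than the paper's exact construction, so treat it as an illustration only.

```python
import random


def minhash_classic(items, dim, k, seed=0):
    """Classic MinHash: k independent permutations of range(dim)."""
    rng = random.Random(seed)
    sig = []
    for _ in range(k):
        perm = list(range(dim))
        rng.shuffle(perm)
        # Hash value is the smallest permuted index among the set's elements.
        sig.append(min(perm[i] for i in items))
    return sig


def minhash_circulant(items, dim, k, seed=0):
    """Two-permutation sketch: sigma once, then pi under k circular shifts.

    Assumes k <= dim so the shifts are distinct, and that items is non-empty.
    """
    rng = random.Random(seed)
    sigma = list(range(dim))
    rng.shuffle(sigma)
    pi = list(range(dim))
    rng.shuffle(pi)
    permuted = [sigma[i] for i in items]  # initial shuffle of the data
    return [
        min(pi[(j + shift) % dim] for j in permuted)  # pi shifted circularly
        for shift in range(1, k + 1)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching hash values estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Both functions need the same `seed` across the sets being compared, since the estimate only makes sense when every set is hashed with the same permutations.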