2021-12-12: PixMix, RETRO
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures. Points out that there are many safety metrics, and beats existing methods across all of them via a new data augmentation pipeline. Namely, they combine images with crazy LSD-looking images like fractals.
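To make the "combine images with fractals" idea concrete, here is a minimal sketch of that style of augmentation. It is not the paper's exact pipeline: the function names, the two mixing ops, and the `rounds` and `max_weight` parameters are all assumptions for illustration, and images are flat lists of floats in [0, 1] to keep it dependency-free.

```python
import random

def mix_add(a, b, w):
    """Convex combination of two images (flat lists of floats in [0, 1])."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]

def mix_mul(a, b, w):
    """Weighted geometric mean, a multiplicative alternative to mix_add."""
    return [(x ** (1 - w)) * (y ** w) for x, y in zip(a, b)]

def pixmix_like(image, mixing_pics, rounds=3, max_weight=0.5, rng=None):
    """Blend `image` with randomly chosen mixing pictures (e.g. fractals).

    Hypothetical simplification: a random number of rounds (0..rounds), each
    picking a random mixing picture, a random op, and a random weight.
    """
    rng = rng or random.Random()
    out = list(image)
    for _ in range(rng.randint(0, rounds)):
        pic = rng.choice(mixing_pics)
        op = rng.choice([mix_add, mix_mul])
        weight = rng.uniform(0.0, max_weight)
        out = op(out, pic, weight)
    # Clip defensively so downstream code can rely on the [0, 1] range.
    return [min(1.0, max(0.0, v)) for v in out]
```

Capping the weight keeps the original image dominant, so the label presumably stays valid while the texture statistics get scrambled.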
Didn't read it, but looks like Baidu made an (internal?) DeepSpeed-like tool.
Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation. I couldn't make sense of what they're doing, but baking in 3D rotation {in,equi}variance could be a really valuable inductive bias in vision models.
Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers. This paper really needs some editing passes from a native English speaker, but they claim 1-5% accuracy lifts through some sort of post-training model pruning/constraining. Results are across a good set of CNNs, but only up to mini-ImageNet (?) and smaller datasets. I stared at the text for quite a while and couldn't make sense of it, but this might contain an easy win.
⭐ Improving language models by retrieving from trillions of tokens. "With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering." and they "can also rapidly RETROfit pre-trained transformers with retrieval"
Implicit Neural Representations for Image Compression. Train a hypernetwork whose output is a network mapping (x, y) positions to pixel values. The image encoding is this output network. Combined with other recent work using hypernetworks to yield way better U-nets, this increases my estimate of how promising hypernetwork-based approaches are. EDIT: not actually a separate hypernetwork; just a learned initialization for the network, and what they use as the compressed representation is the set of weight updates.
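A toy sketch of the "compressed representation is the weight updates" idea, under heavy assumptions: a linear model over four fixed features stands in for the paper's MLP, the shared `init` is just handed in rather than meta-learned, and the uniform quantization `step` is made up. All names here are hypothetical.

```python
def features(x, y):
    """Fixed features of a pixel position; a stand-in for a small MLP."""
    return [1.0, x, y, x * y]

def predict(weights, x, y):
    """Pixel value predicted at position (x, y)."""
    return sum(w * f for w, f in zip(weights, features(x, y)))

def grid(size):
    """Row-major pixel coordinates of a size x size image, scaled to [-1, 1]."""
    coords = [2.0 * i / (size - 1) - 1.0 for i in range(size)]
    return [(x, y) for y in coords for x in coords]

def fit(image, size, init, steps=200, lr=0.3):
    """Overfit the weights to one image by gradient descent on mean squared error."""
    weights = list(init)
    points = grid(size)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for (x, y), target in zip(points, image):
            err = predict(weights, x, y) - target
            for k, f in enumerate(features(x, y)):
                grad[k] += 2.0 * err * f / len(points)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def encode(image, size, init, step=0.01):
    """Compressed code: integer-quantized weight updates relative to the shared init."""
    weights = fit(image, size, init)
    return [round((w - w0) / step) for w, w0 in zip(weights, init)]

def decode(code, size, init, step=0.01):
    """Rebuild the weights from init plus dequantized updates, then render the image."""
    weights = [w0 + step * q for w0, q in zip(init, code)]
    return [predict(weights, x, y) for x, y in grid(size)]
```

The decoder needs only `init` (shared across all images) and the small integer code, which is why a good initialization matters: the closer it is to typical images, the smaller the updates that have to be stored.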
Also, TensorFlow now has a faster tokenizer. It basically seems to be a data structure contribution: precomputing a bunch of stuff in the trie to trade increased space for decreased asymptotic complexity.
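For context, here is a sketch of the plain trie-based greedy longest-match tokenization that such a precomputation would speed up. This is only the baseline, assuming the tokenizer does WordPiece-style longest-prefix matching; it omits the precomputed trie data entirely, ignores "##" continuation prefixes, and the function names and `[UNK]` default are illustrative.

```python
def build_trie(vocab):
    """Nested-dict trie; the key None marks the end of a vocabulary entry."""
    root = {}
    for token in vocab:
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = token
    return root

def tokenize(word, trie, unk="[UNK]"):
    """Greedy longest-match segmentation of a single word."""
    tokens, start = [], 0
    while start < len(word):
        node, match, end = trie, None, start
        # Walk the trie as far as possible, remembering the longest full token seen.
        for i in range(start, len(word)):
            node = node.get(word[i])
            if node is None:
                break
            if None in node:
                match, end = node[None], i + 1
        if match is None:
            return [unk]  # no vocabulary entry starts here
        tokens.append(match)
        start = end  # restart from the trie root after the matched token
    return tokens
```

Note that every token restarts the walk from the root and may rescan characters already visited, so the worst case is quadratic in the word length. That rescanning is presumably what the extra precomputed state in the trie avoids.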