2021-9-19 arXiv roundup - OMPQ, Don't pretrain?, EfficientBERT, Primer

May 03, 2022

OMPQ: Orthogonal Mixed Precision Quantization

They figure out how many bits to use for each layer in <9 seconds by optimizing a proxy objective. In the camp of "read this if and only if you care about quantization."

Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative

They beat fine-tuning on some tasks using multi-task learning + meta-learning. Kind of makes sense intuitively to train directly on what you care about to at least some extent, but the results didn't seem too conclusive, and their method is more complicated.

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

Doomed to be overshadowed by:

Primer: Searching for Efficient Transformers for Language Modeling

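Basically just saying to use ReLU^2 as the feedforward activation and to add a depthwise convolution after each of the Q, K, and V projections in self-attention. The softmax in attention stays as it is.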
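
For concreteness, ReLU^2 just means applying ReLU and then squaring the result. Here's a minimal pure-Python sketch; the function names are mine and not taken from the paper's code:

```python
def relu(x: float) -> float:
    """Standard ReLU: pass positive inputs through, clamp the rest to zero."""
    return x if x > 0.0 else 0.0


def squared_relu(x: float) -> float:
    """ReLU^2: apply ReLU, then square the result."""
    r = relu(x)
    return r * r


# Illustrative inputs only, not data from the paper.
values = [-2.0, -0.5, 0.0, 0.5, 3.0]
print([squared_relu(v) for v in values])  # [0.0, 0.0, 0.0, 0.25, 9.0]
```

Negative inputs still map to zero, small positive inputs shrink, and large positive inputs grow quadratically.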

Davis Summarizes Papers
