Davis Summarizes Papers

2021-9-19 arXiv roundup - OMPQ, Don't pretrain?, EfficientBERT, Primer

Davis Blalock
May 3
  • OMPQ: Orthogonal Mixed Precision Quantization - They figure out how many bits to use for different layers in <9 seconds by using a proxy objective. In the camp of "read this if and only if you care about quantization."

  • Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative - They beat fine-tuning on some tasks using multi-task learning + meta-learning. It makes intuitive sense to train at least partly on the task you actually care about, but the results didn't seem too conclusive, and their method is more complicated.

  • EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation - doomed to be overshadowed by:

  • Primer: Searching for Efficient Transformers for Language Modeling - which is basically just saying to use ReLU^2 instead of ReLU/GELU as the feed-forward nonlinearity, plus add a depthwise convolution after each of the Q, K, and V projections.
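
For reference, the squared ReLU itself is just max(0, x)^2 applied elementwise. Here is a minimal pure-Python sketch; the function name `squared_relu` is my own and is not taken from the paper's code, and a real model would use the equivalent op from its tensor library instead of lists.

```python
def squared_relu(xs):
    """Elementwise squared ReLU: max(0, x) ** 2 for each input value.

    Hypothetical helper for illustration only, not the paper's implementation.
    """
    return [max(0.0, x) ** 2 for x in xs]


if __name__ == "__main__":
    # Negative inputs are zeroed out; positive inputs are squared.
    print(squared_relu([-2.0, -0.5, 0.0, 0.5, 2.0]))
    # → [0.0, 0.0, 0.0, 0.25, 4.0]
```

Compared to plain ReLU, the output grows quadratically for positive inputs while staying at zero for negative ones.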
