Davis Summarizes Papers

2021-12-4: Sparse is Enough in Scaling Transformers, Sparse ImageNet transfer

Davis Blalock
May 3

Adaptive Optimization with Examplewise Gradients. The authors try to exploit per-sample gradient information (rather than the usual per-batch gradients averaged across samples) to improve Adam. Seems to be a negative result so far.
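To make the per-sample vs. per-batch distinction concrete, here is a minimal sketch in plain Python. It assumes a toy linear model with squared loss; the function names (`per_example_grads`, `mean_grad`, `grad_variance`) are made up for illustration and are not from the paper. The point is only that averaging over the batch throws away information, such as the per-coordinate spread of gradients across examples, that an optimizer could in principle use.

```python
import random

def per_example_grads(w, X, y):
    """Return one gradient per example for the loss 0.5 * (x . w - y)**2."""
    grads = []
    for x_i, y_i in zip(X, y):
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
        grads.append([residual * x_j for x_j in x_i])
    return grads

def mean_grad(grads):
    """The usual batch gradient: per-example gradients averaged over the batch."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]

def grad_variance(grads):
    """Per-coordinate variance across examples; lost once gradients are averaged."""
    mean = mean_grad(grads)
    n = len(grads)
    return [
        sum((g - m) ** 2 for g in col) / n
        for col, m in zip(zip(*grads), mean)
    ]

# Toy batch: 8 examples, 3 features (hypothetical data, illustration only)
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
y = [random.gauss(0, 1) for _ in range(8)]
w = [0.0, 0.0, 0.0]

G = per_example_grads(w, X, y)   # 8 gradients, one per example
g_batch = mean_grad(G)           # what a standard optimizer step sees
g_var = grad_variance(G)         # extra signal available only per example
```

Whether statistics like `g_var` actually help Adam is exactly what the paper investigates, and per the summary above, the answer so far appears to be no.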

How Well Do Sparse Imagenet Models Transfer? "In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups."
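For readers unsure what "high sparsities" means in practice, here is a small sketch of magnitude pruning, one standard way to sparsify a model. This is an illustration only: the helper names are hypothetical, and nothing here is claimed to match the specific pruning methods compared in the paper.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

# Hypothetical flattened layer with 10 weights
weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.01, -0.4, 0.9, 0.05, -0.02]
pruned = magnitude_prune(weights, 0.8)  # keep only the 2 largest-magnitude weights
```

At 80% sparsity only two of the ten weights survive; the transfer question is whether a network pruned like this on ImageNet still fine-tunes well on downstream tasks.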

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective. Some math plus CIFAR evidence that larger batches converge to sharper minima (though not if you train infinitely long; they eventually catch up).

How Smart Guessing Strategies Can Yield Massive Scalability Improvements for Sparse Decision Tree Optimization. They train sparse (i.e., smaller, more interpretable) decision trees with something resembling distillation plus some intuitive heuristics. Tree-based models are less popular to study than deep learning, but extremely important in practice, so this could be pretty valuable.

⭐ Sparse is Enough in Scaling Transformers. Google paper getting no training-time improvement, but up to 37x single-sequence CPU inference speedup at iso-accuracy. I'm not sure how it works because the paper wasn't easy to skim. Probably the most interesting/meaty paper this week.

© 2022 Davis Blalock