2022-5-8: OPT-175B, Better depth estimation, Mobile TPU NAS
dblalock.substack.com
These paper summaries are made possible by MosaicML. If you like these papers, you might like our open-source library for faster model training, or even joining our team.

Convolutional and Residual Networks Provably Contain Lottery Tickets

For a CNN with skip connections and reasonable initializations + nonlinearities, there exists a wider and slightly deeper sparse CNN that approximates its outputs with high probability. Previous work couldn’t handle skip connections or convolutions. They actually instantiate their construction for some MNIST networks and show they can match the target network’s accuracy without training—instead just (approximately) solving a large number of subset sum problems to directly identify a sparse subnetwork. Doesn’t seem like a technique one would want to use in practice yet, but it’s always nice to see a theoretical result that 1) applies to somewhat realistic networks, and 2) works without relying on quantities approaching infinity.
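The core subroutine here is approximating each trained weight by a subset sum of random initialization weights; the chosen weights form the sparse subnetwork and everything else is pruned. A minimal sketch of that idea, with hypothetical names and a brute-force solver (the paper only needs approximate solutions, and real instances would use something smarter than exhaustive search):

```python
import itertools
import random

def best_subset_sum(candidates, target):
    """Brute-force search for the subset of `candidates` whose sum best
    approximates `target`. Exponential in len(candidates); illustration only."""
    best, best_err = (), float("inf")
    for r in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, r):
            err = abs(sum(subset) - target)
            if err < best_err:
                best, best_err = subset, err
    return best, best_err

random.seed(0)
# Random weights from the (wider) initialized network feeding one unit.
init_weights = [random.uniform(-1, 1) for _ in range(12)]
# One weight of the target network we want to reproduce (made-up value).
target_weight = 0.3141

subset, err = best_subset_sum(init_weights, target_weight)
# Keep only the chosen weights; pruning the rest yields the sparse subnetwork.
mask = [w in subset for w in init_weights]
```

With enough random candidates the achievable error shrinks rapidly, which is why a wider random network can encode the target network's weights without any training.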