2021-8-3 arXiv roundup
Dataset Distillation with Infinitely Wide Convolutional Networks. Not sold on the infinitely wide part, but constructing an "optimized" training set once, one whose examples can be arbitrary inputs rather than a subset of the original data, is an interesting idea.
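To make the "optimize the training set itself" idea concrete, here is a toy sketch. It is not the paper's method: the infinite-width kernels are swapped for plain 1-D ridge regression, and gradients come from finite differences with a simple backtracking step. Every name here (`fit_ridge`, `distill_loss`, `distill`) is hypothetical. The point is only the structure: the inner step trains on the synthetic set in closed form, and the outer loop moves the synthetic inputs and labels to reduce error on the real data.

```python
def fit_ridge(xs, ys, lam=1e-3):
    """Closed-form ridge fit of y ~ a*x + c on a (tiny) training set."""
    sxx = sum(x * x for x in xs) + lam
    sx = sum(xs)
    n = len(xs) + lam
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * n - sx * sx  # positive whenever lam > 0
    a = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return a, c

def distill_loss(params, real_x, real_y):
    """MSE on the real data of a model trained only on the synthetic set."""
    k = len(params) // 2
    syn_x, syn_y = params[:k], params[k:]
    a, c = fit_ridge(syn_x, syn_y)
    return sum((a * x + c - y) ** 2 for x, y in zip(real_x, real_y)) / len(real_x)

def numeric_grad(f, params, eps=1e-5):
    """Central finite-difference gradient (fine for a handful of parameters)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        dn = list(params)
        up[i] += eps
        dn[i] -= eps
        grad.append((f(up) - f(dn)) / (2 * eps))
    return grad

def distill(real_x, real_y, params, steps=200, lr=0.1):
    """Gradient descent on the synthetic set, halving the step until loss drops."""
    f = lambda p: distill_loss(p, real_x, real_y)
    loss = f(params)
    for _ in range(steps):
        g = numeric_grad(f, params)
        step, improved = lr, False
        while step > 1e-8 and not improved:
            cand = [p - step * gi for p, gi in zip(params, g)]
            cand_loss = f(cand)
            if cand_loss < loss:
                params, loss, improved = cand, cand_loss, True
            step /= 2
        if not improved:
            break
    return params, loss

# "Real" data: 41 points on the line y = 2x + 1.
real_x = [i / 10 for i in range(-20, 21)]
real_y = [2 * x + 1 for x in real_x]

# Synthetic set: two inputs followed by two labels, all free to move.
init = [-1.0, 1.0, 0.0, 0.0]
initial_loss = distill_loss(init, real_x, real_y)
distilled, final_loss = distill(real_x, real_y, init)
```

Here 41 real points get compressed into two synthetic ones, and nothing forces those two to coincide with any real example.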
A Tale of Two Long Tails. Long-tail examples can be merely hard or atypical, or they can be noisy. Treating the two cases differently can make things work better.
Piecewise Linear Units Improve Deep Neural Networks. "Across a distribution of 30 experiments, we show that for the same model architecture, hyperparameters, and pre-processing, PiLU significantly outperforms ReLU: reducing classification error by 18.53% on CIFAR-10 and 13.13% on CIFAR-100, for a minor increase in the number of neurons."
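The quote doesn't give the functional form, so as a rough illustration here is a generic two-segment linear activation with a movable knot. Treat the parameterization (`alpha`, `beta`, `gamma`) and the function name as assumptions for the sketch rather than the paper's exact definition; in a real network these three values would be trainable.

```python
def piecewise_linear_unit(x, alpha=1.0, beta=0.0, gamma=0.0):
    """Two-segment linear activation with a knot at x = gamma.

    The slope is alpha to the right of the knot and beta to the left.
    Both pieces pass through (gamma, gamma), so the function is continuous.
    With alpha=1, beta=0, gamma=0 it reduces to ReLU.
    """
    slope = alpha if x > gamma else beta
    return slope * (x - gamma) + gamma

# Default parameters behave like ReLU.
relu_like = [piecewise_linear_unit(v) for v in (-2.0, 0.0, 3.0)]

# Hypothetical "learned" values: shifted knot, leaky left-hand slope.
shaped = [piecewise_linear_unit(v, alpha=0.5, beta=0.1, gamma=1.0) for v in (-1.0, 1.0, 3.0)]
```

Starting from the ReLU setting means such a unit can only help if training finds better slopes or a better knot.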
Gates are not what you need in RNNs. Just rip out the gating mechanisms and add residual connections and it seemingly works better across a bunch of experiments.
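A minimal sketch of what "no gates, just a residual connection" could look like: the state update is h_new = h + tanh(W x + U h + b). This is an illustrative form, not necessarily the paper's exact cell, and the helper names and toy weights are made up for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def residual_rnn_step(h, x, W, U, b):
    """One gate-free recurrent step: h_new = h + tanh(W x + U h + b)."""
    pre = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
    return [hi + math.tanh(p) for hi, p in zip(h, pre)]

def run_sequence(xs, h0, W, U, b):
    """Apply the residual step to each input in order; return the final state."""
    h = list(h0)
    for x in xs:
        h = residual_rnn_step(h, x, W, U, b)
    return h

# Hypothetical toy sizes: 2-dim inputs, 3-dim hidden state.
W = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
U = [[0.0, 0.1, 0.0], [0.2, 0.0, -0.1], [0.0, 0.0, 0.1]]
b = [0.0, 0.0, 0.0]
h_final = run_sequence([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0], W, U, b)
```

The identity path carries the old state forward untouched, which is the job the gates in an LSTM or GRU would otherwise be doing.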