2021-8-3 arXiv roundup: PiLU, Two long tails, Dataset distillation
Dataset Distillation with Infinitely Wide Convolutional Networks
Not sold on the infinitely wide part, but constructing an "optimized" training set once (one whose examples can be arbitrary inputs rather than a subset of the original data) is an interesting idea.
A Tale of Two Long Tails
Examples in the long tail can be hard/atypical, or they can just be noisy. Treating these two cases differently, instead of lumping them together, can make training work better.
Piecewise Linear Units Improve Deep Neural Networks
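For reference, a minimal sketch of the activation as I understand it: two linear pieces joined at a knot, with the two slopes and the knot position learned per neuron. The parameter names `alpha`, `beta`, and `gamma` and the exact parameterization are my assumptions rather than something taken from the quote below, so check the paper before relying on them. With `alpha=1`, `beta=0`, `gamma=0` this reduces to ReLU.

```python
def pilu(x, alpha=1.0, beta=0.0, gamma=0.0):
    """Piecewise linear unit with a single knot at (gamma, gamma).

    Assumed form (not verified against the paper's code):
      alpha * (x - gamma) + gamma   if x > gamma
      beta  * (x - gamma) + gamma   otherwise
    In a real network alpha, beta, and gamma would be learnable
    per-neuron parameters; here they are plain floats.
    """
    slope = alpha if x > gamma else beta
    return slope * (x - gamma) + gamma


def relu(x):
    """Standard ReLU, for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        # Default parameters should match ReLU exactly.
        print(x, relu(x), pilu(x), pilu(x, alpha=2.0, beta=0.5, gamma=1.0))
```

Both pieces pass through the knot, so the function stays continuous however the slopes are set.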
"Across a distribution of 30 experiments, we show that for the same model architecture, hyperparameters, and pre-processing, PiLU significantly outperforms ReLU: reducing classification error by 18.53% on CIFAR-10 and 13.13% on CIFAR-100, for a minor increase in the number of neurons."
Gates are not what you need in RNNs
Just rip out the gating mechanisms, add residual connections, and it seemingly works better across a bunch of experiments.
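To make the idea concrete, here is a rough sketch of a gate-free recurrent step with a residual connection: the new state is the old state plus a nonlinear update. This is a simplification, not the paper's actual cell; the function names, the `tanh` nonlinearity, and the weight layout are all assumptions for illustration.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_rnn_step(h, x, W_h, W_x, b):
    """One gate-free step: h_new = h + tanh(W_h h + W_x x + b).

    There is no forget/input/output gate; the skip connection
    (the leading 'h +') is what carries state across time steps.
    """
    pre = [a + c + d for a, c, d in zip(matvec(W_h, h), matvec(W_x, x), b)]
    return [h_i + math.tanh(p) for h_i, p in zip(h, pre)]


def run(xs, h0, W_h, W_x, b):
    """Apply the step over a sequence of input vectors; return the final state."""
    h = h0
    for x in xs:
        h = residual_rnn_step(h, x, W_h, W_x, b)
    return h


if __name__ == "__main__":
    # Hypothetical toy sizes: 2-dim hidden state, 1-dim input.
    W_h = [[0.1, 0.0], [0.0, 0.1]]
    W_x = [[1.0], [-1.0]]
    b = [0.0, 0.0]
    print(run([[0.5], [0.25], [-0.5]], [0.0, 0.0], W_h, W_x, b))
```

With all weights at zero the state passes through unchanged, which is the property the residual connection buys you and which gates in an LSTM or GRU otherwise have to learn.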