2021-8-3 arXiv roundup: PiLU, Two long tails, Dataset distillation

May 03, 2022

Dataset Distillation with Infinitely Wide Convolutional Networks

Not sold on the infinitely wide part, but constructing an "optimized" training set once is an interesting idea, especially since that set can consist of arbitrary inputs rather than being constrained to a subset of the original data.
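To make the "arbitrary inputs" point concrete, here is a toy sketch of the general idea, not the paper's method. It swaps the infinitely wide convolutional network for 1-D ridge regression with a closed-form fit, and learns a single synthetic (x, y) point by finite-difference gradient descent. All names (`fit_ridge`, `real_loss`, `distill`) and the hyperparameters are made up for illustration.

```python
def fit_ridge(points, lam=0.1):
    """Closed-form 1-D ridge regression through the origin: y ~ w * x."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + lam)


def real_loss(synthetic, real):
    """MSE on the real data of a model fit only on the synthetic points."""
    w = fit_ridge(synthetic)
    return sum((w * x - y) ** 2 for x, y in real) / len(real)


def distill(real, steps=2000, lr=0.005, eps=1e-5):
    """Learn one free (x, y) training point (hypothetical toy procedure)."""
    params = [1.0, 0.0]  # arbitrary starting input and label

    def loss(p):
        return real_loss([(p[0], p[1])], real)

    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            up = list(params)
            down = list(params)
            up[i] += eps
            down[i] -= eps
            # Central finite difference stands in for autodiff here.
            grads.append((loss(up) - loss(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grads)]
    return tuple(params)


real_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
synthetic_point = distill(real_data)
```

The learned point does not have to coincide with any of the four real examples. It only has to make the model trained on it alone fit the real data, which is the part of the idea that seems useful independent of the infinite-width machinery.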

A Tale of Two Long Tails

Rare examples can be merely hard or atypical, or they can be noisy. Treating these two cases differently, rather than lumping them into one "long tail", can improve results.

Piecewise Linear Units Improve Deep Neural Networks

"Across a distribution of 30 experiments, we show that for the same model architecture, hyperparameters, and pre-processing, PiLU significantly outperforms ReLU: reducing classification error by 18.53% on CIFAR-10 and 13.13% on CIFAR-100, for a minor increase in the number of neurons."
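The quoted abstract does not give the functional form, so the following is only a rough sketch of what a learnable piecewise linear activation could look like. It assumes two linear pieces meeting at a knot; the parameter names `alpha`, `beta`, and `gamma` are illustrative, and the paper's exact parameterization may differ.

```python
from dataclasses import dataclass


@dataclass
class PiecewiseLinearUnit:
    """Two linear pieces joined at a knot (assumed form, not from the paper)."""

    alpha: float = 1.0  # slope to the right of the knot
    beta: float = 0.0   # slope to the left of the knot
    gamma: float = 0.0  # knot location; the unit outputs gamma at x == gamma

    def __call__(self, x: float) -> float:
        slope = self.alpha if x > self.gamma else self.beta
        return slope * (x - self.gamma) + self.gamma


relu_like = PiecewiseLinearUnit()            # defaults reduce to ReLU
leaky_like = PiecewiseLinearUnit(beta=0.1)   # nonzero left slope
```

With the defaults this is exactly ReLU, so a unit of this kind can only cost extra parameters relative to ReLU, not expressiveness. In a real network the three values would be trainable, possibly per neuron.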

Gates are not what you need in RNNs

Just rip out the gating mechanisms and add residual connections, and it seemingly works better across a bunch of experiments.
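As a minimal sketch of what "no gates, just a residual connection" means for a recurrent update, assuming the simplest possible form h_t = h_{t-1} + tanh(W_x x_t + W_h h_{t-1} + b). This is a generic illustration, not the unit proposed in the paper, and the function names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_rnn_step(h_prev, x, W_x, W_h, b):
    """One ungated step: the candidate update is added to the old state."""
    pre = [a + c + d for a, c, d in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]
    return [h + math.tanh(p) for h, p in zip(h_prev, pre)]


def run_residual_rnn(inputs, hidden_size, seed=0):
    """Run the step over a sequence with small random weights."""
    rng = random.Random(seed)
    input_size = len(inputs[0])
    W_x = [[rng.gauss(0.0, 0.1) for _ in range(input_size)]
           for _ in range(hidden_size)]
    W_h = [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    h = [0.0] * hidden_size
    for x in inputs:
        h = residual_rnn_step(h, x, W_x, W_h, b)
    return h


final_state = run_residual_rnn([[1.0, 2.0]] * 5, hidden_size=4)
```

Compare with a GRU or LSTM, where learned gates decide how much of the old state to keep. Here the old state always passes through unchanged and the network only learns what to add to it.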

Davis Summarizes Papers
