2022-1-30: Xformers, ConvMixer…
Convolutional Xformers for Vision
Not sure what to make of the overall efficacy of the approach, but they report lifts from 1) switching optimizers from AdamW to SGD partway through training, and 2) turning off RandAugment near the end of training. Both seem like actionable (if somewhat mysterious) optimizations.
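
To make the two tricks concrete, here is a minimal sketch of how such a schedule could be wired up. It is not the paper's code: the function name `plan_for_epoch`, the `EpochPlan` type, and especially the switch points (`switch_frac=0.5`, `aug_off_frac=0.9`) are placeholders I made up, since the summary above doesn't say when either change happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochPlan:
    """What to use for one epoch (hypothetical helper type)."""
    optimizer: str          # "adamw" or "sgd"
    use_randaugment: bool   # whether RandAugment is applied this epoch


def plan_for_epoch(
    epoch: int,
    total_epochs: int,
    switch_frac: float = 0.5,    # ASSUMPTION: AdamW -> SGD at 50% of training
    aug_off_frac: float = 0.9,   # ASSUMPTION: RandAugment off for the last 10%
) -> EpochPlan:
    """Return the optimizer and augmentation choice for a 0-indexed epoch."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must satisfy 0 <= epoch < total_epochs")
    progress = epoch / total_epochs
    optimizer = "adamw" if progress < switch_frac else "sgd"
    use_randaugment = progress < aug_off_frac
    return EpochPlan(optimizer=optimizer, use_randaugment=use_randaugment)


if __name__ == "__main__":
    # Print the plan at a few points in a 100-epoch run.
    for e in (0, 49, 50, 89, 90, 99):
        print(e, plan_for_epoch(e, total_epochs=100))
```

In a real training loop you would check the plan at the start of each epoch, construct a fresh SGD optimizer over the same parameters when the optimizer name changes, and swap in a transform pipeline without RandAugment once `use_randaugment` goes false. The right switch points are presumably something to tune.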