2022-2-20: Fantastic generalization measures, How vision transformers work, 0/1 Adam
dblalock.substack.com
Fantastic Generalization Measures and Where to Find Them. Section 4 summarizes the results, which all seem to be on CIFAR-10 or SVHN. The paper confirms that overparameterization helps generalization, that norm-based and classical VC-style measures correlate negatively with generalization, and that flatness (and proxies like final gradient variance) correlates positively with it. Also, faster initial loss reduction correlated with worse generalization, supporting the explore/exploit model of optimization we've seen with cyclic learning rates.
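To make "correlates with generalization" a bit more concrete: you train a set of models, record one value of a candidate measure and one generalization gap (e.g., test error minus training error) per model, and check how well the measure ranks the models. The paper uses Kendall's rank correlation for this. Below is a minimal tau-a sketch, assuming no tie correction. The function names and the four-model numbers are hypothetical and exist only for illustration; they are not results from the paper. A useful measure should give a positive τ against the gap, meaning a larger measure predicts worse generalization.

```python
from itertools import combinations


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(measure: list[float], gen_gap: list[float]) -> float:
    """Kendall's tau-a between a complexity measure and the generalization gap.

    Each index is one trained model. A pair of models is concordant (+1) if the
    measure and the gap order them the same way, discordant (-1) otherwise;
    pairs tied in either list contribute 0 (tau-a applies no tie correction).
    """
    if len(measure) != len(gen_gap):
        raise ValueError("need exactly one measure value per model")
    n = len(measure)
    if n < 2:
        raise ValueError("need at least two models to rank")
    total = 0
    for i, j in combinations(range(n), 2):
        total += sign(measure[i] - measure[j]) * sign(gen_gap[i] - gen_gap[j])
    # Normalize by the number of model pairs, n choose 2.
    return total / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical values for four models, made up for illustration only.
    measure = [1.0, 2.0, 3.0, 4.0]
    gaps = [0.05, 0.10, 0.08, 0.20]
    print(kendall_tau(measure, gaps))  # 4 concordant-minus-discordant out of 6 pairs
```

In practice you'd probably use a library routine such as `scipy.stats.kendalltau`, which also handles ties. The hand-rolled version just makes the pairwise "does the measure order these two models correctly?" logic explicit.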