2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better
⭐ Merging of neural networks. They present evidence that you’re better off training two copies of a model and then merging them than just training one copy for the same amount of time. The results only scale up to ResNet-18 on ImageNet for 150 epochs, and the accuracy lift is only about 0.2%. Probably not worth the complexity if your alternative is a normal training workflow, but it might be an interesting halfway point between fine-tuning only and training from scratch, or even a means of parallelizing large-scale training.
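To make the "train two copies, then merge" idea concrete, here's a minimal sketch of the simplest possible merge: averaging the weights of two identically-shaped models in PyTorch. This is just an illustrative baseline, not the paper's actual procedure (which may align corresponding neurons before combining), and all names here are mine.

```python
import copy
import torch

def merge_models(model_a: torch.nn.Module, model_b: torch.nn.Module, alpha: float = 0.5):
    """Return a new model whose weights interpolate between two trained copies.

    Naive weight-space averaging; a real merging scheme would likely permute
    neurons first so that corresponding units in the two networks line up.
    """
    merged = copy.deepcopy(model_a)
    state_a = model_a.state_dict()
    state_b = model_b.state_dict()

    merged_state = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        if tensor_a.is_floating_point():
            merged_state[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) can't be
            # meaningfully averaged, so just keep one copy's value.
            merged_state[name] = tensor_a.clone()

    merged.load_state_dict(merged_state)
    return merged

# Usage sketch: merge two ResNet-18s trained from different random seeds.
# model_a = torchvision.models.resnet18(); model_b = torchvision.models.resnet18()
# merged = merge_models(model_a, model_b, alpha=0.5)
```

Worth noting that naive averaging like this usually hurts accuracy for independently trained networks unless the models share part of their training trajectory or their neurons are aligned first, which is presumably why the paper's merging procedure is more involved than a plain average.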