2023-5-7 arXiv roundup: Easy loss spike fix…

Davis Blalock

May 10, 2023

This newsletter made possible by MosaicML.

3 Comments

May 10, 2023

this is great content!

Haoli Yin

May 10, 2023

How does the merge operation from branch-train-merge differ from that of the Zip-it paper? It seems that combining branch-and-train domain experts with the zipping operation would remove the need for explicit routing?

Expand full comment

Davis Blalock

May 16, 2023

IIRC branch-train-merge relies on all the branches starting from the same model, so that they're all in roughly the same loss basin and we can just elementwise average the params. So yes, with zipping, we might be able to remove the need for a shared initial model (though probably not a shared architecture).
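The "just elementwise average the params" step can be sketched in a few lines of Python. This is a toy illustration, not branch-train-merge's actual code: models are assumed to be plain dicts of float lists, and `average_params` and `tiny_net` are hypothetical names. The toy network also shows why a shared starting point matters. A copy of a model whose hidden units are merely permuted computes exactly the same function, yet averaging it with the original collapses the output to zero, while averaging with a nearby branch gives a sensible in-between model.

```python
from typing import Dict, List

# Hypothetical representation: each model is a dict mapping a parameter
# name to a flat list of floats.
Params = Dict[str, List[float]]


def average_params(models: List[Params]) -> Params:
    """Elementwise average of the parameters of several models.

    Requires a shared architecture: every model must have the same
    parameter names, and each parameter must have the same length.
    """
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    if any(m.keys() != names for m in models):
        raise ValueError("models have different parameter names")
    merged: Params = {}
    for name in names:
        tensors = [m[name] for m in models]
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"length mismatch for parameter {name!r}")
        # zip(*tensors) groups the same position across all models.
        merged[name] = [sum(vals) / len(models) for vals in zip(*tensors)]
    return merged


def tiny_net(params: Params, x: float) -> float:
    """f(x) = sum_j v_j * relu(w_j * x): one hidden layer, no biases."""
    return sum(v * max(0.0, w * x) for w, v in zip(params["w"], params["v"]))


# model_a computes |x|.
model_a = {"w": [1.0, -1.0], "v": [1.0, 1.0]}
# A branch that drifted slightly from model_a during its own training.
branch = {"w": [1.25, -0.75], "v": [1.0, 1.0]}
# The same function as model_a, with its two hidden units swapped.
permuted = {"w": [-1.0, 1.0], "v": [1.0, 1.0]}

near = average_params([model_a, branch])
far = average_params([model_a, permuted])
print(tiny_net(near, 3.0), tiny_net(near, -2.0))  # 3.375 1.75
print(tiny_net(far, 3.0), tiny_net(far, -2.0))    # 0.0 0.0
```

Requiring matching names and lengths is the "shared architecture" condition. The permuted case is the kind of misalignment that a feature-matching merge such as zipping would, in principle, need to undo before averaging, which is why it might relax the shared-initialization requirement but not the shared-architecture one.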

Davis Summarizes Papers
