This newsletter is made possible by MosaicML. Thanks to Haoli Yin and Trevor Gale for the Twitter shoutouts this week!

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

We released a family of 7-billion-parameter models that work at least as well as other 7B models out there, and often better.
How does the merge operation in Branch-Train-Merge differ from the one in the ZipIt! paper? It seems that combining the branch-and-train idea for domain experts with the zipping operation would remove the need for explicit routing. Is that right?
this is great content!