2 Comments

Congrats on the new addition to the family, and also on a more dynamic training round! The insights on Llama 3.1 are very helpful as usual. I am most impressed by the simple architecture chosen to maximize training stability and by the heavy use of synthetic data in post-training.

Just started training my second 100T model and haven't had time to read the Llama 3.1 paper. Thanks for putting in the work 🙂
