In case you’re wondering what I’ve been up to instead of posting for the past couple months, I was kicking off a training run for a 100T parameter biological neural network:
Congrats on the new addition to the family, and also a more dynamic training round! The insights on Llama 3.1 are very helpful as usual. I am mostly impressed by the simple architecture chosen to maximize training stability and the heavy use of synthetic data in post-training.
Just started training my second 100T model and haven't had time to read the Llama 3.1 paper. Thanks for putting in the work 🙂