2023-3-26 arXiv roundup: Unit scaling, Origins of power laws, Removing text watermarks
dblalock.substack.com
This newsletter made possible by MosaicML.

Unit Scaling: Out-of-the-Box Low-Precision Training

They make all the weights, activations, and gradients have unit variance through simple math. Unlike initialization schemes that try to get one of these (usually activations) to have unit variance, they get all three by:
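The "simple math" mentioned above rests on a standard variance identity. If the inputs x and weights W have independent, zero-mean, unit-variance entries, each entry of xW is a sum of fan_in unit-variance products and so has variance fan_in. Multiplying the output by 1/sqrt(fan_in) brings it back to unit variance. The stdlib-only Python sketch below checks this numerically for a forward matmul. The helper names (`matmul`, `unit_scaled_matmul`, `sample`, `flat_variance`) and the sizes are hypothetical, and this only illustrates the arithmetic. It is not the paper's implementation, which also has to keep the gradients at unit variance in the backward pass.

```python
import math
import random
import statistics


def matmul(x, w):
    """Plain matrix product of x (batch x fan_in) and w (fan_in x fan_out)."""
    fan_in, fan_out = len(w), len(w[0])
    return [
        [sum(row[i] * w[i][j] for i in range(fan_in)) for j in range(fan_out)]
        for row in x
    ]


def unit_scaled_matmul(x, w):
    """Hypothetical helper: matmul followed by a fixed 1/sqrt(fan_in) scale.

    With i.i.d. zero-mean, unit-variance entries in x and w, each raw output
    has variance fan_in, so this scale restores (approximately) unit variance.
    """
    scale = 1.0 / math.sqrt(len(w))  # len(w) == fan_in
    return [[scale * v for v in row] for row in matmul(x, w)]


def sample(rows, cols, rng):
    """Matrix of i.i.d. standard normal entries (mean 0, variance 1)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def flat_variance(m):
    """Population variance over every entry of a matrix."""
    return statistics.pvariance([v for row in m for v in row])


rng = random.Random(0)
batch, fan_in, fan_out = 32, 256, 128
x = sample(batch, fan_in, rng)
w = sample(fan_in, fan_out, rng)

unscaled = flat_variance(matmul(x, w))          # grows with fan_in
scaled = flat_variance(unit_scaled_matmul(x, w))  # stays near 1
print(f"unscaled variance: {unscaled:.1f}, scaled variance: {scaled:.3f}")
```

Because the scale depends only on the layer's shape, it can be applied as a constant in the forward pass rather than learned or estimated from data.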