2023-5-14 arXiv roundup: FrugalGPT, Inverse CLIP scaling, Embedding all the modalities
dblalock.substack.com
This newsletter is made possible by MosaicML.

An Inverse Scaling Law for CLIP Training

They find that larger CLIP models let you get away with training on fewer tokens per input: when you cut the number of tokens, larger models suffer less accuracy degradation than smaller ones. By “using fewer tokens,” we mean algorithmically removing some of the tokens in the input image and text. To remove image tokens, you can mask or resize the image in various ways. They use random masking in most cases.
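The two image-side options above (masking vs. resizing) can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes a ViT-style image encoder with 16x16 patches on 224x224 inputs, whitespace-split captions, and simple truncation as the text-side reduction. The function names (`num_patch_tokens`, `random_mask`, `truncate_text`) are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def num_patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Patch tokens a ViT-style encoder sees for a square image (assumed 16x16 patches)."""
    side = image_size // patch_size
    return side * side


def random_mask(tokens: Sequence[T], keep_ratio: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: keep a random subset of tokens, preserving their original order."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(tokens) * keep_ratio))
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx]


def truncate_text(tokens: Sequence[T], max_len: int) -> List[T]:
    """Hypothetical helper: keep only the first max_len text tokens."""
    return list(tokens[:max_len])


if __name__ == "__main__":
    # Full resolution: 224x224 with 16x16 patches -> 14 * 14 = 196 tokens.
    patches = list(range(num_patch_tokens(224)))
    masked = random_mask(patches, keep_ratio=0.25)
    print(len(patches), len(masked))  # 196 49
    # Resizing instead: 112x112 -> 7 * 7 = 49 tokens, the same budget.
    print(num_patch_tokens(112))  # 49
    # Text side (assumed whitespace tokenization and truncation).
    caption = "a photo of a dog playing fetch in the park".split()
    print(truncate_text(caption, 4))  # ['a', 'photo', 'of', 'a']
```

Keeping 25% of patches at random and downsizing to 112x112 both leave 49 image tokens. The difference is that masking drops patches from a full-resolution image, while resizing keeps the whole field of view at lower resolution.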