2022-8-21 arXiv roundup: RecSys scaling, PatchDropout, LLM.int8()
This newsletter made possible by MosaicML. Don’t fall for false dichotomies.

PatchDropout: Economizing Vision Transformers Using Patch Dropout

When training DeiT or Swin transformers, you can just drop half the input patches at random and it’s totally fine.
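To make "drop half the input patches at random" concrete, here is a minimal sketch in plain Python. It is not the paper's code: the function name `patch_dropout`, the `keep_ratio` parameter, and the choice to keep the surviving patches in their original order are all assumptions for illustration. Details like how a class token or position embeddings are handled are not covered by the summary above, so they are left out.

```python
import random


def patch_dropout(tokens, keep_ratio=0.5, rng=None):
    """Return a random subset of patch tokens plus the indices that were kept.

    tokens: list with one entry per image patch (hypothetical representation).
    keep_ratio: fraction of patches to keep; 0.5 drops half of them.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []
    rng = rng or random.Random()
    num_keep = max(1, int(len(tokens) * keep_ratio))
    # Sample without replacement, then sort so kept patches stay in order
    # (an assumption of this sketch, not something the summary specifies).
    keep_idx = sorted(rng.sample(range(len(tokens)), num_keep))
    return [tokens[i] for i in keep_idx], keep_idx


# A 224x224 image cut into 16x16 patches yields 14 * 14 = 196 patch tokens.
tokens = [f"patch_{i}" for i in range(196)]
kept, keep_idx = patch_dropout(tokens, keep_ratio=0.5, rng=random.Random(0))
print(len(kept))  # 98
```

The savings come from the shorter sequence: self-attention cost grows quadratically with the number of tokens, so halving the token count cuts that part of the compute by roughly 4x, while the per-token layers get about 2x cheaper.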