2023-2-26 arXiv roundup: RLHF for diffusion, Multimodal chain of thought, Practical data poisoning
dblalock.substack.com
This newsletter made possible by MosaicML.

Poisoning Web-Scale Training Datasets is Practical

You can poison public datasets in two ways. One is to buy expired domains whose URLs have already been included in a dataset and change the content they serve. The other is to transiently edit sites like Wikipedia right before they get scraped.
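To make the first attack concrete, here's a toy sketch. It assumes a dataset distributed as a list of URLs plus captions, where each user downloads the bytes themselves later. The "web" is just a dict, and `IndexEntry`, `build_index`, and `download` are hypothetical names made up for illustration. The per-URL SHA-256 check is one commonly suggested mitigation, added here as an assumption rather than as the paper's exact proposal.

```python
import hashlib
from dataclasses import dataclass


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class IndexEntry:
    url: str
    caption: str
    content_hash: str  # hypothetical: hash recorded when the URL was indexed


def build_index(web: dict[str, bytes], captions: dict[str, str]) -> list[IndexEntry]:
    """Snapshot a URL-list dataset: store pointers (plus hashes), not the content."""
    return [IndexEntry(url, captions[url], sha256(web[url])) for url in captions]


def download(index: list[IndexEntry], web: dict[str, bytes],
             verify: bool) -> list[tuple[str, bytes]]:
    """Fetch each URL as it exists *now*; optionally drop entries whose bytes changed."""
    samples = []
    for entry in index:
        content = web.get(entry.url)
        if content is None:
            continue  # dead link: skipped
        if verify and sha256(content) != entry.content_hash:
            continue  # content changed since indexing: possibly poisoned
        samples.append((entry.caption, content))
    return samples


# Toy "web" at indexing time (made-up URLs and bytes).
web = {
    "http://expired-domain.example/cat.jpg": b"<real cat image>",
    "http://stable-site.example/dog.jpg": b"<real dog image>",
}
captions = {
    "http://expired-domain.example/cat.jpg": "a photo of a cat",
    "http://stable-site.example/dog.jpg": "a photo of a dog",
}
index = build_index(web, captions)

# Later: the domain lapses, an attacker buys it and serves different bytes.
web["http://expired-domain.example/cat.jpg"] = b"<attacker-chosen image>"

naive = download(index, web, verify=False)
checked = download(index, web, verify=True)
print(len(naive), len(checked))  # 2 1
```

The naive download happily pairs "a photo of a cat" with the attacker's bytes, while the hash check drops that entry. A check like this wouldn't help against the Wikipedia-style attack, though: there the malicious edit is itself what gets snapshotted and hashed.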
Man, people are always forgetting ELMo (in the figure from the pretrained-models history paper). It even started the Muppet naming trend!