Davis Summarizes Papers
May 2023
2023-5-28 arXiv roundup: 994 papers, beating RLHF with 1000 good examples, LoRA with half the RAM, Multi-epoch scaling
This newsletter made possible by MosaicML. We got a bunch of interesting stuff this week, thanks in part to getting the most submissions of all time…
May 30 • Davis Blalock
2023-5-21 arXiv roundup: Parallel transformers, Optimized data mixtures, Don't trust LLM chains-of-thought
This newsletter made possible by MosaicML. Accelerating Transformer Inference for Translation via Parallel Decoding They speed up inference in…
May 25 • Davis Blalock
2023-5-14 arXiv roundup: FrugalGPT, Inverse CLIP scaling, Embedding all the modalities
This newsletter made possible by MosaicML. An Inverse Scaling Law for CLIP Training They find that larger CLIP models let you get away with training on…
May 16 • Davis Blalock
2023-5-7 arXiv roundup: Easy loss spike fix, LLongboi, H100s, stable diffusion for $50k
This newsletter made possible by MosaicML. Thanks to Haoli Yin and Trevor Gale for the Twitter shoutouts this week! Introducing MPT-7B: A New Standard…
May 10 • Davis Blalock
April 2023
2023-4-23 arXiv roundup: Adam instability, better hypernetworks, More Branch-Train-Merge
This newsletter made possible by MosaicML. Also: is anyone looking for a podcast guest? I may or may not have a podcast in the works and want to get…
Apr 27 • Davis Blalock
2023-4-16 arXiv roundup: Segment Anything, Everything, and Everything everywhere all at once
Yes, in a two-week period we actually got papers named: Segment Anything SegGPT: Segmenting Everything In Context Segment Everything Everywhere All at…
Apr 20 • Davis Blalock
2023-4-2 arXiv roundup: LLMs improving LLM output, BloombergGPT, LLM opinions differ from humans
This newsletter made possible by MosaicML. Thanks to Charlie Blake for the Twitter shoutout this week! Training Language Models with Language Feedback…
Apr 6 • Davis Blalock
March 2023
2023-3-26 arXiv roundup: Unit scaling, Origins of power laws, Removing text watermarks
This newsletter made possible by MosaicML. Unit Scaling: Out-of-the-Box Low-Precision Training They make all the weights, activations, and gradients…
Mar 29 • Davis Blalock
2023-3-19 arXiv roundup: GPT-4, Data deduplication, MoE optimizations
This newsletter made possible by MosaicML. GPT-4 Technical Report This is a 98-page document, so we’re just gonna go through some highlights. First…
Mar 22 • Davis Blalock
2023-3-12 arXiv roundup: Pretraining BERT for $20, GigaGAN, Multimodal LLMs
This newsletter made possible by MosaicML. Pretraining BERT from Scratch for $20 We trained an optimized BERT model to match the results from the…
Mar 15 • Davis Blalock
2023-3-5 arXiv roundup: 5-bit training, What pretraining data to use, Expanding training sets via ML
This newsletter made possible by MosaicML (look at our shiny new website). Full Stack Optimization of Transformer Inference: a Survey Besides having a…
Mar 9 • Davis Blalock
2023-2-26 arXiv roundup: RLHF for diffusion, Multimodal chain of thought, Practical data poisoning
This newsletter made possible by MosaicML. Poisoning Web-Scale Training Datasets is Practical You can poison public datasets by buying domains …
Mar 1 • Davis Blalock