Sitemap - 2023 - Davis Summarizes Papers

2023-11-26 arXiv roundup: Big potential wins, 1 bit per parameter, Simplifying transformers

2023-11-19 arXiv roundup: Inverse-free inverse Hessians, Faster LLMs, Closed-form diffusion

2023-10-16 arXiv roundup: Cornucopia of easy (claimed) wins for LLMs

2023-9 arXiv roundup: A bunch of good ML systems and Empirical science papers

2023-8 arXiv roundup: Look I gave a talk, SILO-ing language models, lots of MoE + tool use papers

2023-7-30 arXiv roundup: Better image captions, Scaling EMA, Chain of thought empiricism

2023-7-23 arXiv roundup: OpenAI breaking changes, Much better attention and image captions

2023-7-16 arXiv roundup: Weird step sizes help gradient descent, Better CPU matmuls

2023-7-9 arXiv roundup: LLMs ignore the middle of their context, MoE + instruction tuning rocks

2023-7-2 arXiv roundup: Self-supervised eval, Prompting text models like image models, KV cache eviction

Models generating training data: huge win or fake win?

2023-6-25 arXiv roundup: Learning from textbooks, Eliminating transformer outliers, Zero++

Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup

2023-6-11 arXiv: Training on GPT outputs works worse than you think, but training on explanations works great

2023-6-4 arXiv: 1063 papers, small models making their own data, Way simpler RLHF, Adam accumulation

2023-5-28 arXiv roundup: 994 papers, beating RLHF with 1000 good examples, LoRA with way less RAM, Multi-epoch scaling

2023-5-21 arXiv roundup: Parallel transformers, Optimized data mixtures, Don't trust LLM chains-of-thought

2023-5-14 arXiv roundup: FrugalGPT, Inverse CLIP scaling, Embedding all the modalities

2023-5-7 arXiv roundup: Easy loss spike fix, LLongboi, H100s, stable diffusion for $50k

2023-4-23 arXiv roundup: Adam instability, better hypernetworks, More Branch-Train-Merge

2023-4-16 arXiv roundup: Segment Anything, Everything, and Everything everywhere all at once

2023-4-2 arXiv roundup: LLMs improving LLM output, BloombergGPT, LLM opinions differ from humans

2023-3-26 arXiv roundup: Unit scaling, Origins of power laws, Removing text watermarks

2023-3-19 arXiv roundup: GPT-4, Data deduplication, MoE optimizations

2023-3-12 arXiv roundup: Pretraining BERT for $20, GigaGAN, Multimodal LLMs

2023-3-5 arXiv roundup: 5-bit training, What pretraining data to use, Expanding training sets via ML

2023-2-26 arXiv roundup: RLHF for diffusion, Multimodal chain of thought, Practical data poisoning

2023-2-19 arXiv roundup: ICML deluge part 2

2023-2-5 arXiv roundup: ICML deluge part 1

2023-1-29 arXiv roundup: Diffusion, Watermarking, and societal implications

2023-1-22 arXiv roundup: Domain-specific pretraining is awesome, Removing skip connections

2023-1-15 arXiv roundup: Way faster BERT, Mechanistic interpretability

2023-1-8 arXiv roundup: Language models creating their own data, Hinton vs backprop, Practical pruning + quantization for LLMs