Sitemap - 2023 - Davis Summarizes Papers
2023-11-26 arXiv roundup: Big potential wins, 1 bit per parameter, Simplifying transformers
2023-11-19 arXiv roundup: Inverse-free inverse Hessians, Faster LLMs, Closed-form diffusion
2023-10-16 arXiv roundup: Cornucopia of easy (claimed) wins for LLMs
2023-9 arXiv roundup: A bunch of good ML systems and Empirical science papers
2023-8 arXiv roundup: Look I gave a talk, SILO-ing language models, lots of MoE + tool use papers
2023-7-30 arXiv roundup: Better image captions, Scaling EMA, Chain of thought empiricism
2023-7-23 arXiv roundup: OpenAI breaking changes, Much better attention and image captions
2023-7-16 arXiv roundup: Weird step sizes help gradient descent, Better CPU matmuls
2023-7-9 arXiv roundup: LLMs ignore the middle of their context, MoE + instruction tuning rocks
Models generating training data: huge win or fake win?
2023-6-25 arXiv roundup: Learning from textbooks, Eliminating transformer outliers, Zero++
Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup
2023-6-4 arXiv: 1063 papers, small models making their own data, Way simpler RLHF, Adam accumulation
2023-5-14 arXiv roundup: FrugalGPT, Inverse CLIP scaling, Embedding all the modalities
2023-5-7 arXiv roundup: Easy loss spike fix, LLongboi, H100s, stable diffusion for $50k
2023-4-23 arXiv roundup: Adam instability, better hypernetworks, More Branch-Train-Merge
2023-4-16 arXiv roundup: Segment Anything, Everything, and Everything everywhere all at once
2023-4-2 arXiv roundup: LLMs improving LLM output, BloombergGPT, LLM opinions differ from humans
2023-3-26 arXiv roundup: Unit scaling, Origins of power laws, Removing text watermarks
2023-3-19 arXiv roundup: GPT-4, Data deduplication, MoE optimizations
2023-3-12 arXiv roundup: Pretraining BERT for $20, GigaGAN, Multimodal LLMs
2023-3-5 arXiv roundup: 5-bit training, What pretraining data to use, Expanding training sets via ML
2023-2-26 arXiv roundup: RLHF for diffusion, Multimodal chain of thought, Practical data poisoning
2023-2-19 arXiv roundup: ICML deluge part 2
2023-2-5 arXiv roundup: ICML deluge part 1
2023-1-29 arXiv roundup: Diffusion, Watermarking, and societal implications
2023-1-22 arXiv roundup: Domain-specific pretraining is awesome, Removing skip connections
2023-1-15 arXiv roundup: Way faster BERT, Mechanistic interpretability