Sitemap - 2022 - Davis Summarizes Papers
2022-12-18 arXiv roundup: Robotics Transformer, Dense MoE pretraining
2022-12-4 arXiv roundup: New best MoE implementation, 3x faster transformer inference
Twitter's surprisingly easy path to profitability
2022-11-27 arXiv roundup: Multimodal retrieval, int8 and int4 LLM quantization
2022-11-13 arXiv roundup: Will we run out of data? Plus, how Google does large-scale inference
2022-11-6 arXiv roundup: ImageNet-X, Automating prompt engineering, BLOOM carbon footprint
2022-10-30 arXiv roundup: 1 million GPU hours, Scaling laws, Science-of-deep-learning papers
2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta
2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity
2022-10-9 arXiv roundup: The death of depth?, Vision pretraining throwdown, Parameter-efficient MoE
2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models?
2022-9-25 arXiv roundup: Metadata archaeology, Decaying pruning, 200x faster RL
2022-9-4 arXiv roundup: Deep nets without multiplies, Transformer world models, How best to pretrain
2022-8-28 arXiv roundup: Simulating humans with GPT-3, Elegant math paying off
2022-8-21 arXiv roundup: RecSys scaling, PatchDropout, LLM.int8()
2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers
2022-7-31 arXiv roundup: Transformer vs CNN showdown, 1000x smaller DLRM, ELECTRA improvements
2022-7-10 arXiv roundup: DeepSpeed inference, Simpler detection backbones, Spatial sparsification
2022-7-3 arXiv roundup: Minerva, Beating power laws, Surprising OOD linearity
2022-6-26 arXiv roundup: Way better certified robustness, Progressive SSL, Empirical NTKs
2022-6-19 arXiv roundup: RHO-LOSS, Pix2Seq v2, Fisher SAM
2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpeed & fp8 quantization
2022-6-5 arXiv roundup: SAM for free, FlashAttention, Supervised MAE
2022-5-28 arXiv roundup: OptFormer, Imagen, Thinking step by step, 23 other papers
2022-5-22 arXiv roundup: RankGen, Deep spectral clustering, Medical imaging pretraining
2022-5-15: T-Few, Task scaling, Gato
2022-5-8: OPT-175B, Better depth estimation, Mobile TPU NAS
2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances
2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better
2022-4-17: Neighborhood attention, 830k TPU-hours, Revenge of the ViT
2022-4-10: Solving ImageNet, An actually great pruning paper, Per-class augmentation
2022-4-3: Chinchilla, Bootstrapping rationales, HyperMorph
2022-3-27: Pathways, token dropping, decoupled mixup
2022-3-20: Memorizing transformers, {knn, non-trainable} softmax, reducing flipping errors
2022-3-13: InstructGPT, Model soups, muParameterization, LiteTransformerSearch
2022-3-6: N:M sparse attention, Rethinking demonstrations, Shift instead of attention
2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens
2022-2-20: Fantastic generalization measures, How vision transformers work, 0/1 Adam
2022-2-12: Generating training data, EfficientNet-X, Editing factual knowledge
2022-2-6: Highlights from all the ICML2022 submissions
2022-1-30: Xformers, ConvMixer, Megatron-Turing NLG 530B
2022-1-23: DeepSpeed-MoE, Guided layer freezing, Low-pass filtering SGD
2022-1-16: Grokking, Semantic segmentation with {BERT embeddings, only image-level labels}
2022-1-4: Two sparsities, Vision reservoir
2021-12-18: Data-free Knowledge Distillation, MagNets
2021-12-4: Sparsity is Enough in Scaling Transformers, Sparse ImageNet transfer
2021-9-19 arXiv roundup - OMPQ, Don't pretrain?, EfficientBERT, Primer
2021-8-3 arXiv roundup: PiLU, Two long tails, Dataset distillation