Sitemap - 2022 - Davis Summarizes Papers

2022-12-18 arXiv roundup: Robotics Transformer, Dense MoE pretraining

2022-12-11 arXiv roundup

2022-12-4 arXiv roundup: New best MoE implementation, 3x faster transformer inference

Twitter's surprisingly easy path to profitability

2022-11-27 arXiv roundup: Multimodal retrieval, int8 and int4 LLM quantization

2022-11-20 arXiv roundup: The many pitfalls of FLOPs, RL in production, A learned optimizer that "just works"

2022-11-13 arXiv roundup: Will we run out of data? Plus, how Google does large-scale inference

2022-11-6 arXiv roundup: ImageNet-X, Automating prompt engineering, BLOOM carbon footprint

2022-10-30 arXiv roundup: 1 million GPU hours, Scaling laws, Science-of-deep-learning papers

2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta

2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity

2022-10-9 arXiv roundup: The death of depth?, Vision pretraining throwdown, Parameter-efficient MoE

2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models?

2022-9-25 arXiv roundup: Metadata archaeology, Decaying pruning, 200x faster RL

2022-9-18 arXiv roundup: Reliable fp8 training, Better scaling laws, Different minima are often just different weight permutations

2022-9-11 arXiv roundup: Beating GPT-3 with 20B params, Emergence in language models, Worse deepfakes on the horizon

2022-9-4 arXiv roundup: Deep nets without multiplies, Transformer world models, How best to pretrain

2022-8-28 arXiv roundup: Simulating humans with GPT-3, Elegant math paying off

2022-8-21 arXiv roundup: RecSys scaling, PatchDropout, LLM.int8()

2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers

2022-8-7 arXiv roundup: Adam and sharpness, Recursive self-improvement for coding, Training and model tweaks

2022-7-31 arXiv roundup: Transformer vs CNN showdown, 1000x smaller DLRM, ELECTRA improvements

2022-7-24 arXiv roundup: Int8 training at almost no accuracy loss, DataPerf, Scaling & inductive biases

2022-7-17 arXiv roundup: Next-ViT, Anthropic & DeepMind & Google interrogate giant language models, 16 other papers

2022-7-10 arXiv roundup: DeepSpeed inference, Simpler detection backbones, Spatial sparsification

2022-7-3 arXiv roundup: Minerva, Beating power laws, Surprising OOD linearity

2022-6-26 arXiv roundup: Way better certified robustness, Progressive SSL, Empirical NTKs

2022-6-19 arXiv roundup: RHO-LOSS, Pix2Seq v2, Fisher SAM

2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpeed & fp8 quantization

2022-6-5 arXiv roundup: SAM for free, FlashAttention, Supervised MAE

2022-5-28 arXiv roundup: OptFormer, Imagen, Thinking step by step, 23 other papers

2022-5-22 arXiv roundup: RankGen, Deep spectral clustering, Medical imaging pretraining

2022-5-15: T-Few, Task scaling, Gato

2022-5-8: OPT-175B, Better depth estimation, Mobile TPU NAS

2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances

2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better

2022-4-17: Neighborhood attention, 830k TPU-hours, Revenge of the ViT

2022-4-10: Solving ImageNet, An actually great pruning paper, Per-class augmentation

2022-4-3: Chinchilla, Bootstrapping rationales, HyperMorph

2022-3-27: Pathways, token dropping, decoupled mixup

2022-3-20: Memorizing transformers, {knn, non-trainable} softmax, reducing flipping errors

2022-3-13: InstructGPT, Model soups, muParameterization, LiteTransformerSearch

2022-3-6: N:M sparse attention, Rethinking demonstrations, Shift instead of attention

2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens

2022-2-20: Fantastic generalization measures, How vision transformers work, 0/1 Adam

2022-2-12: Generating training data, EfficientNet-X, Editing factual knowledge

2022-2-6: Highlights from all the ICML2022 submissions

2022-1-30: Xformers, ConvMixer, Megatron-Turing NLG 530B

2022-1-23: DeepSpeed-MoE, Guided layer freezing, Low-pass filtering SGD

2022-1-16: Grokking, Semantic segmentation with {BERT embeddings, only image-level labels}

2022-1-4: Two sparsities, Vision reservoir

2021-12-18: Data-free Knowledge Distillation, MagNets

2021-12-12: PixMix, RETRO

2021-12-4: Sparsity is Enough in Scaling Transformers, Sparse ImageNet transfer

2021-9-19 arXiv roundup - OMPQ, Don't pretrain?, EfficientBERT, Primer

2021-9-12 arXiv roundup

2021-8-29 arXiv roundup

2021-8-8 arXiv roundup

2021-8-3 arXiv roundup: PiLU, Two long tails, Dataset distillation

Hello World