Davis Summarizes Papers
2023-11-26 arXiv roundup: Big potential wins, 1 bit per parameter, Simplifying transformers
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication. They built a crazy fast hardware…
Dec 1, 2023 • Davis Blalock
November 2023
2023-11-19 arXiv roundup: Inverse-free inverse Hessians, Faster LLMs, Closed-form diffusion
A fundamental result in queueing theory is that, if items enter the queue faster than they’re processed, the length of the queue tends to infinity. Just…
Nov 19, 2023 • Davis Blalock
October 2023
2023-10-16 arXiv roundup: Cornucopia of easy (claimed) wins for LLMs
Also, I was on the AI Stories podcast! In case anyone assumed I was incredibly handsome, this is the perfect chance to disillusion yourself. This was a…
Oct 17, 2023 • Davis Blalock
2023-9 arXiv roundup: A bunch of good ML systems and Empirical science papers
Got behind the curve again, and it ended up taking me more than a week to catch up. Y’all need to not write so many papers… Are Emergent Abilities in Large…
Oct 6, 2023 • Davis Blalock
September 2023
2023-8 arXiv roundup: Look I gave a talk, SILO-ing language models, lots of MoE + tool use papers
(I’m still planning on doing weekly installments in general—I just got behind and it took a while to catch up). This newsletter made possible by…
Sep 1, 2023 • Davis Blalock
August 2023
2023-7-30 arXiv roundup: Better image captions, Scaling EMA, Chain of thought empiricism
Heads up that we’ve hit the late summer slump in arXiv submissions, so there’s less content than usual this week. P.S.: thanks to AI Supremacy for the…
Aug 1, 2023 • Davis Blalock
July 2023
2023-7-23 arXiv roundup: OpenAI breaking changes, Much better attention and image captions
This newsletter made possible by MosaicML. Retentive Network: A Successor to Transformer for Large Language Models. They introduce an exceptionally…
Jul 25, 2023 • Davis Blalock
2023-7-16 arXiv roundup: Weird step sizes help gradient descent, Better CPU matmuls
This newsletter made possible by MosaicML. Also a reminder that I’m now retweeting high-quality threads about papers to try to improve the…
Jul 20, 2023 • Davis Blalock
2023-7-9 arXiv roundup: LLMs ignore the middle of their context, MoE + instruction tuning rocks
This newsletter made possible by MosaicML. A mini-announcement Because it’s gotten increasingly difficult to find technical ML content in all the AI…
Jul 11, 2023 • Davis Blalock
2023-7-2 arXiv roundup: Self-supervised eval, Prompting text models like image models, KV cache eviction
This newsletter made possible by MosaicML. And thanks to @snowclipsed for the Twitter shoutout this week! Also, I wrote a blog post about using language…
Jul 5, 2023 • Davis Blalock
Models generating training data: huge win or fake win?
Here’s a puzzle: we’ve seen a lot of papers claiming you can use one language model to generate useful training data for another language model. But…by…
Jul 2, 2023 • Davis Blalock
June 2023
2023-6-25 arXiv roundup: Learning from textbooks, Eliminating transformer outliers, Zero++
This newsletter brought to you by MosaicML. Textbooks Are All You Need: They got near-SotA code generation with a tiny 1.3B param model by curating an…
Jun 29, 2023 • Davis Blalock