2023-8 arXiv roundup: Look I gave a talk…

Sep 1, 2023

(I’m still planning on doing weekly installments in general—I just got behind and it took a while to catch up).

2 Comments

Oct 26, 2023

The results for pre-gated MoE probably only *look* good because the baseline implementation of the model is slow to begin with. Their baseline speed for switch-base on a PCIe A100-80G is 120~150 generated tokens/s at batch size 1, which is a little bit pathetic for ~200M activated params these days; it's comparable to the tg128 (128-token text generation) speed of 4-bit quantized llama-7b.

The PCIe bottleneck stays the same no matter how efficiently the flops are used, so with a faster implementation of the model, the proportion of latency consumed by loading experts should become far worse.

Maybe it will work better with Grace Hopper's bridged memory.
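
The latency argument in this comment can be made concrete with a little arithmetic. The sketch below is only illustrative: the per-token compute and expert-load times are hypothetical numbers, not measurements from the paper, and the two functions model the two extremes (no overlap between loading and compute, and perfect overlap via prefetching).

```python
def load_share_no_overlap(compute_ms: float, load_ms: float) -> float:
    """Fraction of per-token latency spent loading experts when
    loading and compute run strictly one after the other."""
    return load_ms / (compute_ms + load_ms)


def exposed_load_ms(compute_ms: float, load_ms: float) -> float:
    """Load time that cannot be hidden behind compute, assuming
    perfect overlap (prefetching experts while the GPU computes)."""
    return max(0.0, load_ms - compute_ms)


# Hypothetical per-token costs in milliseconds (assumptions, not measured).
LOAD_MS = 2.0          # fixed by PCIe, independent of kernel efficiency
SLOW_COMPUTE_MS = 6.0  # a slow baseline implementation
FAST_COMPUTE_MS = 1.0  # a well-optimized implementation

for name, compute_ms in [("slow", SLOW_COMPUTE_MS), ("fast", FAST_COMPUTE_MS)]:
    share = load_share_no_overlap(compute_ms, LOAD_MS)
    exposed = exposed_load_ms(compute_ms, LOAD_MS)
    print(f"{name}: load share = {share:.2f}, exposed load = {exposed:.1f} ms")
```

Under these assumed numbers, the slow baseline can hide the whole transfer behind compute, while the fast implementation cannot: the transfer time is unchanged, but there is less compute to hide it behind.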


Quentin

Sep 13, 2023

Do you want to collab? Seems like we have synergistic audiences


Davis Summarizes Papers
