2023-3-19 arXiv roundup: GPT-4, Data deduplication, MoE optimizations
dblalock.substack.com
This newsletter made possible by MosaicML.

GPT-4 Technical Report

This is a 98-page document, so we’re just gonna go through some highlights. First, scaling is still going strong. We haven’t saturated the log-log-linear trend yet. This holds not just for the pretraining objective, but also for various downstream tasks.
"Recursive self-improvement is picking up steam." This is not recursive self-improvement. This is instead trying to make an LLM self-consistent (when applied to generate feedback for itself) or to make two LLMs mutually consistent (when applied to generate feedback for each other). The effect is similar to message passing in probabilistic reasoning: we are trying to get the various parts of the network to agree with each other about the generated outputs. This will not lead to a "takeoff".
"Recursive self-improvement is picking up steam." This is not recursive self-improvement. This is instead trying to make an LLM self-consistent (when applied to generate feedback for itself) or to make two LLMs mutually consistent (when applied to generate feedback for each other). The effect is similar to message passing in probabilistic reasoning: we are trying to get the various parts of the network to agree with each other about the generated outputs. This will not lead to a "takeoff".