This newsletter made possible by MosaicML.

Retentive Network: A Successor to Transformer for Large Language Models

They introduce an exceptionally promising attention variant. Basically, they:
- Ditch the softmax
- Let each token attend only to a vector of state, instead of all previous tokens
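The two changes described above can be illustrated with a rough sketch. This is a simplified single-head version in plain Python, not the paper's full layer: it assumes a scalar decay `gamma` and leaves out the position rotation, gating, and normalization. The function names are made up for illustration. It shows that reading from a fixed-size running state gives the same output as attention-style scoring over all previous tokens with decay weights and no softmax.

```python
def retention_recurrent(Q, K, V, gamma):
    """Process tokens one at a time, carrying a fixed-size state S (d_k x d_v)."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(Q, K, V):
        # Decay the old state, then add the outer product of this token's key and value.
        S = [[gamma * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        # The token reads only from the state: o = q S.
        outputs.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def retention_parallel(Q, K, V, gamma):
    """Same result computed attention-style: causal decay weights, no softmax."""
    n, d_v = len(Q), len(V[0])
    outputs = []
    for t in range(n):
        out = [0.0] * d_v
        for m in range(t + 1):  # causal: only positions m <= t contribute
            score = sum(qi * ki for qi, ki in zip(Q[t], K[m])) * gamma ** (t - m)
            for j in range(d_v):
                out[j] += score * V[m][j]
        outputs.append(out)
    return outputs
```

Calling both functions on the same small `Q`, `K`, `V` lists (one row per token) should give matching outputs up to floating-point error. The recurrent version only ever stores `S`, which is why per-token cost at inference does not grow with sequence length.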
Interesting observation about Retentive Networks: https://twitter.com/ericzelikman/status/1682097753151660032?s=46&t=R1HcRy3wUpT5EYNQxGl8wg
They seem severely undertrained compared to other networks like Llama 2. I wonder if they just converge a little faster early in training, hence the favorable perf compared to regular Transformers.
There might be a few corrections to make on the "How is ChatGPT's behavior changing over time?" summary. Do not take this in bad faith, it's just that I read this newsletter and would like it to stay true to the facts.
> In many cases, GPT-4 got worse while GPT-3.5 got better.
You might not be aware that claims of performance decreases seem to be misplaced, at least on the experiments investigated in the paper.
In particular, https://twitter.com/Si_Boehm/status/1681801371656536068 claims the LeetCode performance of the produced code got significantly better.
Similarly, https://twitter.com/tjade273/status/1682009691633614849 claims that, depending on the setting in which you properly test primality detection on 5-digit numbers, the June version is either significantly better or about the same.
> OpenAI’s APIs have *quietly* changed in quality a lot in the past few months
The paper investigates the difference between gpt-4-0314 and gpt-4-0613. The old version is to be supported until at least June 2024. Every OpenAI developer got an email introducing the new version.
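On the primality point above, here is a rough sketch of what testing "properly" could look like. It assumes, as one reading of that claim, that a proper test includes composites as well as primes and reports accuracy on each class separately. `always_yes` is a hypothetical stand-in for a model that answers "prime" every time, not real model output.

```python
def is_prime(n):
    """Trial division, fast enough for 5-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy_by_class(predict, numbers):
    """Return (accuracy on primes, accuracy on composites) for a yes/no predictor."""
    primes = [n for n in numbers if is_prime(n)]
    composites = [n for n in numbers if not is_prime(n)]
    acc_primes = sum(1 for n in primes if predict(n)) / len(primes)
    acc_composites = sum(1 for n in composites if not predict(n)) / len(composites)
    return acc_primes, acc_composites


def always_yes(n):
    # Hypothetical predictor: claims every number is prime.
    return True


five_digit = range(10000, 100000)
print(accuracy_by_class(always_yes, five_digit))
```

A predictor like `always_yes` looks perfect on a primes-only test set and fails completely on composites, so a score on one class alone says little about whether primality detection got better or worse.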
The new Llama 2 (Meta/Facebook) is even more woke than ChatGPT (OpenAI/Microsoft)
https://bilbobitch.substack.com/p/modern-ai-chat-bots-like-metas-llama
Everybody agrees that ChatGPT-4 has gotten dumber over time.
Soon to be worthless.