
Anand

"General-Purpose In-Context Learning by Meta-Learning Transformers" is a very cool paper. In-context learning by meta-learning is not really new: it was shown for recurrent networks back in 2001 (Hochreiter et al. 2001). To some extent, this paper just replaces the recurrent network with a transformer in the setup of (Hochreiter et al. 2001) and (Wang et al. 2016; Duan et al. 2016). But since transformers scale so much better than RNNs, I guess it's possible to get a lot more interesting behaviour more easily. On the other hand, I would expect transformers to have a context-length limit that RNNs don't have (at least in principle).

