1 Comment

The Wide Attention paper is misleading. They only perform experiments on the tasks where even a bag-of-words model gets good performance. I got much worse performance when I tried this on other tasks.

Expand full comment