2022-3-20: Memorizing transformers, {knn, non-trainable} softmax, reducing flipping errors
Do We Really Need a Learnable Classifier at the End of Deep Neural Network? I really like this one because they did something I thought about a long time ago but couldn't figure out the math for: constructing a matrix whose maximum cosine similarity between columns is minimized. Instead of a trainable softmax classifier at the end of the network, they just use such a matrix. It works worse on class-balanced problems but often better on small, imbalanced ones. They also replace cross-entropy with a different loss based on cosine similarity. I wonder if we could get it to train faster by solving the Procrustes problem to get an initial U matrix that lines up better with the initial average embedding for each class (sketched below). I've been wanting a fixed final layer to make classifying each pixel cheaper in segmentation, and this seems like the most promising initialization to make that happen. For more on the math, see this.
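Here's a rough NumPy sketch of both ideas, under my assumptions rather than the paper's code: the fixed classifier is a simplex equiangular tight frame, whose K unit-norm columns all have pairwise cosine similarity -1/(K-1) (the smallest achievable maximum), and the orthogonal Procrustes solution rotates it toward the initial class-mean embeddings without changing those pairwise angles. The names `simplex_etf` and `procrustes_align` are mine, not the paper's.

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose unit-norm columns all have
    pairwise cosine similarity -1/(num_classes - 1). Assumes dim >= num_classes
    for simplicity (the construction only needs dim >= num_classes - 1)."""
    assert dim >= num_classes
    k = num_classes
    # Random orthonormal basis U, shape (dim, k).
    rng = np.random.default_rng(seed)
    u, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    # Centering matrix pushes the k basis vectors apart into a simplex;
    # the scale factor restores unit column norms.
    center = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ center

def procrustes_align(fixed: np.ndarray, class_means: np.ndarray) -> np.ndarray:
    """Rotate the fixed classifier so its columns best match the initial
    per-class mean embeddings: argmin over orthogonal R of ||R W - A||_F,
    solved via the SVD of A @ W.T. Rotation preserves the ETF angles."""
    u, _, vt = np.linalg.svd(class_means @ fixed.T)
    rotation = u @ vt
    return rotation @ fixed

# Usage sketch: 512-dim embeddings, 10 classes.
W = simplex_etf(num_classes=10, dim=512)
# class_means would be a (512, 10) matrix of initial per-class mean
# embeddings from a warmup pass; here we fake one to show the call.
class_means = np.random.default_rng(1).standard_normal((512, 10))
W_aligned = procrustes_align(W, class_means)
```

Since the Procrustes rotation is orthogonal, the aligned matrix keeps the same minimized maximum cosine similarity between columns; only its orientation relative to the encoder's initial embedding space changes, which is the point of using it as an initialization.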