Overcoming Oscillations in Quantization-Aware Training Two changes to vanilla STE-based quantization: 1) a regularization term that dampens oscillations in the quantized weights; 2) "If the oscillation frequency of any weight exceeds a threshold f, that weight gets frozen until the end of training. We apply the freezing in the integer domain, such that potential change in the scales during optimization does not lead to a different rounding." The dampening helps quite a bit (reproducing a 4-bit quantization paper we saw submitted to ICLR a while ago). The freezing helps similarly. Neither seems as effective as some other papers that report iso-accuracy on ImageNet, but the reproduction of oscillatory behavior hurting performance is a useful datapoint.
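A minimal sketch of the two ideas, assuming a symmetric uniform quantizer: a dampening loss that penalizes the distance between latent weights and their quantized values, and a per-weight tracker that freezes weights in the integer domain once an exponential moving average of their rounding flips exceeds a threshold f. All names and the EMA formulation here are illustrative, not the paper's exact implementation.

```python
import numpy as np

def fake_quantize(w, scale, n_bits=4):
    # symmetric uniform quantizer; round-to-nearest in the integer domain
    qmax = 2 ** (n_bits - 1) - 1
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return w_int, w_int * scale

def dampening_loss(w, scale, n_bits=4):
    # pull latent weights toward their current quantization bin centers,
    # discouraging them from hovering at bin boundaries and oscillating
    _, w_q = fake_quantize(w, scale, n_bits)
    return np.sum((w - w_q) ** 2)

class OscillationTracker:
    """Illustrative sketch: track per-weight oscillation frequency as an
    EMA of integer-value flips; freeze weights whose frequency exceeds f.
    Freezing stores the integer value, so later changes to the scale
    cannot change the rounding of a frozen weight."""
    def __init__(self, shape, freq_threshold=0.1, ema_decay=0.99):
        self.prev_int = None
        self.freq = np.zeros(shape)
        self.frozen = np.zeros(shape, dtype=bool)
        self.frozen_int = np.zeros(shape)
        self.f = freq_threshold
        self.decay = ema_decay

    def update(self, w_int):
        if self.prev_int is not None:
            flipped = (w_int != self.prev_int).astype(float)
            self.freq = self.decay * self.freq + (1 - self.decay) * flipped
            newly = (self.freq > self.f) & ~self.frozen
            self.frozen_int[newly] = w_int[newly]  # freeze in integer domain
            self.frozen |= newly
        self.prev_int = np.asarray(w_int).copy()
        # frozen weights keep their stored integer value from here on
        return np.where(self.frozen, self.frozen_int, w_int)
```

With a steady 0/1 flip pattern and decay 0.99, the EMA crosses a 0.1 threshold after roughly a dozen flips, after which the weight's integer value is pinned for the rest of training.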
2022-3-27: Pathways, token dropping, decoupled mixup