⭐ Efficient DNN Training with Knowledge-Guided Layer Freezing
They claim a 19-43% training speedup at iso-accuracy via layer freezing. A small proxy model running on the CPU guides the freezing by estimating each layer's plasticity, so that layers are frozen once they have stopped changing much. They also cache the activations of frozen layers, which skips even the forward pass through them. Results are reported on ResNet-50, BERT, and other useful networks.
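To make the activation-caching idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the names `FrozenPrefixCache`, `frozen_forward`, and `train` are hypothetical, the "frozen layers" are a fixed stand-in function, and the sketch assumes each sample is fed in unchanged every epoch (no random augmentation), so a cached activation stays valid.

```python
# Toy sketch of caching activations for frozen layers.
# All names are hypothetical; this is not the paper's code.

class FrozenPrefixCache:
    """Caches the output of a frozen layer prefix, keyed by sample id."""

    def __init__(self, frozen_forward):
        self.frozen_forward = frozen_forward  # forward pass through frozen layers
        self.cache = {}
        self.forward_calls = 0  # times the frozen prefix actually ran

    def __call__(self, sample_id, x):
        if sample_id not in self.cache:
            # Cache miss: run the frozen layers once and remember the result.
            self.forward_calls += 1
            self.cache[sample_id] = self.frozen_forward(x)
        return self.cache[sample_id]


def frozen_forward(x):
    # Stand-in for the frozen layers: fixed affine map followed by ReLU.
    return [max(0.0, 2.0 * v - 1.0) for v in x]


def train(dataset, epochs, lr=0.01):
    """Train only a linear head with SGD on the cached activations."""
    prefix = FrozenPrefixCache(frozen_forward)
    w = [0.0, 0.0]  # trainable head weights (assumes two features)
    for _ in range(epochs):
        for sample_id, (x, y) in enumerate(dataset):
            h = prefix(sample_id, x)  # cached after the first epoch
            pred = sum(wi * hi for wi, hi in zip(w, h))
            err = pred - y
            # Gradient step for squared error on the head only.
            w = [wi - lr * err * hi for wi, hi in zip(w, h)]
    return w, prefix.forward_calls


dataset = [([1.0, 2.0], 4.0), ([0.5, 1.5], 2.0), ([2.0, 0.0], 3.0)]
weights, calls = train(dataset, epochs=5)
print(calls)  # 3: the frozen prefix ran once per sample, not once per epoch
```

The frozen prefix runs once per sample rather than once per sample per epoch, which is where the forward-pass savings would come from. A real setup would also need to bound the cache's memory or storage footprint and invalidate it whenever the set of frozen layers changes.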
2022-1-23: DeepSpeed-MoE, Guided layer freezing, Low-pass filtering SGD