Open Access Research Article
CLAD: Criterion learner and attention distillation for automated CNN pruning
Journal of Automation and Intelligence 2025, 4(4): 254-265
Published: 26 August 2025
Abstract

Filter pruning compresses a neural network by reducing both its parameters and its computational cost. Existing pruning methods typically rely on pre-designed pruning criteria to measure filter importance and remove the filters deemed unimportant. However, different layers of a network exhibit different filter distributions, so applying the same pruning criterion to every layer is inappropriate. Some approaches instead assign each layer a criterion drawn from a small set of pre-defined rules, but the limited size of this set makes it difficult to cover all layers well, while manually designing a criterion for every layer is costly and generalizes poorly to other networks. To solve this problem, we present a novel neural network pruning method based on a Criterion Learner and Attention Distillation (CLAD). Specifically, CLAD integrates a differentiable criterion learner into each layer of the network. The learner automatically learns an appropriate pruning criterion from the filter parameters of its layer, eliminating the need for manual design, and is trained end-to-end with gradient-based optimization to achieve efficient pruning. In addition, attention distillation is introduced during learner optimization: it fully exploits the knowledge of the unpruned network to guide the learner and improve the performance of the pruned network. Experiments on various datasets and networks demonstrate the effectiveness of the proposed method. Notably, CLAD reduces the FLOPs of ResNet-110 by about 53% on the CIFAR-10 dataset while improving the network's accuracy by 0.05%, and reduces the FLOPs of ResNet-50 by about 46% on the ImageNet-1K dataset while maintaining a top-1 accuracy of 75.45%.
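To make the idea concrete, the following is a minimal numpy sketch of filter pruning with a learnable per-layer criterion. It assumes (the abstract does not specify the actual design) that the learner blends a few hand-designed candidate criteria, here L1-norm, L2-norm, and distance to the layer's mean filter, via softmax weights; in the paper those weights would be trained end-to-end, and the attention-distillation loss is not reproduced here. All function and variable names are illustrative.

```python
import numpy as np

def candidate_criteria(filters):
    """Score each filter under several hand-designed criteria.

    filters: array of shape (num_filters, fan_in) -- flattened conv kernels.
    Returns an array of shape (num_criteria, num_filters).
    """
    l1 = np.abs(filters).sum(axis=1)                               # L1-norm criterion
    l2 = np.sqrt((filters ** 2).sum(axis=1))                       # L2-norm criterion
    dist = np.linalg.norm(filters - filters.mean(axis=0), axis=1)  # distance to mean filter
    return np.stack([l1, l2, dist])

def learned_importance(filters, logits):
    """Blend the candidate criteria with softmax weights.

    `logits` stands in for the per-layer learnable parameters; in the
    paper's setting they would be optimized by gradient descent, here
    they are simply given.
    """
    scores = candidate_criteria(filters)
    # Normalize each criterion so the scales are comparable before mixing.
    scores = scores / (scores.max(axis=1, keepdims=True) + 1e-12)
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax over criteria
    return weights @ scores                          # shape (num_filters,)

def prune_mask(filters, logits, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of filters by learned importance."""
    imp = learned_importance(filters, logits)
    k = max(1, int(round(keep_ratio * len(imp))))
    keep = np.argsort(imp)[-k:]
    mask = np.zeros(len(imp), dtype=bool)
    mask[keep] = True
    return mask

rng = np.random.default_rng(0)
layer = rng.normal(size=(8, 27))  # e.g. 8 filters of a 3x3 conv over 3 channels
mask = prune_mask(layer, logits=np.array([0.0, 1.0, -1.0]), keep_ratio=0.5)
print(mask.sum())  # 4 filters kept
```

Because the mixing weights are differentiable with respect to the logits, the same scoring could be plugged into a training loop and optimized jointly with the network, which is the automation the abstract describes.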
