EvoGrad: Efficient Gradient-Based Hyperparameter Optimization
Published: Neural Information Processing Systems (NeurIPS)
Abstract
Gradient-based meta-learning and hyperparameter optimization have seen tremendous progress recently, enabling practical end-to-end training of neural networks together with their hyperparameters. Nevertheless, existing approaches are relatively expensive, as they need to compute second-order derivatives of the loss with respect to model parameters and hyperparameters. This cost prevents these methods from scaling to large network architectures and large numbers of hyperparameters. We present EvoGrad, a new approach to hyper-gradient calculation that is inspired by evolutionary methods but retains the efficacy of gradient-based methods. Crucially, our approach avoids the calculation of higher-order gradients, leading to significant improvements in memory and time efficiency. In practice, EvoGrad enables various existing meta-learning frameworks to scale to larger CNN architectures than was previously practical.
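To make the key idea concrete, the following is a minimal sketch of an evolution-inspired, first-order hypergradient estimate in PyTorch. The function name evograd_hypergrad and its arguments (K, sigma, inner_loss, val_loss) are illustrative assumptions rather than the paper's exact API; the point it demonstrates is that the validation loss can be differentiated with respect to the hyperparameters without ever forming second-order derivatives.

```python
# Hedged sketch of an evolution-inspired hypergradient estimate (PyTorch).
# All names below are illustrative assumptions, not the paper's exact API.
import torch

def evograd_hypergrad(theta, hparams, inner_loss, val_loss, K=2, sigma=1e-3):
    """Estimate d(val_loss)/d(hparams) using only first-order gradients.

    theta:      list of model parameter tensors
    hparams:    list of hyperparameter tensors with requires_grad=True
    inner_loss: callable(params, hparams) -> training loss (scalar tensor)
    val_loss:   callable(params) -> validation loss (scalar tensor)
    """
    # 1) Sample K randomly perturbed copies of the current model parameters.
    candidates = [[p + sigma * torch.randn_like(p) for p in theta]
                  for _ in range(K)]

    # 2) Evaluate the hyperparameter-dependent training loss of each candidate.
    losses = torch.stack([inner_loss(c, hparams) for c in candidates])

    # 3) Softmax weights favour candidates with lower training loss; the
    #    dependence on hparams enters only through these first-order losses.
    w = torch.softmax(-losses, dim=0)

    # 4) Combine the candidates into an updated parameter estimate and
    #    evaluate it on validation data.
    theta_new = [sum(w[k] * c[i] for k, c in enumerate(candidates))
                 for i in range(len(theta))]
    meta_loss = val_loss(theta_new)

    # 5) Backpropagate to the hyperparameters: no second-order terms in
    #    the model parameters are ever computed.
    return torch.autograd.grad(meta_loss, hparams)
```

In this sketch the model parameters appear only through randomly perturbed copies and a softmax-weighted combination, so backpropagation through the validation loss touches the hyperparameters via first-order quantities alone, which is the source of the memory and time savings claimed above.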