Abstract: The remarkable ability of modern neural networks to generalize improves with increasing network capacity, even when the number of model parameters or effective degrees of freedom exceeds the number of training data points. This phenomenon is all the more surprising given that the generalization error diverges as the number of model parameters approaches a critical value from below. Here we use dynamical mean field theory to show, in the simple setting of linear regression, that this so-called “double descent” behavior is the outcome of a phase transition in the stochastic field theory describing the training process. We calculate the critical exponents and scaling function of the double descent phase transition, and show that it is marked by a breakdown of the fluctuation-dissipation theorem associated with broken ergodicity. The corresponding response function has the same functional form as in the simple London model of the superconducting transition, with the rigidity of the wave function corresponding to the neural network’s ability to generalize accurately. Our results differ from earlier work in that we calculate the time dependence explicitly, not just the equilibrium solutions. This is what enables us to identify the origin of the emergent behavior.