Neural Network Review
Cornell University
In a deep neural network, the nonlinear layers make the loss function non-convex. In the optimization regime, a non-convex problem is generally regarded as hard to solve. In practice, however, the convex optimization algorithms we have discussed so far do perform well on neural networks.
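A minimal sketch of this point, assuming nothing beyond plain NumPy: stochastic gradient descent, the same update rule we analyzed in the convex setting, applied to the non-convex loss of a tiny one-hidden-layer network on made-up regression data. The sizes, seed, and step size below are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise (made up for illustration).
X = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

# One hidden layer with a tanh nonlinearity -> the loss is non-convex in (W1, W2).
h = 32
W1 = rng.normal(scale=0.5, size=(1, h))
W2 = rng.normal(scale=0.5, size=(h, 1))

lr = 0.05
for step in range(2000):
    i = rng.integers(len(X))             # sample one point: stochastic gradient
    x_i, y_i = X[i:i+1], y[i:i+1]
    a = np.tanh(x_i @ W1)                # forward pass
    err = a @ W2 - y_i                   # squared-loss residual
    gW2 = a.T @ err                      # backprop gradients of 0.5 * err^2
    gW1 = x_i.T @ (err @ W2.T * (1 - a**2))
    W1 -= lr * gW1                       # the SGD update, same as in the convex case
    W2 -= lr * gW2

mse = np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)
print(f"final training MSE: {mse:.4f}")  # typically small despite non-convexity
```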
Neural networks also tend not to overfit as much as classical theory predicts. They exhibit the nice property of double descent: past the overfitting (interpolation) regime, if we keep growing the model to a very large size, the test error actually goes down again (as observed in models at the GPT-3 scale).
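One classic way to see double descent without training a huge network is random-feature least squares: as the number of random features p grows past the number of training points n, the test error of the minimum-norm solution typically peaks near p = n and then falls again. The sketch below, with made-up data and sizes, is an illustration of that shape rather than an experiment from the lecture; exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_test, d = 50, 500, 10   # train size, test size, input dimension (arbitrary)

def make_data(m):
    X = rng.normal(size=(m, d))
    y = X @ np.ones(d) / np.sqrt(d) + 0.3 * rng.normal(size=m)  # linear target + noise
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(n_test)

for p in [5, 20, 45, 50, 55, 200, 2000]:   # number of random features
    V = rng.normal(size=(d, p))             # fixed random first layer
    F_train = np.maximum(X_train @ V, 0)    # random ReLU features
    F_test = np.maximum(X_test @ V, 0)
    # pinv gives the least-squares fit for p < n and the minimum-norm
    # interpolating solution for p > n (the overparameterized regime).
    w = np.linalg.pinv(F_train) @ y_train
    test_mse = np.mean((F_test @ w - y_test) ** 2)
    print(f"p = {p:5d}  test MSE = {test_mse:.3f}")
```

The spike near p = n is where the model barely interpolates the noisy labels; with many more features, the minimum-norm solution spreads weight across features and becomes smoother, so the test error descends a second time.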