Neural Network Review
Cornell University
In a deep neural network, the nonlinear layers make the loss function non-convex. In the optimization regime, a non-convex problem is generally regarded as hard to solve. In practice, however, the convex optimization algorithms we have discussed so far do perform well on neural networks.
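A minimal sketch of this point, assuming nothing beyond plain NumPy: stochastic gradient descent, the same update rule we analyzed in the convex setting, applied to the non-convex loss of a tiny one-hidden-layer network on made-up regression data. The sizes, seed, and step size below are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise (made up for illustration).
X = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

# One hidden layer with a tanh nonlinearity -> the loss is non-convex in (W1, W2).
h = 32
W1 = rng.normal(scale=0.5, size=(1, h))
W2 = rng.normal(scale=0.5, size=(h, 1))

lr = 0.05
for step in range(2000):
    i = rng.integers(len(X))             # sample one point: stochastic gradient
    x_i, y_i = X[i:i+1], y[i:i+1]
    a = np.tanh(x_i @ W1)                # forward pass
    err = a @ W2 - y_i                   # squared-loss residual
    gW2 = a.T @ err                      # backprop gradients of 0.5 * err^2
    gW1 = x_i.T @ (err @ W2.T * (1 - a**2))
    W1 -= lr * gW1                       # the SGD update, same as in the convex case
    W2 -= lr * gW2

mse = np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)
print(f"final training MSE: {mse:.4f}")  # typically small despite non-convexity
```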
Neural networks also tend not to overfit as much as classical theory predicts. They exhibit the nice property of double descent: past the overfitting (interpolation) regime, if we keep growing the model to a very large size, the test error actually goes down again (as observed in models at the GPT-3 scale).
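One classic way to see double descent without training a huge network is random-feature least squares: as the number of random features p grows past the number of training points n, the test error of the minimum-norm solution typically peaks near p = n and then falls again. The sketch below, with made-up data and sizes, is an illustration of that shape rather than an experiment from the lecture; exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_test, d = 50, 500, 10   # train size, test size, input dimension (arbitrary)

def make_data(m):
    X = rng.normal(size=(m, d))
    y = X @ np.ones(d) / np.sqrt(d) + 0.3 * rng.normal(size=m)  # linear target + noise
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(n_test)

for p in [5, 20, 45, 50, 55, 200, 2000]:   # number of random features
    V = rng.normal(size=(d, p))             # fixed random first layer
    F_train = np.maximum(X_train @ V, 0)    # random ReLU features
    F_test = np.maximum(X_test @ V, 0)
    # pinv gives the least-squares fit for p < n and the minimum-norm
    # interpolating solution for p > n (the overparameterized regime).
    w = np.linalg.pinv(F_train) @ y_train
    test_mse = np.mean((F_test @ w - y_test) ** 2)
    print(f"p = {p:5d}  test MSE = {test_mse:.3f}")
```

The spike near p = n is where the model barely interpolates the noisy labels; with many more features, the minimum-norm solution spreads weight across features and becomes smoother, so the test error descends a second time.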