Introduction
Cornell University
Three principles that allow us to scale Machine Learning:
Optimization: Write the learning task as an optimization problem and solve it with a fast, off-the-shelf gradient-based algorithm built on linear algebra.
Recall the perceptron: it was a prediction model plus a learning algorithm specific to that model. Having a different learning algorithm for every model isn’t what we want; casting learning as optimization lets the same algorithm work across many models (see the sketch below).
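As a minimal sketch of this principle (not from the notes themselves), the snippet below casts a linear classifier as minimizing the logistic loss and solves it with plain gradient descent; the toy data, step size, and function names are illustrative assumptions.

```python
import numpy as np

# Sketch: frame a linear classifier as an optimization problem
# (minimize the average logistic loss) and solve it with gradient descent.

def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))   # per-example gradient weights
    return (X.T @ coeffs) / len(y)

def gradient_descent(X, y, step_size=0.1, num_steps=1000):
    """Generic gradient descent: the same loop works for any differentiable loss."""
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        w -= step_size * logistic_loss_grad(w, X, y)
    return w

# Toy usage on (nearly) linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
w = gradient_descent(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
```

The point is that the outer loop knows nothing about the perceptron or any particular model; swapping in a different loss gradient reuses the same solver.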
Statistics: To process a large dataset, we can process a small random subsample instead, and the result will closely approximate the answer on the full data.
- Stochastic Gradient Descent: at each step, estimate the gradient of the loss from a small random minibatch of examples rather than the full dataset
- cross-validation / train-validation-test split: use random subsamples of the dataset to stand in for the whole (see the sketch below)
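A hedged sketch of both subsampling ideas, assuming NumPy and toy linear-regression data (the function names and parameters are ours, not from the notes): a random train/validation/test split, and minibatch SGD that estimates each gradient from a small random subset.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2):
    """Randomly partition the dataset into train / validation / test subsets."""
    n = len(y)
    perm = rng.permutation(n)
    n_test, n_val = int(test_frac * n), int(val_frac * n)
    test, val, train = perm[:n_test], perm[n_test:n_test + n_val], perm[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

def sgd(X, y, grad_fn, step_size=0.1, batch_size=32, num_steps=2000):
    """Minibatch SGD: each step uses a random subsample to estimate the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w -= step_size * grad_fn(w, X[idx], y[idx])
    return w

# Toy usage: linear regression with squared loss.
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)
(train_X, train_y), (val_X, val_y), _ = train_val_test_split(X, y)

def sq_loss_grad(w, Xb, yb):
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = sgd(train_X, train_y, sq_loss_grad)
val_mse = np.mean((val_X @ w - val_y) ** 2)   # evaluate on a held-out subsample
```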
Hardware: use algorithms that fit your hardware, and use hardware that fits your algorithm.
- request memory in power-of-two sizes (see the sketch after this list)
- build specialized hardware such as TPUs to accelerate computation
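As a small illustrative example of the first point (the helper name and sizes are our own assumptions), one can round a requested buffer or batch size up to the next power of two before allocating it:

```python
# Sketch of the "fit the hardware" idea: round a requested size up to the
# next power of two so buffers align with sizes that allocators and
# accelerators handle efficiently. Helper name and numbers are illustrative.

def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be a positive integer)."""
    return 1 << (n - 1).bit_length()

padded_batch_size = next_power_of_two(100)  # 128
```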