
Introduction

Cornell University

Three principles that allow us to scale Machine Learning:

  1. Optimization: Write the learning task as an optimization problem and solve it with a fast, canned gradient-based algorithm built on linear algebra (see the gradient-descent sketch after this list).

    Recall the perceptron: it was a prediction model plus a learning algorithm specific to that model. Having a different learning algorithm for every model isn’t what we want. Framing learning as optimization lets one generic solver work across many models.

  2. Statistics: To process a large dataset, we can process a small random subsample instead (see the subsampling sketch after this list).

    • Stochastic Gradient Descent: estimate the gradient of the loss from a small random minibatch instead of the full dataset
    • cross-validation / train-validation-test split: use random subsamples of the dataset to stand in for the whole
  3. Hardware: use algorithms that fit your hardware, and use hardware that fits your algorithms (see the sketch after this list).

    • request memory in powers of 2 so allocations line up with the hardware
    • build TPUs to accelerate the underlying computation
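
Below is a minimal sketch of the optimization principle: pose learning as minimizing a loss and hand it to a generic gradient-based solver. The logistic-regression loss, synthetic data, step size, and iteration count are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss at w, for labels y in {-1, +1}."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))   # per-example d(loss)/d(margin)
    return (X.T @ coeffs) / X.shape[0]

def gradient_descent(grad_fn, w0, step_size=0.5, num_steps=200):
    """Generic gradient-based solver: it knows nothing about the model."""
    w = w0.copy()
    for _ in range(num_steps):
        w -= step_size * grad_fn(w)
    return w

# Synthetic, roughly linearly separable data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))

w_hat = gradient_descent(lambda w: logistic_loss_grad(w, X, y), np.zeros(5))
print("training accuracy:", np.mean(np.sign(X @ w_hat) == y))
```

The same `gradient_descent` loop would work unchanged for a different model: only the gradient function changes, which is the generalization the perceptron's bespoke update rule lacked.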
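
A minimal sketch of the statistics principle: minibatch SGD estimates the full gradient from a small random subsample at each step, and a random train-validation-test split lets subsamples stand in for the whole dataset. The least-squares loss, batch size, step size, and 60/20/20 split are assumptions for the example.

```python
import numpy as np

def squared_loss_grad(w, X, y):
    """Gradient of the average squared error 0.5 * mean((X w - y)^2)."""
    return X.T @ (X @ w - y) / X.shape[0]

def sgd(grad_fn, w0, X, y, step_size=0.05, num_steps=500, batch_size=16, seed=0):
    """Each step estimates the full gradient from a small random minibatch."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(num_steps):
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        w -= step_size * grad_fn(w, X[idx], y[idx])
    return w

def train_val_test_split(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Random index split: each piece is a subsample standing in for the whole."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train, n_val = int(fractions[0] * n), int(fractions[1] * n)
    return np.split(perm, [n_train, n_train + n_val])

# Synthetic regression data (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

train_idx, val_idx, test_idx = train_val_test_split(len(y))
w_hat = sgd(squared_loss_grad, np.zeros(5), X[train_idx], y[train_idx])
print("validation MSE:", np.mean((X[val_idx] @ w_hat - y[val_idx]) ** 2))
```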
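
A minimal sketch of the hardware principle, under two illustrative assumptions: padding a buffer up to the next power of two so its size lines up with typical allocator and cache granularities, and replacing a pure-Python loop with a BLAS-backed NumPy multiply so the same computation maps onto the hardware's optimized kernels.

```python
import time
import numpy as np

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()

# Pad a buffer length (e.g. 1000) up to a power of two (here, 1024).
dim = 1000
buffer = np.zeros(next_power_of_two(dim), dtype=np.float32)

# Same computation, two algorithms: a pure-Python triple loop versus a
# BLAS-backed matrix multiply that exploits the hardware's vector units.
n = 128
A = np.random.default_rng(0).normal(size=(n, n)).astype(np.float32)

start = time.perf_counter()
slow = [[sum(A[i, k] * A[k, j] for k in range(n)) for j in range(n)]
        for i in range(n)]
loop_time = time.perf_counter() - start

start = time.perf_counter()
fast = A @ A   # dispatches to an optimized BLAS kernel
blas_time = time.perf_counter() - start

print(f"python loops: {loop_time:.3f}s   numpy matmul: {blas_time:.5f}s")
```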