Beyond Supervised Learning
Data Augmentation
When we don’t have enough data to train a robust model, we can use data augmentation to generate additional training examples.
- It is most widely used in CV, where we apply translation, shift, rotation, scaling, and recoloring. These transformations can be viewed as group actions on the input space.
- In NLP, we can do synonym swapping (replacing words with their synonyms).
- We can also do word deletion in NLP and cropping in CV; these are like randomly projecting the data onto a lower-dimensional subspace.
- Adding noise to the inputs is another common approach.
You can think of most data augmentation as adding a soft regularization term to the ERM objective, so that the trained model becomes (approximately) invariant to some group action (shift, rotation, ...).
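As a minimal sketch (using torchvision transforms, with a random toy image standing in for real data), an augmentation pipeline for CV might look like this; a fresh random transformation is drawn every time a sample is accessed, so the model effectively sees a much larger virtual training set:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# Compose several random group-action-style transformations.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                    # reflection
    transforms.RandomRotation(degrees=15),                     # rotation
    transforms.RandomResizedCrop(size=32, scale=(0.8, 1.0)),   # shift / scale / crop
    transforms.ColorJitter(brightness=0.2, contrast=0.2),      # recoloring
    transforms.ToTensor(),
])

# Toy stand-in for a real image (e.g. one CIFAR-10 sample).
img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))

# Each call yields a different randomly transformed view of the same image.
views = [augment(img) for _ in range(4)]
print([v.shape for v in views])  # four tensors of shape (3, 32, 32)
```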
Semi-Supervised Learning
This is the setting where we have a lot of unlabeled data and only a small amount of labeled data. There are several ways to exploit the unlabeled data, and each relies on a different assumption:
- Smoothness assumption: similar points have similar labels (if two points are close, their labels are also close; this is also the assumption behind k-NN). Under this assumption we can simply assign labels to unlabeled points using k-NN trained on the labeled data.
- Clustering assumption: the data splits into clusters, and if two points are in the same cluster, they share the same label. We run k-means (or another clustering algorithm) on all the data and assign labels cluster by cluster.
- Manifold assumption: there exists a low-dimensional manifold / surface such that all data points approximately lie on it. We run a dimension-reduction algorithm on the whole dataset (e.g. PCA or the encoder part of an autoencoder) and then learn a model only on the labeled data. That is, we first obtain a lower-dimensional representation of the whole dataset and hope the labeled data alone is now enough to learn a good model in the reduced dimension (see the sketch after this list).
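Here is a minimal sketch of the smoothness and manifold approaches on toy data (using scikit-learn; the dataset, the choice of 10 labeled points, and k = 3 are all illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Toy data: 200 points, but we pretend only 10 of them are labeled.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[np.random.RandomState(0).choice(len(X), size=10, replace=False)] = True

# Smoothness assumption: fit k-NN on the labeled points and
# propagate (pseudo-)labels to the unlabeled ones.
knn = KNeighborsClassifier(n_neighbors=3).fit(X[labeled], y[labeled])
pseudo_labels = knn.predict(X[~labeled])

# Manifold assumption: fit PCA on ALL points (no labels needed),
# then train a classifier on only the labeled points in the reduced space.
pca = PCA(n_components=2).fit(X)   # would be << d for high-dimensional data
clf = LogisticRegression().fit(pca.transform(X[labeled]), y[labeled])
print(clf.score(pca.transform(X[~labeled]), y[~labeled]))  # accuracy on the rest
```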
Weakly Supervised Learning
Sometimes labelling is too expensive or requires too much expertise, so we can only obtain labels that are noisy or imprecise, for example through:
- Crowdsourcing from non-experts
- Data programming: use functions / models / external databases to heuristically label examples (see the sketch below).
Reference: http://
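A minimal sketch of data programming with hand-written (hypothetical) labeling functions for sentiment; real systems such as Snorkel learn to weight and denoise these votes rather than taking a plain majority:

```python
import numpy as np

ABSTAIN, NEG, POS = -1, 0, 1

# Each "labeling function" encodes a cheap, noisy heuristic.
def lf_contains_great(text):
    return POS if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):
    return NEG if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text):
    return POS if text.endswith("!") else ABSTAIN

labeling_functions = [lf_contains_great, lf_contains_terrible, lf_exclamation]

def weak_label(text):
    """Combine the noisy votes by majority; abstain if no function fires."""
    votes = [lf(text) for lf in labeling_functions if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return int(np.round(np.mean(votes)))

docs = ["This movie was great!", "Terrible plot, terrible acting.", "It was fine."]
print([weak_label(d) for d in docs])  # [1, 0, -1]: noisy labels, last one abstains
```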
Self-Supervised Learning
The idea is to extract a supervision signal (a label) from the unlabeled data itself.
For example, there is a “fill-in-the-blank” setting for computer vision: take an image, remove patches from it, and train a DNN to recover the original image from the version with the patches removed.
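A minimal sketch of this inpainting-style pretext task (assuming a tiny convolutional network, random tensors standing in for real unlabeled images, and an 8×8 patch size chosen for illustration):

```python
import torch
import torch.nn as nn

def mask_patches(imgs, patch=8, p=0.3):
    """Zero out random (patch x patch) blocks; the original image serves as the label."""
    masked = imgs.clone()
    B, C, H, W = imgs.shape
    for b in range(B):
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                if torch.rand(1).item() < p:
                    masked[b, :, i:i+patch, j:j+patch] = 0.0
    return masked

# Small convolutional image-to-image network; any such architecture would do.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

imgs = torch.rand(16, 3, 32, 32)   # stand-in for a batch of unlabeled images
for step in range(5):              # a few steps just to show the training loop
    corrupted = mask_patches(imgs)
    recon = model(corrupted)
    loss = nn.functional.mse_loss(recon, imgs)   # the "label" is the original image
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, loss.item())
```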