Midterm Review
It is more likely that a linear separating hyperplane exists if the data is high dimensional. So linear classifiers usually perform well on high-dimensional data.
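A quick sketch that illustrates the claim (not from the notes; the dataset sizes and the LinearSVC choice are illustrative assumptions): with n points in d dimensions and d much larger than n, even randomly labeled data is almost always linearly separable, so a linear classifier fits it perfectly.

```python
# With d >> n, random labels are (almost always) linearly separable.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 50  # number of training points

for d in (2, 200):  # low- vs. high-dimensional features
    X = rng.standard_normal((n, d))
    y = rng.integers(0, 2, size=n)                      # random labels
    clf = LinearSVC(C=1e6, max_iter=20000).fit(X, y)    # near-hard-margin linear SVM
    print(f"d={d:4d}  training accuracy: {clf.score(X, y):.2f}")
# Typically well below 1.00 for d=2, but exactly 1.00 for d=200.
```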
If the Naive Bayes assumption holds, the Naive Bayes classifier becomes identical to the Bayes optimal classifier. True If the features really are conditionally independent given the label, the Naive Bayes factorization of P(x|y) is exact, so its prediction matches the Bayes optimal prediction.
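As a short derivation (standard material, not verbatim from the notes): under conditional independence of the features given the label,

$$
h_{\text{NB}}(\mathbf{x}) = \operatorname*{argmax}_{y}\; P(y)\prod_{\alpha=1}^{d} P(x_\alpha \mid y) = \operatorname*{argmax}_{y}\; P(y)\,P(\mathbf{x} \mid y) = \operatorname*{argmax}_{y}\; P(y \mid \mathbf{x}) = h_{\text{BO}}(\mathbf{x}).
$$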
The KNN algorithm can be used for classification, but not regression. False KNN can be used for regression by averaging the labels of the k nearest neighbors.
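A minimal k-NN regression sketch (the toy data and choice of k are illustrative assumptions): predict by averaging the labels of the k nearest training points.

```python
# k-NN regression: predict the mean label of the k nearest training
# points under Euclidean distance.
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distances to all points
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    return y_train[nearest].mean()                     # average their labels

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.1, 1.1, 1.9, 3.2, 4.0])
print(knn_regress(X, y, np.array([2.5]), k=3))  # ~2.07, mean of labels at x=1,2,3
```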
The Bayes optimal error is the best classification error you could get if there were no noise. False It is the best classification error you could achieve if you knew the true data distribution. In fact, this error is exactly due to label uncertainty, i.e. noise.
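Concretely (standard definitions, added here rather than taken from the notes): the Bayes optimal classifier predicts the most likely label under the true distribution, and its error is the remaining label uncertainty:

$$
h_{\text{BO}}(\mathbf{x}) = \operatorname*{argmax}_{y}\, P(y \mid \mathbf{x}), \qquad \epsilon_{\text{BO}} = \mathbb{E}_{\mathbf{x}}\!\left[\,1 - \max_{y} P(y \mid \mathbf{x})\,\right].
$$

If P(y|x) is deterministic (no label noise), this error is zero.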
As your training data set size, n, approaches infinity, the k-nearest neighbor classifier is guaranteed to have an error no worse than twice the Bayes optimal error. True This holds for both 1-NN and k-NN (k > 1).
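The underlying result is the Cover-Hart bound (stated here for binary labels as an addition, not verbatim from the notes): as n approaches infinity,

$$
\epsilon_{\text{1NN}} \;\le\; 2\,\epsilon_{\text{BO}}\left(1 - \epsilon_{\text{BO}}\right) \;\le\; 2\,\epsilon_{\text{BO}}.
$$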
As the validation set becomes extremely large, the validation error approaches the test error. True Both the validation error and the test error estimate the same generalization error; as the validation set grows, its estimate concentrates around that quantity.
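One way to make this precise (a standard Hoeffding bound, added here rather than from the notes): for a validation set of n i.i.d. samples with 0-1 loss,

$$
P\!\left(\,\bigl|\hat{\epsilon}_{\text{val}} - \epsilon\bigr| > \delta\,\right) \;\le\; 2e^{-2n\delta^{2}},
$$

so the validation error converges to the true generalization error $\epsilon$ as n grows.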
Not a midterm point, but: if we remove all the activation functions in a neural network (multi-layer perceptron), the NN/MLP is simply equivalent to a linear regression model, because a composition of affine maps is itself affine.
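A quick numerical check (toy weights and sizes are assumptions, a sketch rather than anything from the notes): two stacked linear layers with no activation equal one linear layer with collapsed weights.

```python
# Without activations, W2 (W1 x + b1) + b2 == (W2 W1) x + (W2 b1 + b2).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W1, b1 = rng.standard_normal((4, 5)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((3, 4)), rng.standard_normal(3)

two_layer = W2 @ (W1 @ x + b1) + b2       # "MLP" without activations
W, b = W2 @ W1, W2 @ b1 + b2              # collapsed single linear layer
print(np.allclose(two_layer, W @ x + b))  # True
```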