Hiroki Naganuma

Part I: General-knowledge ML Questions

This question seems identical to A17_Q1.

1.

Explain all relationships between the following notions: the bias-variance tradeoff, model capacity, number of training samples, overfitting, underfitting.

The bias-variance tradeoff describes the tradeoff between the bias error and the variance error, both of which depend on model capacity.
The variance error is the error between a model's prediction and the prediction averaged over models trained on different training sets.
The bias error is the error between that averaged prediction and the ground truth.

If we select a small-capacity model, it does not have enough parameters (expressive power) to fit the structure of the training samples; this is called underfitting.
In contrast, if we select a large-capacity model, it can overfit the training samples, because such a model can memorize the training samples themselves.
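The capacity story above can be sketched numerically. This is my own illustration (made-up sine data, polynomial degree as the capacity knob, not part of the original answer): a low-degree fit underfits, a moderate degree fits well, and a very high degree drives the training error down while the test error grows.

```python
# Sketch: under/overfitting as model capacity (polynomial degree) varies
# on noisy 1-D data drawn around a sine curve.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 200)
f = lambda x: np.sin(2 * np.pi * x)                 # true function
y_train = f(x_train) + rng.normal(0, 0.2, x_train.shape)
y_test = f(x_test)                                  # noiseless test target

def mse(degree):
    # Fit a polynomial of the given degree (capacity) by least squares.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 15):
    tr, te = mse(d)
    print(f"degree={d:2d}  train MSE={tr:.4f}  test MSE={te:.4f}")
```

Training error can only go down as the degree (capacity) grows, since the lower-degree models are nested inside the higher-degree ones; the test error is what reveals the overfitting.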

Screen Shot 2021-12-02 at 16 39 18

2.

Suppose you have a training dataset of n observations.

(a).

Give one (good) strategy to select the hyperparameters of your model.

If the model is a regular (identifiable) model, an information criterion such as AIC is an effective way to select hyperparameters:
AIC = Train_Loss + 2 * d
where Train_Loss is the negative log-likelihood on the training data and d is the number of model parameters.
The hyperparameter setting with the lower AIC is adopted.
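As a concrete sketch of this criterion (my own made-up example, not from the original answer): select a polynomial degree by AIC under a Gaussian-noise assumption, where the maximized log-likelihood has a closed form in terms of the residual variance.

```python
# Sketch: AIC = -2 * (max log-likelihood) + 2 * d for polynomial models
# with Gaussian noise; the lower-AIC degree is adopted.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)

def aic(degree):
    n = len(x)
    d = degree + 1                                 # number of coefficients
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = np.mean(resid ** 2)                   # MLE of the noise variance
    # Gaussian max log-likelihood; counting sigma2 as a parameter would add
    # a constant to every AIC and not change the argmin.
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * log_lik + 2 * d

degrees = range(1, 10)
best = min(degrees, key=aic)
print("degree chosen by AIC:", best)
```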

As another method, if n is sufficiently large, the hold-out method divides the data into three parts: training data (used to update the model parameters), validation data (used for hyperparameter selection), and test data.

Since the test data is never used for hyperparameter selection, the training data is used to update the model, and the validation data is used to judge whether a given choice of hyperparameters is good or bad.

(b).

Screen Shot 2021-12-02 at 15 44 09

If the σ of the (RBF-kernel) SVM is small, the boundary surface is shaped by individual support vectors and overfits the training samples.
Hyperparameters such as σ must be searched, e.g., by grid search, with performance evaluated using the loss on validation data rather than on the training data used to update the model.
A rookie mistake is to evaluate candidate σ values on the training data: this selects a very small σ, which leads to overfitting.
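This pitfall can be shown numerically. Below is my own numpy-only stand-in (RBF kernel ridge regression instead of an actual SVM, with made-up data): selecting the bandwidth σ by training loss always favors a tiny σ that memorizes the training points, while selecting by validation loss does not.

```python
# Sketch: grid search over the RBF bandwidth sigma. Training loss picks a
# tiny sigma (near-interpolation of the training set); validation loss
# picks a sensible one.
import numpy as np

rng = np.random.default_rng(3)
x_tr = np.sort(rng.uniform(0, 1, 40))
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.3, x_tr.shape)
x_val = np.sort(rng.uniform(0, 1, 40))
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.3, x_val.shape)

def rbf(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def fit_predict(sigma, x_query, lam=1e-6):
    # Kernel ridge regression with an almost-zero ridge term, so small
    # sigma lets the model interpolate the training data.
    K = rbf(x_tr, x_tr, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_tr)), y_tr)
    return rbf(x_query, x_tr, sigma) @ alpha

sigmas = [0.001, 0.01, 0.1, 0.5]
train_mse = {s: np.mean((fit_predict(s, x_tr) - y_tr) ** 2) for s in sigmas}
val_mse = {s: np.mean((fit_predict(s, x_val) - y_val) ** 2) for s in sigmas}
print("sigma picked by training loss:  ", min(sigmas, key=train_mse.get))
print("sigma picked by validation loss:", min(sigmas, key=val_mse.get))
```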

3.

Write the equation of the soft-argmax operation. Express the corresponding (log-likelihood based) loss incurred when the true class label index is y. Note: make sure your loss is non-negative!

In most textbooks the softmax function is defined in just a few lines, but when I wrote out a proper explanation over the weekend, it came to about three pages. pic.twitter.com/lDJECXUMjV

— Daichi Mochihashi (@daiti_m) October 31, 2021
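A sketch of the requested equations (my own write-up, not taken from the tweet): for logits $\mathbf{z} \in \mathbb{R}^K$,

```latex
p_i = \operatorname{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad
\mathcal{L}(\mathbf{z}, y) = -\log p_y = -z_y + \log \sum_{j=1}^{K} e^{z_j} \;\geq\; 0 .
```

The loss is non-negative because $p_y \in (0, 1]$, so $-\log p_y \geq 0$.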

4.

The straightforward equation for the soft-argmax can lead to numerical instability. Write an equivalent implementation of its computation that avoids numerical problems.
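One way to sketch the stable computation: subtracting the maximum logit multiplies the numerator and denominator of the softmax by the same constant, so the result is unchanged, but every exponent becomes $\leq 0$ and `exp` can no longer overflow.

```python
# Sketch: numerically stable softmax and log-softmax via the
# max-subtraction (log-sum-exp) trick.
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)          # largest entry becomes 0, so exp() <= 1
    e = np.exp(shifted)
    return e / e.sum()

def log_softmax(z):
    # Stable log-probabilities: log softmax(z)_i = z_i - logsumexp(z).
    # Useful for the loss in Q3, since -log p_y = -log_softmax(z)[y].
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)
    return shifted - np.log(np.exp(shifted).sum())

print(softmax([1000.0, 1000.0]))     # naive exp(1000) would overflow here
```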

5.

Screenshot 2021-12-05 at 3 37 45 PM