Hiroki Naganuma

1.

Suppose you have a training dataset of n observations.

(a).

Give one (good) strategy to select the hyperparameters of your model.

If the statistical model is regular, an information criterion such as AIC or TIC can be used to estimate generalization performance, and we can choose the hyperparameter setting with the best estimate.

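As another method, first divide the n observations into 3 datasets as follows:
1. training dataset, for fitting the model parameters themselves
2. validation dataset, for model selection (evaluating how good each hyperparameter setting is)
3. test dataset, for evaluating the prediction performance of the final model
Then train one model per hyperparameter setting on the training dataset, keep the setting with the best validation performance, and use the test dataset only once, at the end.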
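The three-way split can be sketched as follows. This is a minimal illustration, not a prescribed method: the model (1-D ridge regression without intercept), the 60/20/20 split, the synthetic data, and all function names are assumptions made for the example.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def split_three_way(data, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle, then cut into training / validation / test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac_train * len(shuffled))
    n_val = int(frac_val * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def select_hyperparameter(train, val, candidates):
    """Fit on train for each candidate lambda, score on val, return the best one."""
    xs_tr, ys_tr = zip(*train)
    xs_va, ys_va = zip(*val)
    scores = {lam: mse(fit_ridge_1d(xs_tr, ys_tr, lam), xs_va, ys_va)
              for lam in candidates}
    return min(scores, key=scores.get), scores

# Synthetic data, made up for illustration: y = 2x + Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + rng.gauss(0.0, 0.5))
        for x in (rng.uniform(-1.0, 1.0) for _ in range(100))]

train, val, test = split_three_way(data)
best_lam, val_scores = select_hyperparameter(train, val, [0.0, 0.1, 1.0, 10.0])

# The test set is touched only once, after the hyperparameter is fixed.
w = fit_ridge_1d(*zip(*train), best_lam)
test_mse = mse(w, *zip(*test))
```

When n is small, the same selection loop is often run with k-fold cross-validation in place of a single validation split.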

(b).

[Image: Screenshot 2021-12-11 at 4 17 12 PM]

When the σ of an SVM with a Gaussian kernel becomes small, each kernel is narrow, so each support vector only influences its immediate neighborhood. The separating boundary is then determined by individual support vectors and wraps around single training points, which is overfitting.
A rookie mistake is to fit an SVM with a radial basis function kernel (RBF kernel) on all the data without regularization: the σ that fits the training data best will be very small, and the SVM will overfit.
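The effect of σ can be seen directly in the kernel values. The sketch below assumes the common parametrization k(x, x') = exp(-||x - x'||² / (2σ²)); the function name is hypothetical.

```python
import math

def gaussian_kernel(x, x_prime, sigma):
    """Gaussian (RBF) kernel, assuming the form exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Two points at Euclidean distance 1.
a, b = (0.0, 0.0), (1.0, 0.0)

k_small_sigma = gaussian_kernel(a, b, sigma=0.1)   # nearly 0: the points look unrelated
k_large_sigma = gaussian_kernel(a, b, sigma=10.0)  # nearly 1: the points look alike
```

With a very small σ every training point is similar only to itself, so the classifier can memorize the training set.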

2.

Write the equation of the soft-argmax operation. Express the corresponding (log-likelihood based) loss incurred when the true class label index is y. Note: make sure your loss is non-negative!
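One standard way to write this, assuming the input is a score vector z with K entries (the symbols z, K and L are notation chosen here, not taken from the question):

```latex
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K

L(z, y) = -\log \operatorname{softmax}(z)_y = -z_y + \log \sum_{j=1}^{K} e^{z_j} \;\ge\; 0
```

The loss is non-negative because softmax(z)_y is a probability, so it is at most 1 and its negative logarithm is at least 0.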

3.

The straightforward equation for the soft-argmax can lead to numerical instability. Write an equivalent implementation of its computation that avoids numerical problems.
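A common fix is to subtract the largest score before exponentiating. This leaves the result unchanged, because the factor exp(-max) cancels between numerator and denominator, and every exponent becomes at most 0. The sketch below uses only the standard library; the function names are illustrative.

```python
import math

def softmax_stable(z):
    """Soft-argmax with the maximum subtracted, so exp() never overflows."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # at least 1.0, since the largest entry contributes exp(0)
    return [e / total for e in exps]

def log_softmax_stable(z):
    """log softmax via the log-sum-exp trick: z_i - (m + log sum_j exp(z_j - m))."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def nll_loss(z, y):
    """Negative log-likelihood of class index y; always non-negative."""
    return -log_softmax_stable(z)[y]

# The naive formula would call math.exp(1000.0), which raises OverflowError.
probs = softmax_stable([1000.0, 1001.0, 1002.0])
loss = nll_loss([1000.0, 1001.0, 1002.0], 0)
```

Computing the loss from the log-softmax directly also avoids taking the logarithm of a probability that has underflowed to 0.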

4.

[Image: Screen Shot 2021-12-13 at 12 31 48]

5.

(a). directed graph

Consider a generic directed graphical model on binary random variables. How can you generate a sample from the joint distribution? Is this an exact or approximate sampling scheme?

[Image: Screenshot 2021-12-25 at 9 01 51 AM]
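A standard answer for the directed case is ancestral sampling: visit the nodes in topological order and draw each one from its conditional distribution given its already-sampled parents. This is an exact scheme, since the joint factorizes as the product of those conditionals. The sketch below is illustrative only; the three-node network and its probabilities are made up.

```python
import random

# Hypothetical network, listed in topological order.
# Each entry: (name, parents, table mapping parent values -> P(node = 1)).
NODES = [
    ("rain", (), {(): 0.2}),
    ("sprinkler", ("rain",), {(0,): 0.4, (1,): 0.01}),
    ("wet", ("rain", "sprinkler"),
     {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}),
]

def ancestral_sample(nodes, rng):
    """Draw one exact sample from the joint by sampling parents before children."""
    sample = {}
    for name, parents, cpt in nodes:
        p_one = cpt[tuple(sample[p] for p in parents)]
        sample[name] = 1 if rng.random() < p_one else 0
    return sample

rng = random.Random(0)
samples = [ancestral_sample(NODES, rng) for _ in range(20000)]
rain_freq = sum(s["rain"] for s in samples) / len(samples)  # should be close to 0.2
```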

Background knowledge

References for solving the problem

(b). undirected graph

Consider a generic undirected graphical model on binary random variables. How can you generate a sample from the joint distribution? Is this an exact or approximate sampling scheme?

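It is an approximate sampling scheme, because we have to use something like Gibbs sampling.
X1 and X2 are sampled in sequence, each conditioned on the current value of the other. Since P(X1, X2) != P(X1)P(X2) in general, the variables cannot be sampled independently, and the chain only converges to the joint distribution, so after a finite number of steps the sample is approximate.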

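To go further: in a directed graph, one of P(X|Y) or P(Y|X) is known, but in an undirected graph we do not know in which direction the influence goes, so we have less information.
We therefore use a process that alternately generates X from P(X|Y) and Y from P(Y|X), that is, something like Gibbs sampling.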
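The alternating process can be sketched as a Gibbs sampler. The example assumes a pairwise binary model with p(x) proportional to exp(sum_i b_i x_i + sum_(i,j) w_ij x_i x_j), for which P(x_i = 1 | rest) is a sigmoid of the local field. The model form, parameter values and function names are all assumptions made for illustration.

```python
import math
import random

def build_neighbors(n, weights):
    """Turn {(i, j): w} edge weights into per-node neighbor lists."""
    nbrs = {i: [] for i in range(n)}
    for (i, j), w in weights.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    return nbrs

def gibbs_sample(biases, weights, n_sweeps, rng):
    """Return the state after n_sweeps full sweeps; approximate for finite n_sweeps."""
    n = len(biases)
    nbrs = build_neighbors(n, weights)
    x = [rng.randint(0, 1) for _ in range(n)]  # arbitrary starting state
    for _ in range(n_sweeps):
        for i in range(n):
            # Resample x_i from its conditional given all other variables.
            field = biases[i] + sum(w * x[j] for j, w in nbrs[i])
            p_one = 1.0 / (1.0 + math.exp(-field))
            x[i] = 1 if rng.random() < p_one else 0
    return x

# Hypothetical two-variable model with one positive coupling.
rng = random.Random(0)
biases = [0.0, 0.0]
weights = {(0, 1): 1.0}
draws = [gibbs_sample(biases, weights, n_sweeps=20, rng=rng) for _ in range(4000)]
freq_x0 = sum(d[0] for d in draws) / len(draws)
```

Each returned state is only approximately distributed according to the joint, and the approximation improves as `n_sweeps` grows.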

(c).

(BONUS) In the case an approximate sampling scheme was mentioned above, give a precise quantity (hint: think linear algebra) that quantifies the quality of the sampling approximation.

Coming soon....