Hiroki Naganuma

1.

Suppose you have a training dataset of n observations.

(a).

Give one (good) strategy to select the hyperparameters of your model.

If the given statistical model is a regular model, we can use AIC or TIC to estimate the generalization performance and choose the better hyperparameter setting.

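As another method, first divide the n training observations into three datasets:
1. a training dataset, for updating the model parameters themselves
2. a validation dataset, for evaluating model selection (how good a hyperparameter setting is)
3. a test dataset, for evaluating the model's prediction performance

Train one model per hyperparameter setting on the training dataset, keep the setting with the best validation performance, and report its performance on the test dataset.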
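
The three-way split can be sketched as follows. This is a minimal illustration, not a prescribed method: the ridge regression model, the 60/20/20 proportions, and the helper names (`split_indices`, `fit_ridge`, `select_lambda`) are all assumptions chosen for the example.

```python
import numpy as np

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))

def select_lambda(X, y, lambdas, seed=0):
    """Pick the lambda with the lowest validation error, then report test error."""
    tr, va, te = split_indices(len(y), seed=seed)
    # Hyperparameters are compared on the validation set only.
    val_err = {lam: mse(X[va], y[va], fit_ridge(X[tr], y[tr], lam)) for lam in lambdas}
    best = min(val_err, key=val_err.get)
    # The test set is touched once, after the choice has been made.
    w = fit_ridge(X[tr], y[tr], best)
    return best, val_err, mse(X[te], y[te], w)
```

The test error is computed only once, for the selected setting, so it remains an honest estimate of prediction performance.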

(b).

When the σ of an SVM with a Gaussian kernel becomes small, each support vector influences only its immediate neighborhood, so individual support vectors determine the separating boundary and the model overfits the data.
A rookie mistake is to choose σ using all the data for training: for an SVM with a radial basis function kernel (RBF kernel) and no regularization, the training error favors a very small σ, and the resulting SVM is overfitted.
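
The effect of a small σ can be seen directly in the kernel matrix. The sketch below assumes the common parameterization K(x, x') = exp(-||x - x'||^2 / (2σ^2)); the three data points and the function name are made up for illustration.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_wide = gaussian_kernel_matrix(X, sigma=10.0)    # all entries close to 1
K_narrow = gaussian_kernel_matrix(X, sigma=0.05)  # close to the identity matrix
```

When the kernel matrix approaches the identity, every training point looks dissimilar to every other point, so the classifier can fit any labeling of the training set, which is the overfitting regime described above.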

2.

Write the equation of the soft-argmax operation. Express the corresponding (log-likelihood based) loss incurred when the true class label index is y. Note: make sure your loss is non-negative!

Answer
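
For reference, the standard form of both quantities is written out below. The notation is an assumption, since the notes do not fix one: z is the vector of K logits (pre-activation scores), and y is the index of the true class.

```latex
\operatorname{softargmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K

\mathcal{L}(z, y) = -\log \operatorname{softargmax}(z)_y
                  = -z_y + \log \sum_{j=1}^{K} e^{z_j} \;\ge\; 0
```

The loss is non-negative because each soft-argmax output lies in (0, 1], so its logarithm is at most zero.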

3.

The straightforward equation for the soft-argmax can lead to numerical instability. Write an equivalent implementation of its computation that avoids numerical problems.
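
A common remedy is to subtract the maximum logit before exponentiating, which leaves the result unchanged because the factor e^{-max} cancels between numerator and denominator. The sketch below is one way to write it with NumPy; the function names are illustrative.

```python
import numpy as np

def softmax(z):
    """Soft-argmax along the last axis, shifted so the largest exponent is 0."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)  # every entry is <= 0
    e = np.exp(shifted)                              # so exp cannot overflow
    return e / np.sum(e, axis=-1, keepdims=True)

def nll_loss(z, y):
    """-log softmax(z)[y] for a single logit vector z, via the log-sum-exp trick."""
    z = np.asarray(z, dtype=float)
    m = np.max(z)
    return float(m + np.log(np.sum(np.exp(z - m))) - z[y])
```

With logits such as `[1000.0, 1001.0, 1002.0]`, the naive formula overflows in `exp`, while the shifted version returns the same probabilities as for `[0.0, 1.0, 2.0]`.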

4.

Answer

5.

(a). directed graph

Consider a generic directed graphical model on binary random variables. How can you generate a sample from the joint distribution? Is this an exact or approximate sampling scheme?

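Background knowledge

References for solving the problem

Approximate Bayesian inference by sampling, part 2 (MCMC: Metropolis method)
- The posterior distribution p(Θ|D) can be derived from the probabilistic model (the joint distribution of the parameters) by following the basic rules of probability (Bayes' theorem). However, when the model is complex, or when p(D|Θ) and p(Θ) are not conjugate, p(Θ|D) cannot be obtained analytically.
- Approximate inference then becomes necessary.
Approximate Bayesian inference by sampling, part 1 (Monte Carlo integration, rejection sampling)
An introduction to Markov chain Monte Carlo (MCMC) that anyone can understand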
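Answer: Exact sampling scheme. Visit the variables in topological order (parents before children) and sample each one from its conditional distribution given the already-sampled values of its parents (ancestral sampling).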

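There are also methods that make sampling more efficient, such as rejection sampling (stop generating a sample partway through as soon as it no longer satisfies the condition of the conditional probability of interest) and likelihood weighting (see the video for details).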
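
Sampling from a directed model by following the edge directions can be sketched as below. The three-variable network (rain, sprinkler, wet) and its probability tables are hypothetical values invented for the example; only the procedure of sampling each variable after its parents matters.

```python
import numpy as np

# Hypothetical network: rain -> wet <- sprinkler.
# Variables are listed in topological order (parents before children).
ORDER = ["rain", "sprinkler", "wet"]
PARENTS = {"rain": [], "sprinkler": [], "wet": ["rain", "sprinkler"]}
# P(variable = 1 | parent values), keyed by the tuple of parent values.
CPT = {
    "rain": {(): 0.3},
    "sprinkler": {(): 0.5},
    "wet": {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}

def ancestral_sample(rng):
    """Draw one joint sample by sampling each variable given its parents."""
    sample = {}
    for var in ORDER:
        parent_values = tuple(sample[p] for p in PARENTS[var])
        p_one = CPT[var][parent_values]
        sample[var] = int(rng.random() < p_one)
    return sample

rng = np.random.default_rng(0)
samples = [ancestral_sample(rng) for _ in range(20000)]
rain_freq = np.mean([s["rain"] for s in samples])
wet_freq = np.mean([s["wet"] for s in samples])
```

Each call returns an independent draw from the exact joint distribution, since the joint factorizes as the product of the conditionals that are sampled in turn.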

(b). undirected graph

Consider a generic undirected graphical model on binary random variables. How can you generate a sample from the joint distribution? Is this an exact or approximate sampling scheme?

Approximate sampling scheme, because it relies on something like Gibbs sampling.
X1 and X2 are sampled one after the other; since P(X1, X2) != P(X1)P(X2), this should be considered an approximate sampling scheme.

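More specifically, in a directed graph one of P(X|Y) or P(Y|X) is known, but in an undirected graph we do not know which variable influences which, so less information is available.
We therefore use a process that generates X1 from P(X|Y1) and then Y1 from P(Y|X), that is, something like Gibbs sampling.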
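
The alternating conditional updates can be sketched for a small pairwise model on binary variables. The model form p(x) proportional to exp(b·x + sum over i<j of W[i,j] x_i x_j), the weights, and the function names are assumptions made for this example; the brute-force enumeration is included only to compare against the Gibbs estimate.

```python
import itertools
import numpy as np

def gibbs_sample(W, b, n_sweeps, burn_in, rng):
    """Gibbs sampling for a pairwise binary model with x in {0,1}^n.

    W is assumed symmetric with a zero diagonal.
    """
    n = len(b)
    x = rng.integers(0, 2, size=n)
    kept = []
    for sweep in range(n_sweeps):
        for i in range(n):
            # P(x_i = 1 | all other variables) is a logistic function of the neighbors.
            activation = b[i] + W[i] @ x - W[i, i] * x[i]
            p_one = 1.0 / (1.0 + np.exp(-activation))
            x[i] = int(rng.random() < p_one)
        if sweep >= burn_in:
            kept.append(x.copy())
    return np.array(kept)

def exact_marginals(W, b):
    """Brute-force P(x_i = 1) by enumerating all 2^n states (small n only)."""
    states = np.array(list(itertools.product([0, 1], repeat=len(b))))
    log_p = states @ b + 0.5 * np.einsum("si,ij,sj->s", states, W, states)
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return p @ states

W = np.array([[0.0, 1.0, -0.5], [1.0, 0.0, 0.5], [-0.5, 0.5, 0.0]])
b = np.array([0.2, -0.3, 0.1])
rng = np.random.default_rng(0)
samples = gibbs_sample(W, b, n_sweeps=20000, burn_in=1000, rng=rng)
estimate = samples.mean(axis=0)  # approximates exact_marginals(W, b)
```

Unlike ancestral sampling, successive states here are correlated and only approach the target distribution after the burn-in period, which is why the scheme is approximate.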

(c).

(BONUS) In the case an approximate sampling scheme was mentioned above, give a precise quantity (hint: think linear algebra) that quantifies the quality of the sampling approximation.

Coming soon....