Topic: Principal Component Analysis
Given samples x_1, x_2, ..., x_n from some p-dimensional distribution, i.e. x_i ∈ R^p, how do you compute the empirical (data) covariance matrix, Σ?
Definition of the centered empirical covariance, as follows:
Σ = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T}, where \bar{x} = (1/n) \sum_{i=1}^{n} x_i,
s.t. the matrix size is p x p.
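A minimal sketch of this computation in Python (assuming NumPy; variable names are illustrative), checked against numpy.cov with the 1/n normalization:

import numpy as np

def empirical_covariance(X):
    # X holds the samples as rows: X has shape (n, p).
    n = X.shape[0]
    x_bar = X.mean(axis=0)        # sample mean, shape (p,)
    Xc = X - x_bar                # center each sample
    return (Xc.T @ Xc) / n        # p x p covariance matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # 500 samples in p = 3 dimensions
Sigma = empirical_covariance(X)
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))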
Show that the empirical covariance matrix is symmetric and positive semi-definite (PSD).
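A minimal sketch of the argument, using the centered empirical covariance Σ defined above:

Symmetry: each term satisfies ((x_i - \bar{x})(x_i - \bar{x})^{T})^{T} = (x_i - \bar{x})(x_i - \bar{x})^{T}, so Σ^{T} = Σ.
PSD: for any v ∈ R^p, v^{T} Σ v = (1/n) \sum_{i=1}^{n} (v^{T}(x_i - \bar{x}))^{2} ≥ 0.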
Side note on terminology: Cov(X) = E[(X - E(X))(X - E(X))^{T}] is the covariance matrix of X, and Cov(X, Y) = E[(X - E(X))(Y - E(Y))^{T}] is the (cross-)covariance matrix of X and Y. Since the entries of Cov(X) are pairwise covariances such as Cov(X1, X2), Cov(X) is also called the variance-covariance matrix (the naming is somewhat ambiguous).

Figure 1 shows samples drawn from Gaussians with different covariance matrices. Match the four sets of samples (1)-(4) to the population covariance matrices they correspond to (A to D).
(1): C
(2): A
(3): D
(4): B
What objective does PCA optimize?
PCA is a dimensionality reduction method that aims to represent p-dimensional data with as few dimensions as possible; concretely, it finds a projection direction, e.g. a vector [a_1, a_2]^{T} in the two-dimensional case.
Specifically, it computes the principal components: the axes along which the projected data has maximal variance.
The principal components are eigenvectors of the variance-covariance matrix; the first principal component is the eigenvector associated with the largest eigenvalue.
The optimization problem solved by PCA is a constrained maximization: the first principal component is the unit vector that maximizes the variance of the projected data, and each subsequent principal component maximizes the projected variance under the constraint that it is orthogonal to the previously determined principal components.
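In symbols, a standard way to write this (stated here as a sketch):

w_1 = \arg\max_{\|w\|_2 = 1} w^{T} Σ w, and for k > 1, w_k = \arg\max_{\|w\|_2 = 1,\ w ⊥ w_1, ..., w_{k-1}} w^{T} Σ w,

where w^{T} Σ w is the variance of the data projected onto w.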
What is the true principal component of each distribution above? Use the population covariance matrices to come up with this answer. Your answer in each case will be one or more vectors representing the principal component; the vectors need not be normalized.
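A minimal sketch of how one could read the principal component off a given population covariance matrix (assuming NumPy; the 2 x 2 matrix below is a hypothetical example, not one of A-D from Figure 1):

import numpy as np

# Hypothetical 2x2 covariance matrix, used only to illustrate the computation.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# For symmetric matrices, eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# The principal component is the eigenvector paired with the largest eigenvalue;
# it need not be normalized for the purposes of this question.
principal_component = eigvecs[:, -1]
print(eigvals, principal_component)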
Assume the p-dimensional vector w_0 follows a Gaussian distribution N(0, (1/p) I_p), i.e. each component has mean 0 and variance 1/p. Its squared 2-norm is the sum of the squared components:
y = \|w_0\|_2^{2} = |w_{0,1}|^{2} + ... + |w_{0,p}|^{2}, where w_{0,i} is the i-th (scalar) component of w_0.
A sum of squares of p independent standard normal random variables follows the chi-square distribution with p degrees of freedom; here we obtain standard normals via the change of variable \hat{y} = yp, since \sqrt{p}\, w_{0,i} ~ N(0, 1).
Thus \hat{y} follows the chi-square distribution with p degrees of freedom, and the expected value of \hat{y} is p. (This can be proved using properties of the gamma function.)
Finally, solving \hat{y} = yp for y gives E[y] = E[\hat{y}]/p = 1, so the expected squared norm is 1 and its square root is 1, i.e. w_0 has roughly unit length.
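A quick numerical sanity check of this claim (a sketch assuming NumPy; the dimension and number of trials are arbitrary):

import numpy as np

p = 50                                   # dimension (arbitrary choice)
n_trials = 100_000
rng = np.random.default_rng(0)

# Draw w_0 ~ N(0, (1/p) I_p): each component is N(0, 1/p).
W = rng.normal(scale=np.sqrt(1.0 / p), size=(n_trials, p))

sq_norms = np.sum(W**2, axis=1)          # y = ||w_0||^2 for each draw
print(sq_norms.mean())                   # ≈ 1, matching E[y] = 1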
For an arbitrary unit-length vector v, show that E[\langle w_0, v \rangle^{2}] = 1/p. Note: the statements in Questions 6 and 7 hint at the fact that random initialization typically distributes w_0's energy roughly evenly along all directions.
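One way to see this (a sketch, using only the first two moments of w_0):

E[\langle w_0, v \rangle^{2}] = E[(v^{T} w_0)(w_0^{T} v)] = v^{T} E[w_0 w_0^{T}] v = v^{T} ((1/p) I_p) v = (1/p) \|v\|_2^{2} = 1/p,

since E[w_0] = 0, Cov(w_0) = (1/p) I_p, and v has unit length.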
Let U = [v_1, v_2, ..., v_p] denote the orthonormal matrix of eigenvectors of Σ. Show how you can use U to do a change of basis, in order to decompose the vector recurrence of (2) into the following scalar dynamics:
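The recurrence (2) is not reproduced in this section. Purely as an illustration, if one assumes it has a linear form such as w_{t+1} = (I + η Σ) w_t (an assumption, not taken from the source), then writing Σ = U Λ U^{T} with Λ = diag(λ_1, ..., λ_p) and changing basis to z_t = U^{T} w_t gives

z_{t+1} = U^{T}(I + η Σ) U z_t = (I + η Λ) z_t, i.e. z_{t+1, i} = (1 + η λ_i) z_{t, i} for each coordinate i,

so the dynamics decouple into p independent scalar recurrences, one per eigen-direction.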