Topic: Principal Component Analysis
Given samples x_1, x_2, ..., x_n from some p-dimensional distribution, i.e. x_i ∈ R^p, how do you compute the empirical (data) covariance matrix, Σ?
Definition of the centered empirical covariance, as follows:
Σ = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T}, where \bar{x} = (1/n) \sum_{i=1}^{n} x_i,
s.t. the matrix size is p x p.
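A minimal sketch of this computation in Python (assuming NumPy; variable names are illustrative), checked against numpy.cov with the 1/n normalization:

import numpy as np

def empirical_covariance(X):
    # X holds the samples as rows: X has shape (n, p).
    n = X.shape[0]
    x_bar = X.mean(axis=0)        # sample mean, shape (p,)
    Xc = X - x_bar                # center each sample
    return (Xc.T @ Xc) / n        # p x p covariance matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # 500 samples in p = 3 dimensions
Sigma = empirical_covariance(X)
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))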
Show that the empirical covariance matrix is symmetric and positive semi-definite (PSD).
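A minimal sketch of the argument, using the centered empirical covariance Σ defined above:

Symmetry: each term satisfies ((x_i - \bar{x})(x_i - \bar{x})^{T})^{T} = (x_i - \bar{x})(x_i - \bar{x})^{T}, so Σ^{T} = Σ.
PSD: for any v ∈ R^p, v^{T} Σ v = (1/n) \sum_{i=1}^{n} (v^{T}(x_i - \bar{x}))^{2} ≥ 0.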
Side note on terminology: Cov(X) = E[(X - E(X))(X - E(X))^{T}] is the covariance matrix of X, and Cov(X, Y) = E[(X - E(X))(Y - E(Y))^{T}] is the (cross-)covariance matrix of X and Y. Since the entries of Cov(X) are pairwise covariances such as Cov(X1, X2), Cov(X) is also called the variance-covariance matrix (the naming is somewhat ambiguous).

Figure 1 shows samples drawn from Gaussians with different covariance matrices. Match the four sets of samples (1)-(4) to the population covariance matrices they correspond to (A to D).
(1): C
(2): A
(3): D
(4): B
What objective does PCA optimize?
PCA is a dimensionality reduction method that aims to represent p-dimensional data with as few dimensions as possible; concretely, it finds a projection direction, e.g. a vector [a_1, a_2]^{T} in the two-dimensional case.
Specifically, it computes the principal components: the axes along which the projected data has maximal variance.
The principal components are eigenvectors of the variance-covariance matrix; the first principal component is the eigenvector associated with the largest eigenvalue.
The optimization problem solved by PCA is a constrained maximization: the first principal component is the unit vector that maximizes the variance of the projected data, and each subsequent principal component maximizes the projected variance under the constraint that it is orthogonal to the previously determined principal components.
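In symbols, a standard way to write this (stated here as a sketch):

w_1 = \arg\max_{\|w\|_2 = 1} w^{T} Σ w, and for k > 1, w_k = \arg\max_{\|w\|_2 = 1,\ w ⊥ w_1, ..., w_{k-1}} w^{T} Σ w,

where w^{T} Σ w is the variance of the data projected onto w.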
What is the true principal component of each distribution above? Use the population covariance matrices to come up with this answer. Your answer in each case will be one or more vectors representing the principal component; the vectors need not be normalized.
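A minimal sketch of how one could read the principal component off a given population covariance matrix (assuming NumPy; the 2 x 2 matrix below is a hypothetical example, not one of A-D from Figure 1):

import numpy as np

# Hypothetical 2x2 covariance matrix, used only to illustrate the computation.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# For symmetric matrices, eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# The principal component is the eigenvector paired with the largest eigenvalue;
# it need not be normalized for the purposes of this question.
principal_component = eigvecs[:, -1]
print(eigvals, principal_component)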
Assume the p-dimensional vector w_0 follows a Gaussian distribution N(0, (1/p) I_p), i.e. each component has mean 0 and variance 1/p. Its squared 2-norm is the sum of the squared components:
y = \|w_0\|_2^{2} = |w_{0,1}|^{2} + ... + |w_{0,p}|^{2}, where w_{0,i} is the i-th (scalar) component of w_0.
A sum of squares of p independent standard normal random variables follows the chi-square distribution with p degrees of freedom; here we obtain standard normals via the change of variable \hat{y} = yp, since \sqrt{p}\, w_{0,i} ~ N(0, 1).
Thus \hat{y} follows the chi-square distribution with p degrees of freedom, and the expected value of \hat{y} is p. (This can be proved using properties of the gamma function.)
Finally, solving \hat{y} = yp for y gives E[y] = E[\hat{y}]/p = 1, so the expected squared norm is 1 and its square root is 1, i.e. w_0 has roughly unit length.
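A quick numerical sanity check of this claim (a sketch assuming NumPy; the dimension and number of trials are arbitrary):

import numpy as np

p = 50                                   # dimension (arbitrary choice)
n_trials = 100_000
rng = np.random.default_rng(0)

# Draw w_0 ~ N(0, (1/p) I_p): each component is N(0, 1/p).
W = rng.normal(scale=np.sqrt(1.0 / p), size=(n_trials, p))

sq_norms = np.sum(W**2, axis=1)          # y = ||w_0||^2 for each draw
print(sq_norms.mean())                   # ≈ 1, matching E[y] = 1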
For an arbitrary unit-length vector v, show that E[\langle w_0, v \rangle^{2}] = 1/p. Note: the statements in Questions 6 and 7 hint at the fact that random initialization typically distributes w_0's energy roughly evenly along all directions.
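One way to see this (a sketch, using only the first two moments of w_0):

E[\langle w_0, v \rangle^{2}] = E[(v^{T} w_0)(w_0^{T} v)] = v^{T} E[w_0 w_0^{T}] v = v^{T} ((1/p) I_p) v = (1/p) \|v\|_2^{2} = 1/p,

since E[w_0] = 0, Cov(w_0) = (1/p) I_p, and v has unit length.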
Let U = [v_1, v_2, ..., v_p] denote the orthonormal matrix of eigenvectors of Σ. Show how you can use U to do a change of basis, in order to decompose the vector recurrence of (2) into the following scalar dynamics:
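The recurrence (2) is not reproduced in this section. Purely as an illustration, if one assumes it has a linear form such as w_{t+1} = (I + η Σ) w_t (an assumption, not taken from the source), then writing Σ = U Λ U^{T} with Λ = diag(λ_1, ..., λ_p) and changing basis to z_t = U^{T} w_t gives

z_{t+1} = U^{T}(I + η Σ) U z_t = (I + η Λ) z_t, i.e. z_{t+1, i} = (1 + η λ_i) z_{t, i} for each coordinate i,

so the dynamics decouple into p independent scalar recurrences, one per eigen-direction.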