Hiroki Naganuma

Please give succinct but precise answers.

1.

Standard neural networks and directed probabilistic graphical models are both typically represented using directed graphs. But what nodes and arcs represent differs. Explain these differences.

A probabilistic graphical model is a graphical representation of a probability distribution: nodes represent random variables and directed arcs represent direct (often causal) dependencies, so the graph encodes how the joint distribution factorizes.
In a neural network, nodes represent deterministic computation units (activations) and arcs represent weighted connections.
The purpose of a NN is to learn the relationship between input x and output y conditional on the model parameters Θ, and standard training yields only a point estimate of Θ.
A NN therefore differs from a graphical model in that it is deterministic rather than stochastic: the values at its nodes are computed, not sampled.
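For concreteness, a minimal side-by-side sketch (my notation: x_1, ..., x_n are node variables, pa(·) the parents of a node, s an activation function):

```latex
% Directed PGM: nodes are random variables, arcs are conditional dependencies,
% and the graph defines a factorization of the joint distribution:
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}(x_i)\big)

% Neural network: nodes are deterministic units, arcs carry weights w_{ij},
% and each node computes an activation from the nodes feeding into it:
h_j = s\Big(\sum_{i} w_{ij}\, h_i + b_j\Big)
```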


2.

Explain the differences between the following approaches (inductive principles): a) maximum likelihood; b) maximum a posteriori (MAP); c) complete Bayesian treatment; when learning and using a parameterized model for prediction (e.g. for a classification problem).

Maximum likelihood estimation treats the likelihood function as the plausibility of the data given the parameter Θ, and point-estimates Θ by maximizing it. (It is a method of choosing the parameter Θ under which the data at hand, Xn, is most likely to have been generated.)
MAP estimation treats Θ itself as a random variable and point-estimates the Θ that maximizes the posterior probability, which is proportional to the product of the likelihood function and a prior distribution placed on Θ.
When there is not enough data, this prior knowledge acts as regularization and reduces the risk of overfitting.
The complete Bayesian treatment considers the posterior distribution in the same way as MAP estimation, but instead of a point estimate it keeps the whole posterior and predicts by taking the expectation over it (the posterior predictive distribution).
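For concreteness, the three inductive principles can be written as follows (a minimal sketch; D denotes the training data, (x, y) a test point):

```latex
\begin{align*}
  \hat{\Theta}_{\mathrm{ML}}  &= \operatorname*{arg\,max}_{\Theta} \; p(D \mid \Theta)
      && \text{maximum likelihood} \\
  \hat{\Theta}_{\mathrm{MAP}} &= \operatorname*{arg\,max}_{\Theta} \; p(D \mid \Theta)\, p(\Theta)
      && \text{MAP} \\
  p(y \mid x, D) &= \int p(y \mid x, \Theta)\, p(\Theta \mid D)\, d\Theta
      && \text{complete Bayesian prediction}
\end{align*}
```

ML and MAP then predict with the single estimate p(y | x, Θ̂), whereas the Bayesian treatment averages predictions over the whole posterior.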

3.

The most common neural network training procedure typically corresponds to either a maximum likelihood or a MAP optimization. Explain in what sense it can correspond to MAP (give a specific example).

Training with only a data-fit loss (e.g., cross-entropy) by backpropagation corresponds to maximum likelihood estimation.
Adding a regularization term corresponds to placing a prior distribution on the parameters θ, which turns the training into MAP estimation: for example, L2 regularization (weight decay) is equivalent to assuming a zero-mean Gaussian prior on θ.
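A minimal derivation of this correspondence (assuming a zero-mean Gaussian prior N(0, σ²I) on θ):

```latex
\begin{align*}
  \hat{\theta}_{\mathrm{MAP}}
    &= \operatorname*{arg\,max}_{\theta} \; \log p(D \mid \theta) + \log p(\theta) \\
    &= \operatorname*{arg\,min}_{\theta} \;
       \underbrace{-\log p(D \mid \theta)}_{\text{e.g. cross-entropy loss}}
       + \frac{1}{2\sigma^2}\, \lVert \theta \rVert_2^2
\end{align*}
```

so the familiar weight-decay coefficient is λ = 1/(2σ²): stronger regularization corresponds to a tighter prior.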


4.

Formalize mathematically the architecture of a basic deterministic auto-encoder (AE). Clearly define notations and variables used and what each represents. Express mathematically the optimization problem that is solved during its training.
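One standard formalization, as a sketch (the symbols f_φ, g_θ, s, s' are my notation, not fixed by the question):

```latex
% x \in \mathbb{R}^d             : input vector
% z = f_\phi(x) \in \mathbb{R}^k : code, typically k < d (encoder f, parameters \phi)
% \hat{x} = g_\theta(z)          : reconstruction (decoder g, parameters \theta)
% s, s'                          : elementwise nonlinearities (e.g. sigmoid, ReLU)
\begin{align*}
  z       &= f_\phi(x) = s\big(W x + b\big), \\
  \hat{x} &= g_\theta(z) = s'\big(W' z + b'\big), \\
  \min_{\phi, \theta} \;& \frac{1}{n} \sum_{i=1}^{n}
      L\big(x^{(i)}, \, g_\theta(f_\phi(x^{(i)}))\big),
  \quad L(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2
  \ \text{(or cross-entropy for binary inputs)}.
\end{align*}
```

Training thus minimizes the average reconstruction error over the n training examples; the bottleneck k < d (or added regularization) prevents the trivial identity solution.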

5.

Similarly, formalize the architecture and training criterion being optimized in a variational auto-encoder (VAE).
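A sketch of the standard formulation, under the usual Gaussian assumptions:

```latex
% p(z) = \mathcal{N}(0, I) : prior over the latent code z
% q_\phi(z | x)            : encoder / approximate posterior,
%                            e.g. \mathcal{N}(\mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x)))
% p_\theta(x | z)          : decoder / likelihood
% Training maximizes the evidence lower bound (ELBO) on \log p_\theta(x):
\begin{align*}
  \log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
    = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
      - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\Vert\, p(z)\big)
\end{align*}
```

The expectation is made differentiable in φ via the reparameterization trick, z = μ_φ(x) + σ_φ(x) ⊙ ε with ε ~ N(0, I); the first term is the (stochastic) reconstruction objective and the KL term regularizes the code toward the prior.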