Write down the log-likelihood objective.
Show that maximizing this likelihood objective is equivalent to minimizing the KL divergence to the sampled data, $D_{\mathrm{KL}}\big(\hat{p}_{\text{data}}(x) \,\|\, p_{\text{model}}(x; \theta)\big)$.
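
One possible outline of the argument is sketched below. It is not the only valid route. It assumes, beyond what is stated above, that the data are N i.i.d. samples written $x^{(1)}, \dots, x^{(N)}$, that $\hat{p}_{\text{data}}$ is the empirical distribution placing mass $1/N$ on each sample, and that $x$ is discrete so the entropy of $\hat{p}_{\text{data}}$ is finite. The symbols $N$, $x^{(i)}$, and $\mathcal{L}(\theta)$ are notation introduced here for illustration.

```latex
% Log-likelihood objective (assumes i.i.d. samples x^{(1)}, ..., x^{(N)})
\begin{align}
  \mathcal{L}(\theta)
    &= \sum_{i=1}^{N} \log p_{\text{model}}\big(x^{(i)}; \theta\big), \\
  \frac{1}{N}\,\mathcal{L}(\theta)
    &= \mathbb{E}_{x \sim \hat{p}_{\text{data}}}
       \big[\log p_{\text{model}}(x; \theta)\big].
\end{align}

% KL divergence from the empirical distribution to the model
\begin{align}
  D_{\mathrm{KL}}\big(\hat{p}_{\text{data}}(x) \,\|\, p_{\text{model}}(x; \theta)\big)
    &= \mathbb{E}_{x \sim \hat{p}_{\text{data}}}
       \big[\log \hat{p}_{\text{data}}(x)\big]
     - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}
       \big[\log p_{\text{model}}(x; \theta)\big] \\
    &= \underbrace{-H\big(\hat{p}_{\text{data}}\big)}_{\text{independent of } \theta}
     \;-\; \frac{1}{N}\,\mathcal{L}(\theta).
\end{align}

% The first term does not depend on theta and 1/N > 0, so
\begin{equation}
  \arg\min_{\theta}\,
    D_{\mathrm{KL}}\big(\hat{p}_{\text{data}}(x) \,\|\, p_{\text{model}}(x; \theta)\big)
  \;=\;
  \arg\max_{\theta}\, \mathcal{L}(\theta).
\end{equation}
```

The key step is that the entropy term depends only on the data, so it drops out of the optimization over $\theta$. Scaling by the positive constant $1/N$ does not change the maximizer either.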