Part IV: RNNs and CNNs
RNNs normally process an input sequence $x = (x_1, \ldots, x_T)$ from the beginning ($t = 1$) to the end ($t = T$), meaning that their state variable $h_t$ at position $t$ is a function of all the past inputs $(x_1, \ldots, x_t)$.
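For concreteness, here is a minimal sketch of that left-to-right recurrence in plain NumPy; the particular update $h_t = \tanh(W h_{t-1} + U x_t + b)$ and the names rnn_forward, W, U, b are illustrative assumptions, not something fixed by the questions below.

```python
import numpy as np

def rnn_forward(x, W, U, b, h0):
    """Run the recurrence from t = 1 to t = T; each h_t sees only x_1..x_t."""
    h = h0
    states = []
    for x_t in x:                               # process positions in order
        h = np.tanh(W @ h + U @ x_t + b)        # h_t = f(h_{t-1}, x_t)
        states.append(h)
    return states                               # states[t-1] is h_t

# The same parameters (W, U, b) are reused at every position, so the loop
# runs unchanged for any sequence length T.
x = [np.random.randn(3) for _ in range(5)]      # T = 5, input dimension 3
W, U, b = np.random.randn(4, 4), np.random.randn(4, 3), np.zeros(4)
states = rnn_forward(x, W, U, b, h0=np.zeros(4))
```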
(a) What kind of architecture could exploit the RNN machinery to build a state variable $h_t$ which is a function of the whole sequence, i.e., both past and future from the point of view of position $t$?
(b) Explain how to do back-propagation in such an architecture.
(c) Explain what, in principle, allows RNNs to generalize to sequence lengths not seen during training, and contrast this with the situation for MLPs.
(d) Do CNNs have the same capability of handling variable-size inputs? How can CNNs process inputs of different sizes? What do CNNs and RNNs have in common that makes this possible?
(e) With both RNNs and CNNs we can also map a variable-length input to a fixed-size vector. What architectural device(s) can you think of that make this possible? (A minimal code sketch illustrating this setting follows below.)
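As a hedged illustration (not a model answer) of the mechanisms the last three questions point at, the NumPy sketch below applies the same small set of convolution kernels at every position, so it accepts inputs of any length, and then collapses the resulting variable-length feature sequence with a global pooling step to obtain a fixed-size vector. The function names conv1d_valid and global_max_pool, the kernel width, and the choice of max pooling are all assumptions made for illustration.

```python
import numpy as np

def conv1d_valid(x, kernels):
    """1-D 'valid' convolution. x has shape (T, d_in); kernels has shape
    (d_out, k, d_in). The same kernels slide over every position, so any
    input length T >= k is accepted."""
    d_out, k, _ = kernels.shape
    T = x.shape[0]
    out = np.empty((T - k + 1, d_out))
    for t in range(T - k + 1):
        window = x[t:t + k]                                   # (k, d_in)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out                                                # length varies with T

def global_max_pool(features):
    """Collapse the length axis, yielding a vector whose size is independent of T."""
    return features.max(axis=0)                               # shape (d_out,)

kernels = np.random.randn(8, 4, 3)                            # d_out=8, k=4, d_in=3
v_short = global_max_pool(conv1d_valid(np.random.randn(7, 3), kernels))
v_long = global_max_pool(conv1d_valid(np.random.randn(50, 3), kernels))
assert v_short.shape == v_long.shape == (8,)                  # same fixed size
```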