Q: An application of Newton's method for unconstrained minimization leads to updates of the form x_{k+1} = x_k − H^{-1} ∇l(x_k), where H is the Hessian of l at x_k. Ignoring the issue of computing the inverse, explain how you can combine the two modes of automatic differentiation to compute Hv efficiently for any vector v.
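One standard way to do this is forward-over-reverse: reverse mode builds the gradient function ∇l at roughly the cost of a few evaluations of l, and forward mode then pushes the tangent v through ∇l, producing the Hessian-vector product Hv without ever materializing H. A minimal sketch in JAX (the objective l below is a hypothetical stand-in, not from the original question):

```python
import jax
import jax.numpy as jnp

def hvp(l, x, v):
    # Forward-over-reverse Hessian-vector product:
    # jax.grad(l) constructs the gradient of l via reverse mode, and jax.jvp
    # propagates the tangent v through that gradient via forward mode.
    # jvp returns (grad l(x), H v); we keep only the tangent output H v.
    return jax.jvp(jax.grad(l), (x,), (v,))[1]

# Hypothetical scalar objective, used only to exercise hvp:
def l(x):
    return jnp.sum(jnp.tanh(x) ** 2)

x = jnp.arange(3.0)
v = jnp.ones(3)
print(hvp(l, x, v))  # H v at roughly the cost of a few gradient evaluations
```

In a Newton update these Hv products can drive an iterative linear solver such as conjugate gradients, so H^{-1} ∇l(x_k) is approximated without ever forming or inverting H explicitly.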
Q: What is the maximum number of edges in a DAG with N vertices?

Assume N vertices/nodes, and let's build up a DAG with the maximum number of edges. Consider any given node, say N1: the most nodes it can point to is N−1 (every node but itself). Now choose a second node, N2: it can point to every node except itself and N1 (pointing back to N1 would create a cycle), adding N−2 edges. Continuing through the remaining nodes, each can point to one fewer node than the one before, and the last node points to no other nodes. This is exactly a topological ordering with every possible forward edge present.
Sum of all edges: (N−1) + (N−2) + … + 1 + 0 = N(N−1)/2
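A quick sanity check of that count (a minimal sketch; max_dag_edges is a name introduced here, not from the original): each node in the ordering points to every node after it, so summing the out-degrees reproduces N(N−1)/2.

```python
def max_dag_edges(n):
    # Node i (0-indexed) points to every later node j > i, so it contributes
    # n - 1 - i edges: n-1 for the first node, down to 0 for the last.
    return sum(n - 1 - i for i in range(n))

for n in range(1, 8):
    assert max_dag_edges(n) == n * (n - 1) // 2  # matches N(N-1)/2

print(max_dag_edges(5))  # 10
```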