Re Generalized Expectation Maximization on page 7:
I’m glad you mentioned this. It’s a deceptively simple generalization of EM: instead of fully maximizing, choose any \theta^{(t+1)} for which Q(\theta^{(t+1)} | \theta^{(t)}) > Q(\theta^{(t)} | \theta^{(t)}). Since any increase in Q implies the log likelihood does not decrease, the monotone ascent property of EM is preserved. In my experience, replacing the full maximization of the M-step with a single hill-climbing step makes each iteration quite cheap computationally.
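Concretely, here’s a minimal sketch of what I mean by a single hill-climbing M-step. The callables Q and grad_Q are hypothetical stand-ins for whatever the model actually provides, so treat this as an illustration rather than a recipe:

```python
import numpy as np

def gem_step(theta, Q, grad_Q, step=0.1):
    """One GEM iteration: replace the M-step's full argmax over
    Q(. | theta) with a single gradient-ascent step.

    Q(theta_new, theta_old) and grad_Q(theta_new, theta_old) are
    hypothetical callables supplied by the model.
    """
    theta_new = theta + step * grad_Q(theta, theta)
    # GEM only needs Q to increase, not to be maximized; backtrack
    # until the ascent condition holds (or the step underflows).
    while Q(theta_new, theta) <= Q(theta, theta) and step > 1e-12:
        step *= 0.5
        theta_new = theta + step * grad_Q(theta, theta)
    return theta_new

# Toy check with a quadratic Q(. | theta_old) peaked at 1
# (the t_old argument is ignored here, purely for illustration):
Q = lambda t, t_old: -np.sum((t - 1.0) ** 2)
grad_Q = lambda t, t_old: -2.0 * (t - 1.0)
theta = np.zeros(3)
for _ in range(20):
    theta = gem_step(theta, Q, grad_Q)
print(theta)  # approaches [1, 1, 1]
```

The backtracking loop enforces the GEM condition directly, so even with a crude step size the likelihood still never decreases.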
Re distribute derivatives through integrals on page 8:
Technically you need the dominated convergence theorem (plus linearity) to justify interchanging integration (expectation) and differentiation. That said, I agree that most people just do it, with at most a subtle remark that it’s only conditionally valid.
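For completeness, the statement I have in mind is roughly the following (conditions stated informally, with f standing in for the integrand):

\frac{\partial}{\partial \theta} \int f(x, \theta)\, dx = \int \frac{\partial}{\partial \theta} f(x, \theta)\, dx, \qquad \text{provided } \left| \frac{\partial}{\partial \theta} f(x, \theta) \right| \le g(x) \text{ for all } \theta \text{ near the point of interest, with } \int g(x)\, dx < \infty.

The integrable dominating function g is exactly what lets dominated convergence push the limit in the difference quotient through the integral.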
Re Gradient Based Marginal Optimization on page 9:
Is there a missing negative sign on the right-hand side of the expression for the covariance matrix?
There are a few minor typos that I’ll call out, if you want.
Much like drezap’s comment in the thread “What doc would help developers of the Math library?”, your blog-style posts are great. Thanks!