Relation between decomposition of ELBO and Hamiltonian

Is there any relation between the two decompositions?

  1. Evidence lower bound that decomposes into Energy term and Neg-Entropy term.
-\mathrm{ELBO}(\boldsymbol{w})=\underset{\mathrm{z} \sim q_{w}}{\mathbb{E}}[-\log p(\mathrm{z}, \boldsymbol{x})]+\underset{\mathrm{z} \sim q_{w}}{\mathbb{E}}\left[\log q_{\boldsymbol{w}}(\mathrm{z})\right]
  2. Hamiltonian that decomposes into potential and kinetic energy
\begin{aligned} H(\rho, \theta) &=-\log p(\rho, \theta) \\ &=-\log p(\rho \mid \theta)-\log p(\theta) \\ &=T(\rho \mid \theta)+V(\theta) \end{aligned}

In the first equation, which is from this paper, w, z, and x are the parameter, the latent variable, and the observed data, respectively.

In the second equation, which is from the Stan manual, \rho is the auxiliary momentum variable and \theta is the parameter.

The structural similarity of the right-hand sides of the two equations, and the shared term ‘energy’, made me curious about how they are related. Any opinion would be appreciated!

One more tangential question, on parameters versus latent variables. According to this, a parameter is fixed but unknown; would it be better to call w a variable rather than a parameter? Or, since w is actually a function according to the following equation in this paper, is it fixed, so that ‘parameter’ is the correct name?

\mathrm{ELBO}(q)=\mathbb{E}[\log p(\mathbf{z}, \mathbf{x})]-\mathbb{E}[\log q(\mathbf{z})]

I don’t know of a direct connection other than they’re trying to solve the same problem.

Probably the way to think about the differences is in how each approximates integrals. One uses a Monte Carlo method (HMC) and the other is based on a variational approximation.
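To make the variational side concrete, here is a minimal sketch of the energy plus neg-entropy decomposition from the question. The toy model is my own assumption, not something from the paper: a single latent z ~ Normal(0, 1), one observation x | z ~ Normal(z, 1), and a Gaussian approximation q_w = Normal(m, s^2) with w = (m, s). In this conjugate case both expectations have closed forms, and the negative ELBO should equal -log p(x) exactly when q_w is the true posterior Normal(x/2, 1/2).

```python
import math


def energy_term(m, s, x):
    """E_q[-log p(z, x)] for q = Normal(m, s^2), in closed form.

    Assumed toy model: z ~ Normal(0, 1), x | z ~ Normal(z, 1).
    Uses E_q[z^2] = m^2 + s^2 and E_q[(x - z)^2] = (x - m)^2 + s^2.
    """
    return (
        math.log(2 * math.pi)
        + 0.5 * (m**2 + s**2)
        + 0.5 * ((x - m) ** 2 + s**2)
    )


def neg_entropy_term(s):
    """E_q[log q(z)], i.e. minus the entropy of Normal(m, s^2)."""
    return -0.5 * math.log(2 * math.pi * math.e * s**2)


def neg_elbo(m, s, x):
    """Negative ELBO = energy term + neg-entropy term."""
    return energy_term(m, s, x) + neg_entropy_term(s)


def neg_log_evidence(x):
    """-log p(x); marginally x ~ Normal(0, 2) in this toy model."""
    return 0.5 * math.log(2 * math.pi * 2.0) + x**2 / 4.0


if __name__ == "__main__":
    x = 1.3
    # A deliberately poor q: the bound is loose.
    print("poor q :", neg_elbo(0.0, 1.0, x))
    # q equal to the exact posterior: the bound is tight.
    print("exact q:", neg_elbo(x / 2, math.sqrt(0.5), x))
    print("-log p(x):", neg_log_evidence(x))
```

Nothing here involves a momentum variable: the two terms are expectations under q_w, and the optimization is over w.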

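For the Hamiltonian there, the likelihood and the prior both go into the potential energy V(\theta), because p(\theta) in that equation is the target (posterior) density. The kinetic energy T(\rho \mid \theta) is something else: it comes from the distribution of the auxiliary momentum \rho, not from the model.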
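A rough sketch of that split is below. The model is again an assumption of mine for illustration (theta ~ Normal(0, 1), x_i | theta ~ Normal(theta, 1)), and the momentum is taken to be standard normal, i.e. a unit mass matrix. The function names are hypothetical, not Stan's API.

```python
import math


def log_normal_pdf(y, mean, sd):
    """Log density of Normal(mean, sd^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def potential(theta, xs):
    """V(theta): minus the log of (likelihood * prior).

    Both model pieces live here. This is the negative log posterior
    up to an additive constant that does not depend on theta.
    """
    log_prior = log_normal_pdf(theta, 0.0, 1.0)
    log_lik = sum(log_normal_pdf(x, theta, 1.0) for x in xs)
    return -(log_lik + log_prior)


def kinetic(rho):
    """T(rho): minus the log density of the auxiliary momentum.

    Assumes rho ~ Normal(0, 1), independent of theta and of the data.
    """
    return -log_normal_pdf(rho, 0.0, 1.0)


def hamiltonian(rho, theta, xs):
    """H(rho, theta) = T(rho) + V(theta)."""
    return kinetic(rho) + potential(theta, xs)


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5]
    print("V:", potential(0.0, xs))
    print("T:", kinetic(0.7))
    print("H:", hamiltonian(0.7, 0.0, xs))
```

Changing the data `xs` changes only `potential`. The kinetic term depends only on the momentum, so it plays no role analogous to the entropy of q_w in the ELBO.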

That seems like a specific assumption about the type of uncertainty in a problem. Like when we do stuff in Stan everything looks like a random variable, but is the actual variable a random thing? Or is it that we just don’t know it? I don’t really know this stuff, but I think different estimation techniques make different assumptions. Here’s the Wikipedia link on it: Uncertainty quantification - Wikipedia

I don’t think there’s any universal lingo on what is a parameter or what makes something latent.
