Hi,
I am a relatively new Stan user. I was wondering whether Stan supports models with mixed deterministic and random variables, either directly or via a good hack. To be more specific, let’s say my log joint is a function of $\theta$ and $\psi$, and I want the MAP estimate of $\theta$ together with the posterior distribution of $\psi$. In ADVI terms, I want to maximize $\mathrm{ELBO}(\theta, \xi_\psi, \sigma_\psi)$, where $\theta$ is treated as a point estimate and $(\xi_\psi, \sigma_\psi)$ denote the mean and standard deviation of the approximate posterior of $\psi$ in the unconstrained space.
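For concreteness, the objective I have in mind would look roughly like the following. This is only a sketch in the style of the mean-field ADVI objective, and the extra symbols are my own assumptions: $y$ is the data, $T$ maps $\psi$ to the unconstrained space, $J_{T^{-1}}$ is the Jacobian of the inverse map, and the constant is the part of the Gaussian entropy that does not depend on any parameter.

$$
\mathrm{ELBO}(\theta, \xi_\psi, \sigma_\psi) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[ \log p\big(y, \theta, T^{-1}(\zeta)\big) + \log \left| \det J_{T^{-1}}(\zeta) \right| \right] + \sum_k \log \sigma_{\psi,k} + \text{const}, \qquad \zeta = \xi_\psi + \sigma_\psi \odot \epsilon
$$

Here $\theta$ enters only through the log joint and has no entropy term of its own, which is what I mean by treating it deterministically.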
Background: I have a model with mixed continuous and discrete latents. The discrete latents appear in a Markov chain so that integrating them out is intractable (i.e. the number of paths is exponential in the chain length). As an approximation, I thought I’d directly specify an ELBO under the assumption that the posterior is factorized over each node of the chain. The ELBO will be a functional of the posterior distribution of each discrete variable, as well as the rest of the continuous variables.
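To illustrate what I mean by a hard-coded ELBO, here is a minimal NumPy sketch of the discrete part only. It assumes the continuous latents are held fixed, so that their effect is summarized by a table of per-node log-likelihoods. All names (`log_pi`, `log_A`, `log_lik`, `mean_field_chain_elbo`) are hypothetical and chosen for illustration, and every row of `q` is assumed to be strictly positive.

```python
import numpy as np

def mean_field_chain_elbo(q, log_pi, log_A, log_lik):
    """ELBO of a discrete Markov chain under a posterior factorized over nodes.

    q       : (T, S) array, q[t] is the variational distribution of node t
              (each row strictly positive and summing to 1)
    log_pi  : (S,) log initial-state probabilities
    log_A   : (S, S) log transition matrix, log_A[i, j] = log p(z_t = j | z_{t-1} = i)
    log_lik : (T, S) log-likelihood of the observation at node t under each state
    """
    # E_q[log p(z_1)]
    initial = q[0] @ log_pi
    # sum over t of E_q[log p(z_t | z_{t-1})]; the expectation factorizes over nodes
    transitions = np.einsum("ti,ij,tj->", q[:-1], log_A, q[1:])
    # sum over t of E_q[log p(y_t | z_t)]
    emissions = np.sum(q * log_lik)
    # entropy of a product distribution is the sum of per-node entropies
    entropy = -np.sum(q * np.log(q))
    return initial + transitions + emissions + entropy
```

The cost is linear in the chain length, because the factorized posterior turns the expectation over paths into sums over single nodes and adjacent pairs.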
In the spirit of ADVI, one could simply optimize the “hard-coded” ELBO over an unconstrained parametrization of the simplexes, and the unconstrained parameters of the Gaussians that approximate the posterior of the continuous variables. However, if one implements this naively in Stan, the simplexes will not be treated deterministically; rather, they will also be represented by yet more Gaussians in an unconstrained space. Am I approaching this problem the wrong way?
PS: In other words, can I somehow mark some variables (say, unconstrained ones) among my model parameters to be treated as hyperparameters? From a variational inference perspective, this seems like a natural thing to do. From the sampling perspective, however, I’m not sure what it would mean.
PPS: Perhaps my best shot is to break the problem down into two steps: (1) a forward-backward step on the chain given the posterior of the continuous latents, and (2) optimizing the posterior of the continuous variables given the posterior of the discrete variables. Alternating between these two steps should converge to a (locally) optimal posterior that factorizes over the continuous and discrete sectors. My original question still holds though: why can’t we simply extend ADVI to treat discrete variables? A strictly positive discrete distribution defined on a finite set of size $S$ can be represented uniquely with $S-1$ unconstrained real variables, and these can be taken as variational parameters.
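To make the $S-1$ parametrization concrete, here is a small NumPy sketch of one possible map, assuming a softmax with the last logit pinned at zero. This is my own choice for illustration and not necessarily the transform Stan uses internally for simplexes; the function names are hypothetical. It only reaches distributions with strictly positive probabilities.

```python
import numpy as np

def simplex_from_unconstrained(u):
    """Map S-1 unconstrained reals to a point on the S-simplex.

    The last logit is pinned at 0, which removes the usual softmax
    shift invariance and makes the map one-to-one.
    """
    logits = np.append(np.asarray(u, dtype=float), 0.0)
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def unconstrained_from_simplex(p):
    """Inverse map: log-odds of the first S-1 entries against the last one."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1]) - np.log(p[-1])
```

In this picture, the $S-1$ reals per node would be optimized directly as variational parameters, rather than being given Gaussian approximate posteriors of their own.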