Thanks to both of you for your good answers @matt-graham and @Bob_Carpenter. Here are some mathematical details that I hope are sufficient to explain what I'm aiming for.
I am trying to implement an extension of dynamic structural equation models to a situation with non-Gaussian responses. Consider the following setting: we have N subjects i=1,\dots,N, for each of whom we have T observations at equally spaced timepoints t=1,\dots,T, where T is typically at least in the hundreds. For each subject-timepoint we observe a vector of responses y_{it} \in \mathbb{R}^{V}. Let us assume all elements of the response vector follow the same exponential family distribution, so that for the $v$th component we have
f(y_{itv}) = \exp\left\{ \frac{y_{itv}\theta(\mu_{itv}) - b(\theta(\mu_{itv}))}{\phi} + c(y_{itv},\phi) \right\}
where
\mu_{itv} = g^{-1}(\nu_{v} + \Lambda_{t} \eta_{i,t}).
In the above equation, \eta_{i,t} \in \mathbb{R}^{U}, with U < V, is a vector of latent variables, and \Lambda_{t} is an appropriately sized matrix of factor loadings. g^{-1}(\cdot) is an inverse link function, which in this case may well be the canonical link. So the point of this model is to represent the high-dimensional responses y_{it}, which might come from any exponential family distribution, by a lower-dimensional latent variable vector \eta_{i,t}, which is Gaussian.
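To make the measurement part concrete, here is a minimal Stan sketch of just the observation model conditional on \eta_{i,t}, assuming Poisson responses with the canonical log link and, for simplicity, time-constant loadings; identification constraints on \Lambda and the dynamics on \eta are left out at this point, and all names are just illustrative.

```stan
// Minimal sketch: measurement model only, conditional on eta,
// assuming Poisson responses with the canonical log link and
// time-constant loadings Lambda. Identification constraints on
// Lambda (e.g. a triangular structure) are omitted.
data {
  int<lower=1> N;                   // subjects
  int<lower=1> T;                   // timepoints
  int<lower=1> V;                   // responses per subject-timepoint
  int<lower=1> U;                   // latent dimension, U < V
  array[N, T, V] int<lower=0> y;    // observed counts (illustrative Poisson case)
}
parameters {
  vector[V] nu;                     // intercepts nu_v
  matrix[V, U] Lambda;              // factor loadings
  array[N, T] vector[U] eta;        // latent variables (sampled explicitly here)
}
model {
  nu ~ normal(0, 2);
  to_vector(Lambda) ~ normal(0, 1);
  for (i in 1:N) {
    for (t in 1:T) {
      eta[i, t] ~ std_normal();     // placeholder prior; the AR dynamics go here
      y[i, t] ~ poisson_log(nu + Lambda * eta[i, t]);
    }
  }
}
```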
Next, the latent variables have an autoregressive dynamic, and are given by
\eta_{i,t} = \alpha + \sum_{l=0}^{L} \left( B_{l}\, \eta_{i,t-l} + \Gamma_{l} X_{i,t-l} \right) + \xi_{it}
where \xi_{it} \sim N(0,\sigma^{2}), \alpha \in \mathbb{R}^{U} is a vector of intercepts, X_{i,t-l} is a matrix of covariates, \Gamma_{l} is a matrix of regression coefficients, and B_{l} is a properly sized matrix, where B_{0} is strictly upper-triangular to ensure that the model is recursive.
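In a Stan sketch, these dynamics would replace the placeholder prior on \eta above, e.g. along the following lines, conditioning on the first L timepoints as initial values. The names alpha_lat, B, Gamma, x and sigma are illustrative stand-ins for \alpha, B_{l}, \Gamma_{l}, X and the innovation SD; note that the strict upper-triangularity of B_{0} means the Jacobian of the implied map from \xi_{it} to \eta_{i,t} is one, so the density statement is valid as written.

```stan
// Sketch of the latent dynamics as a prior increment on eta.
// B[1] plays the role of B_0 and would need a strictly
// upper-triangular constraint; eta[i, 1:L] would keep some
// simple prior as initial values.
for (i in 1:N) {
  for (t in (L + 1):T) {
    vector[U] m = alpha_lat;
    for (l in 0:L) {
      m += B[l + 1] * eta[i, t - l] + Gamma[l + 1] * x[i, t - l];
    }
    eta[i, t] ~ normal(m, sigma);   // xi_it ~ N(0, sigma^2), elementwise
  }
}
```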
The case is now that we have "top-level" parameters \Psi = \{\Lambda_{t},\nu_{v},B_{l},\Gamma_{l},\alpha,\sigma^{2},\phi\}, which are quite easy to sample using Stan, given that I define decent priors. These parameters, which govern the time series dynamics, can also be made individual-specific, e.g., \Lambda_{i,t} if I want the factor loadings to vary between individuals. To avoid sampling \eta_{1:T,1:N} for all timepoints and all subjects, I would be happy to just integrate them out, to get
p(\Psi | y) = \int p(\eta_{1:T,1:N}, \Psi | y_{1:T,1:N}) d \eta_{1:T,1:N}
which I guess, up to the normalising constant p(y_{1:T,1:N}), can also be written
p(\Psi | y) \propto p(\Psi) \int p(y_{1:T,1:N} | \eta_{1:T,1:N}, \Psi) p(\eta_{1:T,1:N} | \Psi) d\eta_{1:T,1:N}
If the inverse link function g^{-1}(\cdot) above were the identity function, and hence y_{1:T,1:N} were Gaussian variables, the integral could be computed exactly. I imagine this could be done pretty easily with the function gaussian_dlm_obs(), although I haven't actually tried this. Presumably this would also let me compute the predicted values of all latent variables at all timepoints analytically, in case I needed them.
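For what it's worth, here is roughly how I imagine the Gaussian, identity-link case could look with gaussian_dlm_obs(), restricted to lag L = 1, time-constant loadings, and no intercepts, covariates or contemporaneous B_{0} term (which the function does not handle directly). I have not run this, all names are illustrative, and the exact dimension conventions of the F, G, V, W arguments would need double-checking against the functions reference.

```stan
// Rough sketch of the identity-link case, marginalising eta analytically
// with gaussian_dlm_obs(): one DLM likelihood per subject, with
// observation matrix F = Lambda' and transition matrix G = B_1.
// V_dim avoids a clash with the V argument (observation covariance).
data {
  int<lower=1> N;                        // subjects
  int<lower=1> T;                        // timepoints
  int<lower=1> V_dim;                    // number of responses
  int<lower=1> U;                        // latent dimension
  array[N] matrix[V_dim, T] y;           // one V_dim x T response matrix per subject
}
parameters {
  matrix[V_dim, U] Lambda;               // factor loadings
  matrix[U, U] B1;                       // autoregressive matrix B_1
  vector<lower=0>[V_dim] obs_sd;         // residual SDs of y
  real<lower=0> sigma;                   // SD of the innovations xi
}
model {
  matrix[U, V_dim] F = Lambda';          // observation eq: y_t ~ N(F' eta_t, V_obs)
  matrix[U, U] G = B1;                   // state eq: eta_t ~ N(G eta_{t-1}, W)
  matrix[V_dim, V_dim] V_obs = diag_matrix(square(obs_sd));
  matrix[U, U] W = diag_matrix(rep_vector(square(sigma), U));
  vector[U] m0 = rep_vector(0, U);       // prior mean of the initial state
  matrix[U, U] C0 = diag_matrix(rep_vector(1, U));  // prior covariance of the initial state
  to_vector(Lambda) ~ normal(0, 1);
  to_vector(B1) ~ normal(0, 0.5);
  obs_sd ~ normal(0, 1);
  sigma ~ normal(0, 1);
  for (i in 1:N) {
    y[i] ~ gaussian_dlm_obs(F, G, V_obs, W, m0, C0);
  }
}
```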
My question, however, is for the case in which g^{-1}(\cdot) is not the identity function, e.g., with logistic or Poisson regression. Based on reading Andrieu et al. (2010), I realized that running particle filters could be a way to compute this integral inside MCMC while still targeting the correct posterior. In particle marginal Metropolis-Hastings, a particle filter estimate of the marginal likelihood is used in the Metropolis-Hastings ratio to accept or reject proposals for the parameters \Psi.
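If I understand Andrieu et al. (2010) correctly, with \hat{p}(y_{1:T,1:N} | \Psi) denoting the particle filter estimate of the marginal likelihood, a proposal \Psi^{*} \sim q(\cdot | \Psi) is accepted with probability
\min\left\{1, \; \frac{\hat{p}(y_{1:T,1:N} | \Psi^{*})\, p(\Psi^{*})\, q(\Psi | \Psi^{*})}{\hat{p}(y_{1:T,1:N} | \Psi)\, p(\Psi)\, q(\Psi^{*} | \Psi)} \right\}
and because the estimator is unbiased, the chain still targets the exact posterior p(\Psi | y_{1:T,1:N}) even though the likelihood is only estimated.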
My question was hence whether this would make sense with Stan. Based on @matt-graham's answer, it does seem to me that this, to say the least, would not be straightforward.