Hello. I have a conceptual question.
I have a hierarchical Bayesian model that looks like this:
\alpha_i \sim \mathcal{N}\left(\mu_\alpha, \sigma_\alpha\right) \tag{1}
\beta_i \sim \mathcal{N}\left(\mu_\beta, \sigma_\beta\right) \tag{2}
\gamma_i \sim \mathcal{N}\left(\mu_\gamma, \sigma_\gamma\right) \tag{3}
So, my model has lower-level parameters \alpha_i, \beta_i, \gamma_i for each participant i. Also, there are upper level parameters like \mu_\alpha \ldots \sigma_\gamma.
Assume I have specified an observation model so that I can fit these parameters with the NUTS sampler (let’s say 4 chains, 1000 posterior samples each).
After fitting the model, I will get 4000 posterior samples for each parameter, let’s denote them by \alpha_i^{\text{est}}, \beta_i^{\text{est}}, \gamma_i^{\text{est}}, \mu_\alpha^{\text{est}}, \ldots \sigma_\gamma^{\text{est}}.
Now, I want to simulate lower-level parameters using the upper-level parameters, and check if the simulated parameters are “close” to the estimated ones.
The way I have done this is described below:
-
Use \mu_\alpha^{\text{est}}, \ldots \sigma_\gamma^{\text{est}} in (1) -(3) to sample lower-level parameters, let’s denote them by \alpha_i^{\text{sim}}, \beta_i^{\text{sim}},\gamma_i^{\text{sim}}.
-
Now, we can use PCA to visualize these parameters on a cartesian plane. We can fit the PCA map on \alpha_i^{\text{est}}, \beta_i^{\text{est}}, \gamma_i^{\text{est}} which would be fit on an array of size (4000, 3). After fitting the PCA map, we can use it to reduce the dimension to (4000, 2).
-
We can also use the fitted PCA map to reduce the dimension of simulated parameters \alpha_i^{\text{sim}}, \beta_i^{\text{sim}},\gamma_i^{\text{sim}} from (4000, 3) to (4000, 2).
-
Now, we can plot both of these and see if there’s an overlap.
I want to know if this is an ok thing to do or am I violating any assumptions?
Note that, I can also plot the prior parameters by simply sampling from (1)-(3) and then using the fitted PCA map to reduce them. Ideally, on the PCA plot, the estimated and simulated parameters would have a large overlap and they will both be within the area colored by prior parameters.
I want to know if this a reasonable way to assess the quality of simulated data. Any suggestions or guidance will be highly appreciated. Thanks!