Interpretation question -- evidence synthesis + hierarchical model


A question about interpretation - predictive distribution versus missing data. Assume a series of exchangeable experiments (w/ data y_1 and n_1 below) inform a parameter (the logit-probability mu_1). In another series of exchangeable experiments, another (different) parameter (the logit-probability mu_2) can be estimated from data ( y_2 and n_2).

A third(!) series of exchangeable experiments were performed where only data on a function of those two initial parameters are quantified but quantities directly informing the initial parameters themselves weren’t. I use data from 3rd series of experiments to improve estimation of the two first quantities, given finite number of experiments (in an evidence synthesis context).

My question: the synthesis of these different datasets requires predicting the unobserved quantities (ytilde_1 and ytilde_2) for the 3rd set of experiments where these quantities are not observed (but a function of their values is). The sampling of ytilde_1 and ytilde_2 in the model block here – can it be interpreted as posterior predictive simulations (even if not in generated quantities block and not sampling using _rng) or would the correct interpretation be as missing data? (noting the information flow from ytilde_1 and ytilde_2 to other parameters - which is the goal and clear from running the model). It would be great to know your thoughts (sorry for long - possibly unclear - description)

int N_1; // number of experiments - informing parameter 1
int N_2; // number of experiments - informing parameter 2
int N_3; // number of experiments - informing combination of parameters 1 and 2
int y_1[N_1];
int y_2[N_2];
int n_1[N_1];
int n_2[N_2];
int n_3[N_3];
int y_3[N_3];

real mu_1;
real mu_2;
real<lower=0> tau_1;
real<lower=0> tau_2;
vector[N_1] mu_study_1;
vector[N_2] mu_study_2;
vector[N_3] ytilde_1;
vector[N_3] ytilde_2;

transformed parameters{
vector[N_3] p_3 = inv_logit(ytilde_1).*inv_logit(ytilde_2); // Function of quantities derived from two initial parameters

// Priors
mu_1 ~ normal(0, 5);
mu_2 ~ normal(0, 5);
tau_1 ~ normal(0, 1);
tau_2 ~ normal(0, 1);

//Hierarchical Priors
mu_study_1 ~ normal(mu_1, tau_1);
mu_study_2 ~ normal(mu_2, tau_2);

// Likelihood
y_1 ~ binomial_logit(n_1, mu_study_1);
y_2 ~ binomial_logit(n_2, mu_study_2);

// Predicting unobservable values of first two parameters in 3rd set of experiments
ytilde_1 ~ normal(mu_1, tau_1);
ytilde_2 ~ normal(mu_2, tau_2);

// Likelihood (3rd set)
y_3 ~ binomial(n_3, p_3);


There’s not always a precise delineation between missing data and posterior predictive quantities – everything’s a parameter in the joint density (model) a priori, and certain quantities get observed, and thus conditioned on, to obtain the posterior.

In a DAG representation of this model \tilde{y}_{1} and \tilde{y}_{2} have the same parents as the observed quantities y_{1}, y_{2}, which suggests thinking about them like either missing data or posterior predictive quantities. However, \tilde{y}_{1} and \tilde{y}_{2} have stochastic children, which means you might like to think about as parameters.

I would think about \tilde{y}_{1} and \tilde{y}_{2} as parameters in your model that let you include y_{3}, but are not directly of interest. If we could integrate them out, we would.

Thanks hhau – good to know that the separation between the two concepts is generally not very precise.
Just to check something - you mentioned if we could integrate them out, we would – would that be primarily for algorithm efficiency?

(and we can’t right? only marginalise out discrete parameters in Stan – is that correct?)


Marginalisation as a technique is not limited to discrete parameters, whether or not we can in this instance depends on if we can do the integration analytically (I haven’t the math to see if it is possible).

That’s clear – thank you!