Parameters that depend on missing values

I have this simple regression model
y_{ij} = \mu_{i} + x_{ij}^{T}\beta + e_{ij},

where e_{ij} \sim N(0, \sigma^2_{i}) and e_{ij}, e_{ij'} are independent. i indexes ‘regions’: i=1, 2, \ldots, K and j indexes instances in each region: j=1, 2, \ldots, N_{i}.

Sometimes I observe only a sample of the y_{ij}, say j=1, 2, \ldots, n_{i} with n_{i} \ll N_{i} (but I still know all the data in X). When n_{i} > 0, I can learn all the \mu_{i} and \sigma^2_{i} from the data, and then predict the missing y_{ij} with
p(y_{ij}^{\star} | y ) = \int p(y_{ij}^{\star} | \theta) \, p(\theta | y) \, d\theta,
provided that my model is valid for new observations. Here \theta=( \sigma^{2}_{1:K},\, \mu_{1:K},\, \beta).
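For concreteness, I think this posterior-predictive route could be written as a generated quantities block like the sketch below (reusing the data and parameter names from the code further down; I haven't actually tried this version):

generated quantities {
  // one posterior-predictive draw per missing y_{ij}
  vector[N_mis] y_mis_pred;
  for (n in 1:N_mis) {
    y_mis_pred[n] = normal_rng(mu[groupmis[n]] + X_mis[n] * beta,
                               sigma[groupmis[n]]);
  }
}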

On the other hand, I read in the Stan User’s Guide (ch. 3) that I can predict the missing y_{ij} by declaring them as parameters, so I coded:

data {
  int<lower=1> K;  // regions
  int<lower=0> N_obs; // no. of observed y_{ij}
  int<lower=0> N_mis; // no. of missing  y_{ij}
  array[N_obs] int<lower=1, upper=K> group; // region of observed
  array[N_mis] int<lower=1, upper=K> groupmis; // region of missing
  int<lower=1> p;  // predictors
  vector[N_obs] y_obs;
  matrix[N_obs, p] X_obs;
  matrix[N_mis, p] X_mis;
}
parameters {
  vector[p] beta;              // Regression coefficients
  vector[K] mu;                // intercept for each region
  vector<lower=0>[K] sigma;    // sd for each region
  vector[N_mis] y_mis;         // missing y_{ij}
}
model {
  // likelihood for the observed y_{ij}
  target += normal_id_glm_lupdf(y_obs | X_obs, mu[group], beta, sigma[group]);
  // "likelihood" for the missing y_{ij}, which are treated as parameters
  target += normal_id_glm_lupdf(y_mis | X_mis, mu[groupmis], beta, sigma[groupmis]);
}

Here group and groupmis are vectors with elements (1, 1, \ldots, 1, 2, 2, \ldots, 2, \ldots, K, K, \ldots, K), etc. So mu[group] picks out, for each observed row, the mu entry of its region (regions with n_{i} > 0), and mu[groupmis] picks out, for each missing row, the mu entry of its region (regions with n_{i} = 0).

I'm not very sure this code is correct:

  1. Do sigma[groupmis] and mu[groupmis] actually learn anything from y_mis, or is the only information they get whatever comes through mu[group] and sigma[group]?

  2. When some n_{i}=0: if we were using a Gibbs sampler/Metropolis algorithm, we could generate the missing y_{ij} (y_mis), then update mu[groupmis] and sigma[groupmis], and so on. I know that Stan is based on HMC/NUTS, so is there any equivalent in Stan? Is the code above that equivalent?

For the sake of simplicity, I am using uniform priors on sigma, mu, and beta. Also, as initial values for y_mis I use the OLS predictions. The following R code illustrates this:

question-code.R (3.1 KB)

When n_{i}=0 for some region, the estimates for mu[i] and sigma[i] are bad; when n_{i}>0, the estimates are much better.

Thanks in advance.

Judging from the Stan code that you posted in-line, this is expected behaviour. When n_{i}=0, there is nothing to inform the group-specific parameters mu[i] and sigma[i], not even a proper prior distribution. Because you have not explicitly specified a prior for mu and sigma, the implied default is a uniform prior, which happens to be improper for these two parameters. Given that you have no data to inform them, the posterior for these parameters is also improper and therefore not estimable.

If you specified a prior for each mu and sigma that integrates to 1 (a proper prior), you would find that mu and sigma for regions without any data simply follow that prior. Still, I imagine that wouldn’t be very useful either.
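For example, just as a sketch (the specific priors below are arbitrary weakly-informative choices, not a recommendation), your model block could become:

model {
  // proper priors; illustrative choices only
  mu ~ normal(0, 10);
  sigma ~ exponential(1);
  beta ~ normal(0, 5);
  target += normal_id_glm_lupdf(y_obs | X_obs, mu[group], beta, sigma[group]);
  target += normal_id_glm_lupdf(y_mis | X_mis, mu[groupmis], beta, sigma[groupmis]);
}

With this, mu[i] and sigma[i] for a region with n_{i}=0 would just come out as draws from normal(0, 10) and exponential(1).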

Maybe what you’re looking for - but I’m not sure - is to learn something about the groups with no data from the data that is available: pooling. For this, you have to make your model hierarchical, with the group-level parameters mu and sigma each following a hyperprior (or potentially even a single multivariate hyperprior). The parameters of the hyperprior itself can be estimated from the data, provided that you have a sufficient number of groups.
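A sketch of what that could look like, keeping your data block as is (the hyperpriors here are placeholders you would want to adapt):

parameters {
  vector[p] beta;
  real mu0;                   // population mean of the region intercepts
  real<lower=0> tau;          // between-region sd of the intercepts
  vector[K] mu;               // region intercepts
  real log_sigma0;            // population location of the log region sds
  real<lower=0> tau_sigma;    // between-region spread on the log-sd scale
  vector<lower=0>[K] sigma;   // region sds
  vector[N_mis] y_mis;        // missing y_{ij}
}
model {
  // partial pooling: regions with little or no data borrow strength
  // from the regions that do have data
  mu ~ normal(mu0, tau);
  sigma ~ lognormal(log_sigma0, tau_sigma);
  // weakly-informative hyperpriors (illustrative)
  mu0 ~ normal(0, 10);
  tau ~ exponential(1);
  log_sigma0 ~ normal(0, 1);
  tau_sigma ~ exponential(1);
  beta ~ normal(0, 5);
  target += normal_id_glm_lupdf(y_obs | X_obs, mu[group], beta, sigma[group]);
  target += normal_id_glm_lupdf(y_mis | X_mis, mu[groupmis], beta, sigma[groupmis]);
}

If some regions have very few observations, a non-centered parameterization of mu may sample better, but the idea is the same.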


One way to think about this is that you’re letting Stan’s MCMC do that integral for you. The marginalization is free once you’ve sampled: you just drop the “parameters” you don’t care about.

In many simple regressions, you won’t get additional information about the parameters from the missing observations. For example, suppose you have y_n \sim \textrm{normal}(\alpha + \beta \cdot x_n, \sigma) and some of your y_n are missing. You can code these up as parameters and infer their posterior distribution, but it won’t help estimate \alpha, \beta, or \sigma.
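To spell out why: for each missing y_n the likelihood factor integrates to one,

\int \textrm{normal}(y_n \mid \alpha + \beta \cdot x_n, \sigma) \, dy_n = 1,

so marginalizing over those y_n parameters leaves the posterior for \alpha, \beta, \sigma exactly as if the rows with missing y had been dropped.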

Yes, you just declare them as parameters, as you have done. Stan will update them all jointly because it’s using HMC, but that’s just a different sampler and the effect is the same as with Gibbs.
