Missing data in hierarchical linear model

bbecker · March 2, 2018, 7:27pm

Hi all,

I’m building a linear hierarchical model with missing data (see below). I modeled missing data for the dependent variable through indexing and combining the observed and missing components of the respective vector (following section 11.3 in the manual).

Did I implement this correctly? Would I model missing values for independent variables in the same way? Currently the model is not very fast, and more problematically, does not mix very well - any ideas on how to improve on this?

Any advice would be much appreciated!

data{
int N; //number of level 1-observations
int M; //number of level 2-observations
int K; //number of fixed level-1 predictors
int R; //number of random level-1 predictors 
int U; //number of level 2 predictors
matrix[N,K] X_f; //level-1 variables, fixed effects
matrix[N,R] X_r; //level-1 variables, random effects; including column of ones for random intercept
matrix[M,U] X2; //level-2 variables, including column of ones for grand intercept 
int g[N]; //grouping variable
int<lower = 0> nmis_y; // number of missing y
int<lower = 1, upper = N> ii_obs[N-nmis_y]; // index, observed y
int<lower = 1, upper = N> ii_mis[nmis_y]; // index, missing y
real y_obs[N-nmis_y]; // observed y's
}
parameters{
matrix[U,R] gamma; // level 2 predictors
corr_matrix[R] Omega; // correlation matrix
vector<lower=0>[R] sigma_l2; // variances, level 2
real<lower=0> sigma_l1; // variance, level 1
vector[K] beta_f; //level-1 predictors, fixed
vector[R] beta_r[M]; //level-1 predictors, random
real y_mis[nmis_y]; // missing y values
}
transformed parameters{
matrix[R,R] vcov_l2; // var-covariance matrix, level 2
real y[N]; // merged vector of observed and missing y
vector[N] xb_l1; // xbeta, level 1
vector[R] xb_l2[M]; // xbetas, level 2
// compute vcov values
vcov_l2 <- quad_form_diag(Omega, sigma_l2); 
// compute xbetas
for(n in 1:N) //level 1
  xb_l1[n] <- X_f[n]*beta_f+X_r[n]*beta_r[g[n]];
for (m in 1:M) //level 2
  xb_l2[m] <- (X2[m] * gamma)';
// merge observed and missing y
y[ii_obs] = y_obs;
y[ii_mis] = y_mis;
}
model{
sigma_l1 ~ cauchy(0,1); 
sigma_l2 ~ cauchy(0,1); 
Omega ~ lkj_corr(R); 
to_vector(gamma) ~ normal(0, 10); 
beta_f ~ normal(0,10); 
beta_r ~ multi_normal(xb_l2, vcov_l2); 
y ~ normal(xb_l1, sigma_l1);
}

Bob_Carpenter · March 4, 2018, 11:42pm

Indent your code—it’s much easier to read!

In terms of content, the approach you’re taking will work to inferring y[ii_mis], but it’s much more efficient given that the y[ii_mis] are conditionally independent here to just generate them in the generated quantities block using the RNG.

But the approach you take will generalize to missing predictors.

bbecker · March 5, 2018, 6:13am

Bob, thank you!

Actually, would the code also be more efficient if I do not store the y values? Can this be done by commenting it out {} within the generated quantities block?

bbecker · March 5, 2018, 11:48am

I can’t quite figure out how to implement your recommendation.

If I move y[ii_mis] = y_mis; to the generated quantities block I get the following error:

PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or a
or ‘}’ to close variable declarations and definitions>

If I also move the variable declaration, stan warns me that the model can’t be estimated because “y does not exist”.

Any ideas?

Bob_Carpenter · March 5, 2018, 6:11pm

If you don’t store them, why generate them?

If you attach your whole program, we can help debug syntax. Otherwise, it’s too hard to guess what might be going on. This looks like a syntax error rather than a conceptual one.

Nico-Rojas · December 3, 2019, 4:23pm

Hi @bbecker , by any chance, do you remember what did you do with this? I’m working on a similar problem.

Topic		Replies	Views
Subsetting missing data models for second-level predictor brms hierarchical-model , missing-data	0	394	February 22, 2023
Missing data in a 2PL (IRT) model Modeling	37	4289	October 22, 2017
My first (hierarchical) model, and it's not working Modeling fitting-issues	10	1030	July 30, 2018
Help to speed up a hierarchical mvn model Modeling	5	938	November 23, 2018
Improving speed in hierarchical VAR modelling Modeling	0	367	August 22, 2019

Missing data in hierarchical linear model

Related topics