How to disregard a covariate column and the corresponding regression coefficient for the observations where that covariate is missing?

Hello everyone,

I am trying to specify a Bayesian hierarchical model with an additional “block” structure. That is, we have E different observed environments with N^e \times 1 response vectors y^e and N^e \times D covariate matrices X^e for e \in \{1,\dots,E\}. In addition, the data for each environment are divided into blocks, where in block b=0 all observations are intact, and in each block b>0 the covariate X_b is missing from the observations. The likelihood is modelled as a linear regression, assuming for each block b a normally distributed error term with scale \sigma_b, that is

y^{e,b=0} = \sum_{d=1}^{D} \beta^e_d X^e_d + \epsilon_0, \quad \epsilon_0 \sim \mathcal{N}(0, \sigma_0)

y^{e,b>0} = \sum_{d \in \{1,\dots,D\}\setminus\{b\}} \beta^e_d X^e_d + \epsilon_b, \quad \epsilon_b \sim \mathcal{N}(0, \sigma_b)

I have included the whole Stan program below, where I input all the data in one matrix, together with vectors “env” and “blck” that indicate which environment and block each observation belongs to.
I need help specifying the likelihood so that the covariate column and the corresponding regression coefficient are disregarded in each of the blocks b>0. Right now I have written “-blck[n]”, which is just to illustrate what I want to do. Any ideas as to how it can be done would be much appreciated :)

Any other comments on how to make the program more efficient are also welcome!

Best regards,
Pernille

data {
    int<lower=1> N;                   // observations in total
    int<lower=1> B;                   // number of blocks with a missing covariate
    int<lower=1> D;                   // number of covariates
    int<lower=1> E;                   // number of environments
    int<lower=1,upper=E> env[N];      // associated environment
    int<lower=0,upper=B> blck[N];     // associated block (0 = no missing covariate)
    matrix[N,D] X;                    // covariate matrix
    vector[N] y;                      // target vector
}
parameters {
    matrix[D,E] gamma;                // non-centered coefficients
    real mu[D];                       // global means
    real<lower=0> tau[D];             // global scales
    real<lower=0> sigma[B+1];         // block scales: sigma[b+1] for block b, so block 0 uses sigma[1]
}
transformed parameters {
  matrix[D,E] beta;                          // Recentered coefficients             
  for (d in 1:D){
    for (i in 1:E){
      beta[d,i] = mu[d] + tau[d] * gamma[d, i];
    }
  }
}

model {
  for (d in 1:D) {
    mu[d] ~ normal(0, 1);
    tau[d] ~ cauchy(0, 1);
    gamma[d, :] ~ std_normal();
  }
  sigma ~ cauchy(0, 2.5);             // vectorized prior over all block scales
  for (n in 1:N) {
    if (blck[n] == 0) {
      y[n] ~ normal(X[n, :] * beta[:, env[n]], sigma[blck[n] + 1]);                    // likelihood, no missing data
    }
    else {
      y[n] ~ normal(X[n, "-blck[n]"] * beta["-blck[n]", env[n]], sigma[blck[n] + 1]);  // likelihood, missing data (placeholder syntax)
    }
  }
}

Best to do explicit inference on the missing values. See the Stan User's Guide (SUG) here and the video here.
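
For concreteness, here is a minimal sketch of what explicit inference on the missing values could look like for this setup. It is not your original model: the hierarchical prior on beta is collapsed to a plain normal prior to keep it short, the names N_mis, mis_idx and X_mis are mine, and the standard-normal prior on the missing covariate values is only an assumed placeholder. Each row with blck[n] > 0 gets one latent parameter for its missing covariate, which is plugged into the full D-column dot product, so the same likelihood line works for every block.

data {
  int<lower=1> N;
  int<lower=1> B;
  int<lower=1> D;
  int<lower=1> E;
  int<lower=1,upper=E> env[N];
  int<lower=0,upper=B> blck[N];          // 0 = fully observed row
  matrix[N,D] X;                         // missing entries may hold any placeholder value
  vector[N] y;
}
transformed data {
  int N_mis = 0;                         // number of rows with a missing covariate
  int mis_idx[N];                        // position of row n in X_mis (0 if fully observed)
  for (n in 1:N) {
    if (blck[n] > 0) {
      N_mis += 1;
      mis_idx[n] = N_mis;
    } else {
      mis_idx[n] = 0;
    }
  }
}
parameters {
  matrix[D,E] beta;                      // simple prior below; the hierarchy is omitted for brevity
  real<lower=0> sigma[B+1];              // block scales: sigma[b+1] for block b
  vector[N_mis] X_mis;                   // latent values of the missing covariates
}
model {
  to_vector(beta) ~ normal(0, 1);
  sigma ~ cauchy(0, 2.5);
  X_mis ~ normal(0, 1);                  // assumed prior for the missing covariate values
  for (n in 1:N) {
    row_vector[D] x_n = X[n];            // copy the row so it can be completed
    if (blck[n] > 0)
      x_n[blck[n]] = X_mis[mis_idx[n]];  // substitute the latent value for the missing column
    y[n] ~ normal(x_n * beta[:, env[n]], sigma[blck[n] + 1]);
  }
}

With this formulation every observation contributes through the same D-term dot product, and each entry of X_mis is informed jointly by its prior and by y. Whether a simple standalone prior on the missing covariates is adequate depends on how those covariates are actually generated in your data.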