Hello everyone,
I am trying to specify a Bayesian hierarchical model, with an additional “block” structure. That is, we have E different observed environments with with N^e \times 1 response vectors y^e and N^e \times D covariate matrix X^e for e = \{1,\dots,E\}. In addition, the data for each environment is divided into B blocks, wherein block b=0 all observations are intact and for the blocks, b>0 the covariate X_b is missing from the observations. The likelihood is modelled as a linear regression model, assuming for each block b normally distributed error term with scale sigma_b, that is
y^{e,b=0} = \sum_{d=1}^D \beta^e_dX_d^e + \epsilon_0, \text{ } \epsilon_0 \sim \mathcal{N}(0, \sigma_0)
y^{e,b>0} = \sum_{d=\{1,\dots, D\}\setminus\{b\}} \beta^e_dX_d^e + \epsilon_b, \text{ } \epsilon_b \sim \mathcal{N}(0, \sigma_b)
I have included the whole stan program below, where I input all data in one matrix and vectors “env” and “blck” that indicates respectively which environment and block an observation belongs to.
I need help specifying the likelihood when I want to disregard a covariate-column and corresponding regression coefficient in each of the blocks b>0. Right now I have written “-block[n]” which is just to illustrate what I want to do. Any ideas as to how it can be done would be much appreciated :)
Any other comments on how to make the program more efficient are also welcome!
Best regards,
Pernille
data {
int<lower=1> N // observations in total
int<lower=1> B; // number of blocks
int<lower=1> D; // number of covariates
int<lower=1> E; // number of environments
int<lower=1,upper=E> env[N]; // associated environment
int<lower=0,upper=B> blck[N]; // associated block
matrix[N,D] X; // covariate matrix
vector[N] y; // target vector
}
parameters {
matrix[D,E] gamma; // Non-centered coefficients
real mu[D]; // global means
real<lower=0> tau[D]; // global scales
real<lower=0> sigma[B]; // block scales
}
transformed parameters {
matrix[D,E] beta; // Recentered coefficients
for (d in 1:D){
for (i in 1:E){
beta[d,i] = mu[d] + tau[d] * gamma[d, i];
}
}
}
model {
for (d in 1:D) {
mu[d] ~ normal(0, 1);
tau[d] ~ cauchy(0, 1);
gamma[d, :] ~ std_normal();
}
for (b in 1:B) {
sigma ~ cauchy(0, 2.5);
}
for (n in 1:N) {
if (blck[n] == 0) {
y[n] ~ normal(X[n, :] * beta[:, env[n]], sigma[blck[n]]); // likelihood no missing data
}
else {
y[n] ~ normal(X[n, "-blck[n]"] * beta["-blck[n]", env[n]], sigma[blck[n]]); //likelihood missing data
}
}