Hi all,
I am trying to specify multilevel model in which the first level estimates a species-specific slope and intercept across species, and the second level uses those species-specific coefficients as data. Thus, I have different numbers of data points for each model. I am running into problems at the first level, where I am trying to estimate species-specific coefficients. I think it has something to do with the way I specify the ‘species’ predictor.
To specify the first level, I followed this post: https://stackoverflow.com/questions/29183577/how-to-represent-a-categorical-predictor-rstan, where I create a data matrix, but the issue is when I use loo() on the model object. No matter what I seem to do, all my pareto K values are “very bad”. I have pasted the code below:
data_matrix <- model.matrix(y1 ~ x * Species, data = df)
dat <- list(N = nrow(df),
K = ncol(data_matrix),
data_matrix = data_matrix,
y1 = df$y1,
y2 = unique(df$y2))
data {
int N; // number of observations
vector[N] y1; // response variable 1
vector[33] y2; // response variable 2
int K; // number of predictors
matrix[N, K] data_matrix; // matrix of predictors
}
parameters {
// level 1
vector[K] beta;
real<lower=0> sigma;
// level 2
real y2_int;
real y2_slope;
real<lower=0> sigma_y2;
}
transformed parameters {
vector[N] mu_y1;
real beta_y1_ref;
vector[33] beta_y1;
vector[37] mu_y2;
mu_y1 = data_matrix * beta;
beta_y1_ref = beta[1];
beta_y1[1] = beta_y1_ref;
beta_y1[2:33] = beta_y1_ref + beta[3:34];
mu_y2 = y2_int + y2_slope * beta_y1;
}
model {
beta ~ student_t(3, 0, 10);
sigma ~ cauchy(0, 10);
y1 ~ normal(mu_y1, sigma);
y2_int ~ student_t(1, 0, 10);
y2_slope ~ student_t(1, 0, 10);
sigma_y2 ~ cauchy(0, 10);
y2 ~ normal(mu_y2, sigma_y2);
}
generated quantities {
vector[33] log_lik;
for(i in 1:33){
log_lik[i] =
normal_lpdf(y1[i] | mu_y1, sigma) +
normal_lpdf(y2[i] | mu_y2, sigma_y2);
}
}
When I run the above model, the parameter estimates for the first level match that from lm() or brms(). However, if you do loo(model object), I get all pareto K as “very bad”. This happens even when running this model without the second level (y2 ~ etc). I am also not sure if I have specified the log_lik correctly (I have only specified multi-level models that have the same number of observations/levels previously).
Any help would be greatly appreciated. Thank you!