Dear all,
I am new to Stan and currently working with 3 level hierachical growth model (non-centered slope & intercept) to estimate plant growth (rate) in different chemical concentrations. The longitudinal data are obtained by repeated measurements (plant length over a 14 days) nested within plant that are nested within different laboratories, that in turn are nested within six different chemical concentrations. Data includes1512 observations (plant length) nested in 378 plants that are nested within 6 different laboratories in 6 chemical concentrations. In the modeling part, I would like to see if the plants growth are linear or exponential growth. Since it is easier to fit exponential model by taking logarithm at both sides and fitting with the existing linear model stan code, it’s what I started with. I am doing it with the code from the appendix in the paper, Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Included, with the Stan code :-
data {
int<lower=0> Nh; //no. of observations
int<lower=0> Ni; //no. of plants
int<lower=0> Nj; //no. pesticide concentrations of all laboratories
int<lower=0> Nk; //no. pesticide concentrations
int<lower=1> Plant4Day[Nh]; //level-2 cluster (Plant) for each observation
int<lower=1> ConcPerLab4Plant[Ni]; //level-3 cluster (concentrations of all labs) for each plant
int<lower=1> Conc4Lab[Nj]; //level-4 cluster (concentrations) for each lab
int X_hijk[Nh];
real Y_hijk[Nh];
}
parameters {
real beta_0k[Nk]; //Slope
real beta_1k[Nk]; //Intercept
real u_0jkm[Ni]; //random effects of slope at plant-level
real u_0jk[Nj]; //random effects of slope at laboratory-level
real u_1jkm[Ni]; //random effects of intercept at plant-level
real u_1jk[Nj]; //random effects of at laboratory-level
real<lower=0,upper=100> sigma_u0jkm;
real<lower=0,upper=100> sigma_u0jk;
real<lower=0,upper=100> sigma_u1jkm;
real<lower=0,upper=100> sigma_u1jk;
real<lower=0,upper=1> sigma_y;
}
transformed parameters {
real beta_0jkm[Ni];
real beta_0jk[Nj];
real beta_1jkm[Ni];
real beta_1jk[Nj];
for (j in 1:Nj) {
beta_0jk[j] = beta_0k[Conc4Lab[j]] + u_0jk[j];
beta_1jk[j] = beta_1k[Conc4Lab[j]] + u_1jk[j];
}
for (i in 1:Ni) {
beta_0jkm[i]=beta_0jk[ConcPerLab4Plant[i]]+u_0jkm[i];
beta_1jkm[i]=beta_1jk[ConcPerLab4Plant[i]]+u_1jkm[i];
}
}
model {
u_0jk ~ normal(0, sigma_u0jk);
u_0jkm ~ normal(0, sigma_u0jkm);
u_1jk ~ normal(0, sigma_u1jk);
u_1jkm ~ normal(0, sigma_u1jkm);
for (h in 1:Nh) {
Y_hijk[h] ~ normal(beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);
}
}
generated quantities {
vector[Nh] log_lik;
for (h in 1:Nh) {
log_lik[h] = normal_lpdf(Y_hijk[h] |beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);
}
}
However, I have a small query related to model comparison via PSIS-LOO . I’m hopping if someone can shed some light on the issue described as below:-
i) I have checked visually how well are the two models suit the data. At this moment I am interested at the first level fitting. When comparing linear model to exponential/log-linear model, the models on PSIS-LOO reveals an estimated difference in elpd of 3651.4. Positive in the estimated difference of elpd indicate the expected predictive accuracy for the second model is higher. Which means, exponential/log-linear model is preferred. However, when looking at the plots for how two models fit, the linear model seems to fit better than the exponential/log-linear model. Is that mean, I am actually not comparing apple to apple (due to the logarithm in the log-linear model) via PSIS-LOO? Or, did I make any mistake with the code. Note, the plot shown the model fitting of one lab. There are another 5 plots for the other labs which also look better fit in linear than exponential/log-linear model.
ii) In case PSIS-LOO isn’t the correct method here, any suggestion which model comparison method I could use?
Figure 1: linear growth model
Figure 2: log-linear/exponential growth model (logarithm at dependent variable)
Thanks in advance for any suggestions.