Model comparison via PSIS-LOO for 3-level hierarchical growth model

Dear all,

I am new to Stan and currently working with 3 level hierachical growth model (non-centered slope & intercept) to estimate plant growth (rate) in different chemical concentrations. The longitudinal data are obtained by repeated measurements (plant length over a 14 days) nested within plant that are nested within different laboratories, that in turn are nested within six different chemical concentrations. Data includes1512 observations (plant length) nested in 378 plants that are nested within 6 different laboratories in 6 chemical concentrations. In the modeling part, I would like to see if the plants growth are linear or exponential growth. Since it is easier to fit exponential model by taking logarithm at both sides and fitting with the existing linear model stan code, it’s what I started with. I am doing it with the code from the appendix in the paper, Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Included, with the Stan code :-

data {
int<lower=0> Nh; //no. of observations
int<lower=0> Ni; //no. of plants
int<lower=0> Nj; //no. pesticide concentrations of all laboratories
int<lower=0> Nk; //no. pesticide concentrations
int<lower=1> Plant4Day[Nh]; //level-2 cluster (Plant) for each observation
int<lower=1> ConcPerLab4Plant[Ni]; //level-3 cluster (concentrations of all labs) for each plant
int<lower=1> Conc4Lab[Nj]; //level-4 cluster (concentrations) for each lab
int X_hijk[Nh];
real Y_hijk[Nh];
}

parameters {
real beta_0k[Nk]; //Slope
real beta_1k[Nk]; //Intercept
real u_0jkm[Ni]; //random effects of slope at plant-level
real u_0jk[Nj]; //random effects of slope at laboratory-level
real u_1jkm[Ni]; //random effects of intercept at plant-level
real u_1jk[Nj]; //random effects of at laboratory-level
real<lower=0,upper=100> sigma_u0jkm;
real<lower=0,upper=100> sigma_u0jk;
real<lower=0,upper=100> sigma_u1jkm;
real<lower=0,upper=100> sigma_u1jk;
real<lower=0,upper=1> sigma_y;
}

transformed parameters {
real beta_0jkm[Ni];
real beta_0jk[Nj];
real beta_1jkm[Ni];
real beta_1jk[Nj];

for (j in 1:Nj) {
beta_0jk[j] = beta_0k[Conc4Lab[j]] + u_0jk[j];
beta_1jk[j] = beta_1k[Conc4Lab[j]] + u_1jk[j];
}
for (i in 1:Ni) {
beta_0jkm[i]=beta_0jk[ConcPerLab4Plant[i]]+u_0jkm[i];
beta_1jkm[i]=beta_1jk[ConcPerLab4Plant[i]]+u_1jkm[i];
}
}

model {

u_0jk ~ normal(0, sigma_u0jk);
u_0jkm ~ normal(0, sigma_u0jkm);
u_1jk ~ normal(0, sigma_u1jk);
u_1jkm ~ normal(0, sigma_u1jkm);

for (h in 1:Nh) {
Y_hijk[h] ~ normal(beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);
}
}

generated quantities {
vector[Nh] log_lik;

for (h in 1:Nh) {
log_lik[h] = normal_lpdf(Y_hijk[h] |beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);
}
}

However, I have a small query related to model comparison via PSIS-LOO . I’m hopping if someone can shed some light on the issue described as below:-

i) I have checked visually how well are the two models suit the data. At this moment I am interested at the first level fitting. When comparing linear model to exponential/log-linear model, the models on PSIS-LOO reveals an estimated difference in elpd of 3651.4. Positive in the estimated difference of elpd indicate the expected predictive accuracy for the second model is higher. Which means, exponential/log-linear model is preferred. However, when looking at the plots for how two models fit, the linear model seems to fit better than the exponential/log-linear model. Is that mean, I am actually not comparing apple to apple (due to the logarithm in the log-linear model) via PSIS-LOO? Or, did I make any mistake with the code. Note, the plot shown the model fitting of one lab. There are another 5 plots for the other labs which also look better fit in linear than exponential/log-linear model.
ii) In case PSIS-LOO isn’t the correct method here, any suggestion which model comparison method I could use?

Figure 1: linear growth model


Figure 2: log-linear/exponential growth model (logarithm at dependent variable)

Thanks in advance for any suggestions.

1 Like

Cool model.

This is correct. Both models need to use same y (ie no transformations) or you need to take into account the Jacobian of the transformation. You can add that (log) Jacobian term to your log_lik computation in the generated quantities.

2 Likes

Hey there,
I have tried with hierarchical exponential model. It’s more complicated(work) than I thought. So, I am thinking to add the log Jacobian term to the log_lik computation.I have replaced the below code for generated quantities.

generated quantities {
vector[Nh] log_lik;

for (h in 1:Nh) {
log_lik[h]= lognormal_lpdf(Y_hijk[h] |beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);
}
}

Pretty sure this is not the correct one. I have no idea how to add the (log) Jacobian term to my log_lik computation. Any further advice? Thanks!

Hi, sorry for the delay in the response. I wanted to be certain that I give the correct answer, and during StanCon and traveling it’s more difficult to focus on details like this.

If you have in model block

Y_hijk[h] ~ normal(beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);

and then you first had in generated quantities block

log_lik[h] = normal_lpdf(Y_hijk[h] |beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y);

If I understood correctly, your Y here is either growth rate or log of growth rate. In order to compare models for rate and log rate, when modeling log rate you need to include the Jacobian of the log transformation. For log(rate) the Jacobian is 1/rate. When working on density scale you would multiply by Jacobian. When you are working on log density scale you add log Jacobian, which is now log(1/rate)=-log(rate) and if your Y is log(rate), then you should have

log_lik[h] = normal_lpdf(Y_hijk[h] |beta_0jkm[Plant4Day[h]]* X_hijk[h] + beta_1jkm[Plant4Day[h]], sigma_y) - Y_hijk[h];

Alternatively you could have Y always to be rate and make another model with lognormal both in model and generated quantities block.