# Defining the log_lik for a multi-level model

Hi all,

I am trying to specify multilevel model in which the first level estimates a species-specific slope and intercept for each of ~30 species, and the second level uses those species-specific coefficients as data (predictors in a new model). Thus, I have different numbers of data points for each model, and different response variables. I am having trouble defining the point-wise log likelihood in the transformed parameters block so it will work with the loo package.

Stan code:

``````data {
int<lower=1> N; //the number of observations
int<lower=1> J; //the number of groups
int<lower=1> K; //number of columns in the model matrix
int<lower=1,upper=J> id[N]; //vector of group indeces
matrix[N,K] X; //the model matrix
vector[N] LogGSAcm2; //the response variable model 1
vector[J] GP_int; // response variable model 22

}

parameters {
vector[K] gamma; //population-level regression coefficients
vector<lower=0>[K] tau; //the standard deviation of the regression coefficients

vector[K] beta[J]; //matrix of group-level regression coefficients
real<lower=0> sigma; //standard deviation of the individual observations

real aGP_int; // intercept of model 2
real bGP_int; // slope of model 2
real<lower=0> sigmaGP_int; // variance model 2

}

transformed parameters {

vector[N] mu_LogGSAcm2; //linear predictor model 1

vector[J] beta_ints; // vector of intercepts extract from first model
vector[J] muGP_int; // linear predictor model 2

for(n in 1:N){
mu_LogGSAcm2[n] = X[n] * beta[id[n]]; //compute the linear predictor using relevant group-level regression coefficients
}

beta_ints = to_vector(beta[1:J,1]); // extract intercepts from first model

muGP_int = aGP_int + bGP_int * beta_ints; // compute linear predictor for model 2

}

model {
gamma ~ student_t(3, 0, 10); //prior on pop-level coefficients
tau ~ cauchy(0, 10); //prior on pop-level sd
sigma ~ gamma(2,0.1); //prior on group-level sd

for(j in 1:J){
beta[j] ~ normal(gamma,tau); //fill the matrix of group-level regression coefficients
}

LogGSAcm2 ~ normal(mu_LogGSAcm2,sigma);

aGP_int      ~ student_t(3, 0, 10); // prior on intercept model 2
bGP_int      ~ student_t(3, 0, 10); // prior on slope model 2
sigmaGP_int  ~ cauchy(0, 10); // prior on variance model 2

GP_int ~ normal(muGP_int, sigmaGP_int);

}

generated quantities {

vector[N] log_lik_N;
vector[J] log_lik_J;

for(i in 1:N)
log_lik_N[i] =
normal_lpdf(LogGSAcm2[i] | mu_LogGSAcm2[i], sigma);

for(i in 1:J)
log_lik_J[i] =
normal_lpdf(GP_int[i] | muGP_int[i], sigmaGP_int);
}

``````

The way it is specified right now runs but I cannot use loo directly on the model object. I have also tried:

``````generated quantities {

vector[J + N] log_lik;

for (i in 1:N){
log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
}

for(i in (N+1):(N+J)){
log_lik[i] =
normal_lpdf(GP_int[i] | muGP_int, sigmaGP_int);
}}
``````
``````generated quantities {

vector[J + N] log_lik;

for(i in 1:J + N){
log_lik[i] =
normal_lpdf(LogGSAcm2[i] | mu_LogGSAcm2[i], sigma) +
normal_lpdf(GP_int[i] | muGP_int[i], sigmaGP_int);

}
``````

Neither of the above two will run. Any thoughts appreciated! Thank you!

You had it nearly correct with:

``````generated quantities {
vector[J + N] log_lik;
for (i in 1:N){
log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
}
for(i in (N+1):(N+J)){
log_lik[i] = normal_lpdf(GP_int[i] | muGP_int, sigmaGP_int);
}
}
``````

except you’re indexing into `GP_int` incorrectly. It needs to be:

``````generated quantities {
vector[J + N] log_lik;
for (i in 1:N){
log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
}
for(i in (N+1):(N+J)){
log_lik[i] = normal_lpdf(GP_int[i-N] | muGP_int, sigmaGP_int);
}
}
}
``````

Thanks @mike-lawrence! I tried this and it still produced bad loo values, but what I needed was to build off of your suggestion and add the `[i] ` and `[N-i]` in the relavant places in the for loop.

So solution:

``````generated quantities {

vector[J + N] log_lik;

for (i in 1:N){

log_lik[i] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2[i], sigma);

}

for(i in (N+1):(N+J)){

log_lik[i] = normal_lpdf(GP_int[i-N] | muGP_int[i-N], sigmaGP_int);

}

}
``````

THANK YOU!

2 Likes

Ah, good catch! Sorry for my error.

Thanks – no worries, and thank you!

Sorry to bring this up again, but now I’m wondering how this would work if you had two models that had the same number of data points in each (and even data from the same groups perhaps). Would you index the two response variables differently and then apply a similar approach, for example (for brevity, just included the data & generated quantities blocks, but I can write out the full Stan program if that helps):

``````data {
int N; // sample size model 1
int J; // sample size model 2
vector[N] y1; // response 1
vector[N] x1; // predictor 1
vector[J] y2; // response 2
vector[J] x2; // predictor 2
}

generated quantities {

vector[N + J] log_lik;

for (i in 1:N){
log_lik[i ] = normal_lpdf(y1[i]| mu_y1[i], sigma_y1);
}

for(i in (N+1):(N+J)){
log_lik[i] =
normal_lpdf(y2[i-N] | mu_y2[i-N], sigma_y2);
}}

``````

Thanks!

Looks right to me!

1 Like