Defining the log_lik for a multi-level model

bigmanj · December 15, 2020, 10:48pm

Hi all,

I am trying to specify multilevel model in which the first level estimates a species-specific slope and intercept for each of ~30 species, and the second level uses those species-specific coefficients as data (predictors in a new model). Thus, I have different numbers of data points for each model, and different response variables. I am having trouble defining the point-wise log likelihood in the transformed parameters block so it will work with the loo package.

Stan code:

data {
  int<lower=1> N; //the number of observations
  int<lower=1> J; //the number of groups
  int<lower=1> K; //number of columns in the model matrix
  int<lower=1,upper=J> id[N]; //vector of group indeces
  matrix[N,K] X; //the model matrix
  vector[N] LogGSAcm2; //the response variable model 1
  vector[J] GP_int; // response variable model 22

  }

parameters {
  vector[K] gamma; //population-level regression coefficients
  vector<lower=0>[K] tau; //the standard deviation of the regression coefficients

  vector[K] beta[J]; //matrix of group-level regression coefficients
  real<lower=0> sigma; //standard deviation of the individual observations
  
  real aGP_int; // intercept of model 2
  real bGP_int; // slope of model 2
  real<lower=0> sigmaGP_int; // variance model 2
  
  }

transformed parameters {

  vector[N] mu_LogGSAcm2; //linear predictor model 1

  vector[J] beta_ints; // vector of intercepts extract from first model
  vector[J] muGP_int; // linear predictor model 2
  
  for(n in 1:N){
    mu_LogGSAcm2[n] = X[n] * beta[id[n]]; //compute the linear predictor using relevant group-level regression coefficients 
  }
  
   beta_ints = to_vector(beta[1:J,1]); // extract intercepts from first model
 
   muGP_int = aGP_int + bGP_int * beta_ints; // compute linear predictor for model 2

  }
  
model {
  gamma ~ student_t(3, 0, 10); //prior on pop-level coefficients
  tau ~ cauchy(0, 10); //prior on pop-level sd
  sigma ~ gamma(2,0.1); //prior on group-level sd
  
  for(j in 1:J){
   beta[j] ~ normal(gamma,tau); //fill the matrix of group-level regression coefficients 
  }
  
  LogGSAcm2 ~ normal(mu_LogGSAcm2,sigma);
  
  aGP_int      ~ student_t(3, 0, 10); // prior on intercept model 2
  bGP_int      ~ student_t(3, 0, 10); // prior on slope model 2
  sigmaGP_int  ~ cauchy(0, 10); // prior on variance model 2
  
  GP_int ~ normal(muGP_int, sigmaGP_int); 
  
  }

generated quantities {

vector[N] log_lik_N;
vector[J] log_lik_J;

for(i in 1:N)
  log_lik_N[i] =
      normal_lpdf(LogGSAcm2[i] | mu_LogGSAcm2[i], sigma);


for(i in 1:J)
  log_lik_J[i] =
      normal_lpdf(GP_int[i] | muGP_int[i], sigmaGP_int);
}

The way it is specified right now runs but I cannot use loo directly on the model object. I have also tried:

generated quantities {

vector[J + N] log_lik;

for (i in 1:N){
log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
}

for(i in (N+1):(N+J)){
  log_lik[i] =
      normal_lpdf(GP_int[i] | muGP_int, sigmaGP_int);
      }}

generated quantities {

vector[J + N] log_lik;

for(i in 1:J + N){
  log_lik[i] =
      normal_lpdf(LogGSAcm2[i] | mu_LogGSAcm2[i], sigma) +
      normal_lpdf(GP_int[i] | muGP_int[i], sigmaGP_int);
      
      }

Neither of the above two will run. Any thoughts appreciated! Thank you!

mike-lawrence · December 16, 2020, 5:07pm

You had it nearly correct with:

generated quantities {
  vector[J + N] log_lik;
  for (i in 1:N){
    log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
  }
  for(i in (N+1):(N+J)){
    log_lik[i] = normal_lpdf(GP_int[i] | muGP_int, sigmaGP_int);
  }
}

except you’re indexing into GP_int incorrectly. It needs to be:

generated quantities {
  vector[J + N] log_lik;
  for (i in 1:N){
    log_lik[i ] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2, sigma);
  }
  for(i in (N+1):(N+J)){
    log_lik[i] = normal_lpdf(GP_int[i-N] | muGP_int, sigmaGP_int);
  }
}
}

bigmanj · December 16, 2020, 7:44pm

Thanks @mike-lawrence! I tried this and it still produced bad loo values, but what I needed was to build off of your suggestion and add the [i] and [N-i] in the relavant places in the for loop.

So solution:

generated quantities {

  vector[J + N] log_lik;

  for (i in 1:N){

    log_lik[i] = normal_lpdf(LogGSAcm2[i]| mu_LogGSAcm2[i], sigma);

  }

  for(i in (N+1):(N+J)){

    log_lik[i] = normal_lpdf(GP_int[i-N] | muGP_int[i-N], sigmaGP_int);

  }

}

THANK YOU!

mike-lawrence · December 16, 2020, 8:41pm

Ah, good catch! Sorry for my error.

bigmanj · December 16, 2020, 9:54pm

Thanks – no worries, and thank you!

bigmanj · December 17, 2020, 4:46pm

Sorry to bring this up again, but now I’m wondering how this would work if you had two models that had the same number of data points in each (and even data from the same groups perhaps). Would you index the two response variables differently and then apply a similar approach, for example (for brevity, just included the data & generated quantities blocks, but I can write out the full Stan program if that helps):

data {
int N; // sample size model 1
int J; // sample size model 2
vector[N] y1; // response 1 
vector[N] x1; // predictor 1 
vector[J] y2; // response 2 
vector[J] x2; // predictor 2
}

generated quantities {

vector[N + J] log_lik;

for (i in 1:N){
log_lik[i ] = normal_lpdf(y1[i]| mu_y1[i], sigma_y1);
}

for(i in (N+1):(N+J)){
  log_lik[i] =
      normal_lpdf(y2[i-N] | mu_y2[i-N], sigma_y2);
      }}

Thanks!

mike-lawrence · December 17, 2020, 5:47pm

Looks right to me!

Topic		Replies	Views
Calculating the pointwise log-likelihood in a two-staged multilevel model Modeling loo , stan	1	366	December 13, 2023
Fit statistics — troubles calculating log likelihood? Modeling specification , loo	6	2058	October 16, 2017
How to calculate log_lik in generated quantities of a multivariate regression model CmdStan cmdstan , loo	15	3760	March 9, 2022
How to calculate log_lik in generated quantities of two different process(models) running simultaneously in a longitudinal data Modeling rstan , loo	12	589	April 18, 2024
Simple Linear Regression with Categorical Predictor (species) Modeling specification , loo	3	470	December 12, 2020

Defining the log_lik for a multi-level model

Related topics