Information criteria for multivariate response model


#1

I have a multilevel model with multivariate gaussian responses. I started off by using brms to make stancode and then I tweaked the code to accommodate an unconventional covariance structure in the residual correlations.

What I need is to calculate information criteria (probably looic) from the estimated model, but whereas brms seems to do this post-estimation, my inclination is to estimate the log_lik with a for loop in the “generated quantities” block. This thread provides a little insight, but I haven’t solved this yet.

Can someone point me in the right direction?

model { 
  vector[N] mu_Y1 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y2 = rep_vector(0, N) + response_2_Intercept; 
  vector[N] mu_Y3 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y4 = rep_vector(0, N) + response_2_Intercept; 
  
  // 
  vector[nresp] Mu[N]; 
  vector[nresp] sigma = [sigma_response_1, sigma_response_2, sigma_response_1, sigma_response_2]';
  
  // 
  matrix[nresp, nresp] LSigma = diag_pre_multiply(sigma, Lrescor); 
  for (n in 1:N) { 
    mu_Y1[n] += v_ij_ID [i_ID[n] , 1 ] + v_ij_ID [j_ID[n] , 3 ];
    mu_Y2[n] += v_ij_ID [i_ID[n] , 2 ] + v_ij_ID [j_ID[n] , 4 ];
    mu_Y3[n] += v_ij_ID [i_ID[n] , 3 ] + v_ij_ID [j_ID[n] , 1 ];
    mu_Y4[n] += v_ij_ID [i_ID[n] , 4 ] + v_ij_ID [j_ID[n] , 2 ];
    Mu[n] = [mu_Y1[n], mu_Y2[n], mu_Y3[n], mu_Y4[n]]';
  } 

  
  // priors including all constants 
  target += student_t_lpdf(response_1_Intercept | 3, 5, 10); 
  target += student_t_lpdf(sigma_response_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(response_2_Intercept | 3, 6, 10); 
  target += student_t_lpdf(sigma_response_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 

  target += lkj_corr_cholesky_lpdf(Lrescor | 1); 
  target += student_t_lpdf (sigma_ij_ID | 3, 0, 10)
    - 4 * student_t_lccdf (0 | 3, 0, 10); 
  target += normal_lpdf (to_vector(z_ij_ID) | 0, 1);
  target += lkj_corr_cholesky_lpdf (L_Rho_ij_ID | 1); 

  // likelihood including all constants 
  { 
    target += multi_normal_cholesky_lpdf(Y | Mu, LSigma); 
  } 
} 


#2

Basically, you just evaluate the loglikelihood of the parameters at each observation. But in your case, it seems as if the observations are not conditionally independent. @avehtari has been working on ELPD for GPs and other things that have a multivariate structure. You may just be able to do \mathbf{L}^{-1}\left(\mathbf{y} - \boldsymbol{\mu}\right).


#3

Thanks, Ben. It might be worth noting the data are dyadic, and so although there are four equations, in a way there are only two responses (hence the shared intercept parameters across the equations). The model is organized in this way because if we estimate the ij relations separately from the ji relations, then we can examine the residual correlations.

I hadn’t contemplated the implications in terms conditional non-independence. I’ll be curious to hear what Aki and others have to say.


#4

Could you post the whole code? Now the dimensions of data are not clear.

There is a vignette Leave-one-out cross-validation for non-factorizable models, which may help.


#5

Sure thing. Many thanks, Aki.

I’ll start with the simulation code:

## Simulating data for multiple dyadic responses
library(mvtnorm)
library(MASS)
library(rstan)
library(rethinking) ## Useful in simulation code for coerce_index function to generate integers as identifiers for units.
## rethinking package available here: https://xcelab.net/rm/software/

nodes <- data.frame(row.names = 1:300)
nodes$i_ID <- seq(1:300)
nodes$j_ID <- nodes$i_ID
nodes$k_ID <- rep(1:15, each = 20)
d.full <- expand.grid(nodes$i_ID, nodes$j_ID)
colnames(d.full) <-  c("i_ID", "j_ID")
d.nodes <- subset (d.full, i_ID != j_ID)
d.nodes$k_ID <- nodes$k_ID [ match (d.nodes$i_ID, nodes$i_ID)]
d.nodes$l_ID <- nodes$k_ID [ match (d.nodes$j_ID, nodes$j_ID)]
d <- subset (d.nodes, k_ID == l_ID)
d <- subset (d, select = -(l_ID))
d$first.node <- ifelse( d$i_ID < d$j_ID, paste (d$i_ID), paste(d$j_ID))
d$second.node <- ifelse( d$i_ID > d$j_ID, paste (d$i_ID), paste(d$j_ID))
d$ij_ID <- paste(d$first.node, "_", d$second.node, sep = "")
	#d$ij_dyad_ID <- coerce_index (d$ij_ID)
d <- subset (d, select = -c(first.node, second.node))
d$ij_dyad_ID <- match (d$ij_ID, unique(d$ij_ID))
d <- subset (d, select = -(ij_ID))
d <- subset (d, i_ID < j_ID)
d <- d[order(d$i_ID),]
row.names(d) <- d$ij_dyad_ID

## Prepping node-level random effects
N_ij_ID <- max(d$i_ID, d$j_ID) 

s_ij_i1 <- 1 ## Variance of i_ID, response 1
s_ij_i2 <- 1 ## Variance of i_ID, response 2
s_ij_j1 <- 1 ## Variance of j_ID, response 1
s_ij_j2 <- 1 ## Variance of j_ID, response 2
r_i1_i2 <- 0.5 * s_ij_i1 * s_ij_i2
r_i1_j1 <- -0.5 * s_ij_i1 * s_ij_j1
r_i1_j2 <- 0.5 * s_ij_i1 * s_ij_j2
r_i2_j1 <- -0.5 * s_ij_i2 * s_ij_j1
r_i2_j2 <- 0.5 * s_ij_i2 * s_ij_j2
r_j1_j2 <- -0.5 * s_ij_j1 * s_ij_j2

SIGMA_ij_ID <- matrix(NA, nrow = 4, ncol = 4)
SIGMA_ij_ID[1,] <- c(s_ij_i1, r_i1_i2, r_i1_j1, r_i1_j2)
SIGMA_ij_ID[2,] <- c(r_i1_i2, s_ij_i2, r_i2_j1, r_i2_j2)
SIGMA_ij_ID[3,] <- c(r_i1_j1, r_i2_j1, s_ij_j1, r_j1_j2)
SIGMA_ij_ID[4,] <- c(r_i1_j2, r_i2_j2, r_j1_j2, s_ij_j2)
v_ij_ID <- rmvnorm ( N_ij_ID, mean = rep (0,4), sigma = SIGMA_ij_ID)

## Prepping dyad-level random effects -- that is, the residuals
N <- nrow(d) 

SIGMA <- matrix (NA, nrow = 4, ncol = 4)

    # covariance matrix is defined by:
    # [ s1 c12 r1  r12 ]
    # [    s2  r12 r2  ]
    # [        s1  c12 ]
    # [            s2  ]

s1 <- 1
s2 <- 1
c12 <- s1 * s2 * -0.5
r1 <- s1 * s1 * 0.5
r2 <- s2 * s2 * 0.5
r12 <- s1 * s2 * -0.5

SIGMA[1,] <- c(s1, c12, r1, r12)
SIGMA[2,] <- c(c12, s2, r12, r2)
SIGMA[3,] <- c(r1, r12, s1, c12)
SIGMA[4,] <- c(r12, r2, c12, s2)

v_e <- rmvnorm(N, mean = c(0,0,0,0), sigma = SIGMA) 

## Set intercept
a1 <- 5
a2 <- 6

## Make i to j response, outcome 1
Y1 <- rep (NA, N)
for ( i in 1:N ) {
Y1[i] <- with ( d ,
	{ a1 +
		v_ij_ID [i_ID[i] , 1 ] +
		v_ij_ID [j_ID[i] , 3 ] +
		v_e [ ij_dyad_ID[i] , 1 ]
	}
	)
}

## Make i to j response, outcome 2
Y2 <- rep (NA, N) 
for ( i in 1:N ) {
Y2[i] <- with ( d ,
	{ a2 +
		v_ij_ID [i_ID[i] , 2 ] +
		v_ij_ID [j_ID[i] , 4 ] +
		v_e [ ij_dyad_ID[i] , 2 ]
	}
	)
}

## Make j to i response, outcome 1
Y3 <- rep (NA, N)
for ( i in 1:N ) {
Y3[i] <- with ( d ,
	{ a1 +
		v_ij_ID [i_ID[i] , 3 ] +
		v_ij_ID [j_ID[i] , 1 ] +
		v_e [ ij_dyad_ID[i] , 3 ]
	}
	)
}

## Make j to i response, outcome 2
Y4 <- rep (NA, N)
for ( i in 1:N ) {
Y4[i] <- with ( d ,
	{ a2 +
		v_ij_ID [i_ID[i] , 4 ] +
		v_ij_ID [j_ID[i] , 2 ] +
		v_e [ ij_dyad_ID[i] , 4 ]
	}
	)
}

d$Y1 <- Y1
d$Y2 <- Y2
d$Y3 <- Y3
d$Y4 <- Y4

dat_list <- list (
 N = N,
 N_ij_ID = N_ij_ID,
 Y1 = d$Y1,
 Y2 = d$Y2,
 Y3 = d$Y3,
 Y4 = d$Y4,
 nresp = 4,
 nrescor = 6,
 i_ID = d$i_ID,
 j_ID = d$j_ID,
 ij_dyad_ID = d$ij_dyad_ID
)

And then the model code is:

model_code <- "
data { 
  int<lower=1> N;  // total number of observations 
  real Y1[N];  // response variable 
  real Y2[N];  // response variable 
  real Y3[N];  // response variable 
  real Y4[N];  // response variable 
  int<lower=1> nresp;  // number of responses
  int nrescor;  // number of residual correlations
  
  // data for informant-level effects
  int<lower=1> i_ID[N]; //# Random effect ID's
  int<lower=1> j_ID[N]; //# Random effect ID's
  int<lower=1> N_ij_ID; //#Number of ID's, should be 285 

} 

transformed data { 
  vector[nresp] Y[N];  // response matrix
  for (n in 1:N) {
    Y[n] = [Y1[n], Y2[n], Y3[n], Y4[n]]';
  }
} 
parameters { 
  real response_1_Intercept;  
  real response_2_Intercept;   
  
  real<lower=0> sigma_response_1;  // residual SD for response 1
  real<lower=0> sigma_response_2;  // residual SD for response 2
  
  // Participant level covariance matrix
  vector<lower=0>[4] sigma_ij_ID;  // participant-level standard deviations
  matrix[4, N_ij_ID] z_ij_ID;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[4] L_Rho_ij_ID;
  
  
  ////////////////
  
  // Outcome level matrix of residuals
  real<lower=-1,upper=1> rho_c12;
  real<lower=-1,upper=1> rho_r1;
  real<lower=-1,upper=1> rho_r2;
  real<lower=-1,upper=1> rho_r12;

} 


transformed parameters { 
  // participant-level effects 
  matrix[N_ij_ID, 4] v_ij_ID;
  
  // residual-level effects and covariances
  	matrix [4,4] RHO;
	vector [4] scales;
	matrix [4,4] Lrescor;
  
  
  // Transform participant level random effects
  v_ij_ID = (diag_pre_multiply (sigma_ij_ID, L_Rho_ij_ID) * z_ij_ID)';


 // Transform residuals from residual matrix
    RHO[1,1] = 1;
    RHO[1,2] = rho_c12;
    RHO[1,3] = rho_r1;
    RHO[1,4] = rho_r12;
    RHO[2,2] = 1;
    RHO[2,3] = rho_r12;
    RHO[2,4] = rho_r2;
    RHO[3,3] = RHO[1,1];
    RHO[3,4] = RHO[1,2];
    RHO[4,4] = RHO[2,2];
    for ( i in 2:4 )
        for ( j in 1:(i-1) ) {
            RHO[i,j] = RHO[j,i];
        }

    scales[1] = sigma_response_1*sigma_response_1;
    scales[2] = sigma_response_2*sigma_response_2;
    scales[3] = scales[1];
    scales[4] = scales[2];
    Lrescor = cholesky_decompose(RHO);

} 

model { 
  vector[N] mu_Y1 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y2 = rep_vector(0, N) + response_2_Intercept; 
  vector[N] mu_Y3 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y4 = rep_vector(0, N) + response_2_Intercept; 
  
  // multivariate linear predictor matrix 
  vector[nresp] Mu[N]; 
  vector[nresp] sigma = [sigma_response_1, sigma_response_2, sigma_response_1, sigma_response_2]';
  
  // cholesky factor of residual covariance matrix 
  matrix[nresp, nresp] LSigma = diag_pre_multiply(sigma, Lrescor); 
  for (n in 1:N) { 
    mu_Y1[n] += v_ij_ID [i_ID[n] , 1 ] + v_ij_ID [j_ID[n] , 3 ];
    mu_Y2[n] += v_ij_ID [i_ID[n] , 2 ] + v_ij_ID [j_ID[n] , 4 ];
    mu_Y3[n] += v_ij_ID [i_ID[n] , 3 ] + v_ij_ID [j_ID[n] , 1 ];
    mu_Y4[n] += v_ij_ID [i_ID[n] , 4 ] + v_ij_ID [j_ID[n] , 2 ];
    Mu[n] = [mu_Y1[n], mu_Y2[n], mu_Y3[n], mu_Y4[n]]';
  } 
  
  // priors including all constants 
  target += student_t_lpdf(response_1_Intercept | 3, 5, 10); 
  target += student_t_lpdf(sigma_response_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(response_2_Intercept | 3, 6, 10); 
  target += student_t_lpdf(sigma_response_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 

  target += lkj_corr_cholesky_lpdf(Lrescor | 1); 
  target += student_t_lpdf (sigma_ij_ID | 3, 0, 10)
    - 4 * student_t_lccdf (0 | 3, 0, 10); 
  target += normal_lpdf (to_vector(z_ij_ID) | 0, 1);
  target += lkj_corr_cholesky_lpdf (L_Rho_ij_ID | 1); 

  // likelihood including all constants 
  { 
    target += multi_normal_cholesky_lpdf(Y | Mu, LSigma); 
  } 
} 
"

#6

If I read it correctly, you have N observations and each observation has 4 response variables. Then you could do leave-one-out with N things, with generated quantities code something like (I didn’t try to run this)

for (n in 1:N)
  log_lik[n] = multi_normal_cholesky_lpdf(Y[n] | Mu[n], LSigma);

#7

That seems promising, Aki. However, when I try the following in generated quantities:

generated quantities { 

  vector[N] log_lik;
  
  for (n in 1:N)
  log_lik[n] = multi_normal_cholesky_lpdf(Y[n] | Mu[n], LSigma);
  
} 

It returns an error message that Mu does not exist.

Do I need to (re)define Mu somewhere? Since Mu is a vector, do I need to expand the loop?

Thanks.


#8

Yes, you need to define Mu_n again inside that loop.


#9

Thanks, Ben. I’m trying different syntactical variations to define Mu in the loop (trying versions in which I also define mu_YI, mu_Y2, etc.). So far there have been different error messages as a result.

Is there an example that I could follow?

Thanks.


#10

I am trying to extend on what Ben wrote: Variables that were defined in the model block are not available in the generates quantities block. Therefore you have to reapeat all operations you performed in there model block to ontain Mu in the generated quantities block.


#11

Cool. Thanks, Guido.

After a couple of tweaks, the following code evidently worked for generating log_lik. Does that seem about right to you?

generated quantities { 

  vector[N] log_lik;
  
  for (n in 1:N) {
  
  vector[N] mu_Y1 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y2 = rep_vector(0, N) + response_2_Intercept; 
  vector[N] mu_Y3 = rep_vector(0, N) + response_1_Intercept; 
  vector[N] mu_Y4 = rep_vector(0, N) + response_2_Intercept; 
  
  // multivariate linear predictor matrix 
  vector[nresp] Mu[N]; 
  vector[nresp] sigma = [sigma_response_1, sigma_response_2, sigma_response_1, sigma_response_2]';
  
  // cholesky factor of residual covariance matrix 
  matrix[nresp, nresp] LSigma = diag_pre_multiply(sigma, Lrescor); 
  for (i in 1:N) { 
    mu_Y1[i] += v_ij_ID [i_ID[i] , 1 ] + v_ij_ID [j_ID[i] , 3 ];
    mu_Y2[i] += v_ij_ID [i_ID[i] , 2 ] + v_ij_ID [j_ID[i] , 4 ];
    mu_Y3[i] += v_ij_ID [i_ID[i] , 3 ] + v_ij_ID [j_ID[i] , 1 ];
    mu_Y4[i] += v_ij_ID [i_ID[i] , 4 ] + v_ij_ID [j_ID[i] , 2 ];
    Mu[i] = [mu_Y1[i], mu_Y2[i], mu_Y3[i], mu_Y4[i]]';
  } 
  
  log_lik[n] = multi_normal_cholesky_lpdf(Y[n] | Mu[n], LSigma);
  
  }
} 

#12

Yes. Did you get sensible results?


#13

Yes. Thanks, Aki. I haven’t done a full range of simulations with covariates, but so far it seems sensible.