Information criteria for multivariate models

Hi!

I am actually exploring multivariate latent variable models, and I am wondering how to compute information criteria (loo, waic) for such models. I have an hard time to find references dealing with such questions, except this one, which presents criteria I never heard of.

Specifically, I can’t figure out how to write the likelihood computation in the generated quantity block, and the type and dimensions of output (array of matrices?).

I know that brms can compute such criteria, but the likelihood computation is made after model fitting, so I can’t use it to produce an exemple of correct generated quantity block.

Any help?

Thank you!
Lucas

Hi Lucas,

There exist many multivariate latent variable models, could you be more specific? Also if you would show Stan code for your model, it would be easier to say how to write the likelihood computation in the generated quantity block.

Hi!

Here is the stan code for a model including covariates. We already discussed these kinds of model (close to bayesian factor analysis) here.

data{
  int N; // sample size
  int P; // number of species
  int K; //Number of predictors
  int D; // number of dimensions
  matrix[N,K] X; //Predictor matrix
  int<lower=0> Y[N,P]; // data matrix of order [N,P]
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(P-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[N] alpha; //Row intercepts
  row_vector[P] b0; // Intercept per species
  matrix[K,P] b; //Coefficient per species
  vector[M] L_lower; //Lower diagonal loadings
  vector<lower=0>[D] L_diag; //Positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
  //Hyperparameters
  real<lower=0> sigma_a; //Variance of the row intercepts
  real<lower=0> sigma_b0; //Variance of the species intercepts
  vector<lower=0>[K] sigma_b; //Variance of the species slopes
  //real mu_lower; //Mean of lower-diag loadings
  //real<lower=0> sigma_lower; //Sd of lower-diag loadings
  real<lower=0> eta; //Parameter of LKJ prior for Rho
}
transformed parameters{
  cholesky_factor_cov[P,D] lambda; //Final matrix of laodings
  matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
  matrix[N,P] temp; //intermediate predictor
  matrix<lower=0>[N,P] mu; // predicted values
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho);
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ lambda[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) lambda[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):P) {
  	     idx2 = idx2+1;
  	     lambda[i,j] = L_lower[idx2];
  	   }
  	 }
  }
  
  // Predictor
  temp = FS * lambda' + X * b;
  for(n in 1:N) mu[n] = exp(alpha[n] + b0 + temp[n]);
  
}
model{
  // Hyperpriors
  sigma_a ~ gamma(2,0.1); //Row intercepts hyperprior
  sigma_b0 ~ gamma(2,0.1); //Species intercept effects hyperprior
  for(k in 1:K) sigma_b[k] ~ gamma(2,0.1); //Species slopes hyperprior
  //mu_lower ~ normal(0,0.1);//Mean of lower loadings
  //sigma_lower ~ gamma(2,0.1); //Variance of lower loadings
  eta ~ gamma(2,0.1); //Parameter of the cholesky prior for FS correlation structure
  
  // Priors
  alpha ~ normal(0, sigma_a); //Regularizing prior for row intercepts
  b0 ~ normal(0, sigma_b0); //Regularizing prior for species intercepts
  for(k in 1:K) b[k,] ~ normal(0, sigma_b[k]); //Regularizing prior for species slopes
  //L_diag ~ normal(0,1); //Prior for diagonal elements
  for(d in 1:D) L_diag[d] ~ chi_square(N-d);
  L_lower ~ normal(0,1); //Prior for lower loadings
  Rho ~ lkj_corr_cholesky(eta); //Regularizing prior for Rho
  
  for(i in 1:N){	
    Y[i,] ~ poisson(mu[i]);
    FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  matrix[P,P] cov_lambda;
  
  cov_lambda = multiply_lower_tri_self_transpose(lambda);
}

Thanks for your help!

Each element in Y[i,] is iid from poisson(mu[i]), so loop over i and j and compute

log_lik[i,j] = poisson_lpmf(Y[i,j] | mu[i]);

after the loop you may also transform that to a vector with to_vector if you want to have it in the shape loo package likes.

1 Like

As simple as that! Thank you very much!

I wrote this generated quantity block :

generated quantities{
  matrix[P,P] cov_lambda;
  matrix[N,P] log_lik1;
  vector[N*P] log_lik;
  
  cov_lambda = multiply_lower_tri_self_transpose(lambda); //Create the covariance matrix
  
  //Compute the likelihood for each observation
  for(n in 1:N){
    for(p in 1:P){
      log_lik1[n,p] = poisson_lpmf(Y[n,p] | mu[n,p]);
    }
  }
  log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}

but when I tried to compute loo :

covX_log_lik ← extract_log_lik(covX_stan, merge_chains = F)
covX_r_eff ← relative_eff(exp(covX_log_lik))
(covX_loo ← loo(covX_log_lik, r_eff = covX_r_eff))

I get this output :
Computed from 3000 by 336 log-likelihood matrix

       Estimate   SE
elpd_loo   -779.0 47.2
p_loo       284.8 27.0
looic      1558.0 94.3
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     124   36.9%   1         
 (0.5, 0.7]   (ok)        75   22.3%   0         
   (0.7, 1]   (bad)       90   26.8%   0         
   (1, Inf)   (very bad)  47   14.0%   0         
See help('pareto-k-diagnostic') for details.
There were 29 warnings (use warnings() to see them)
warnings()
Messages d'avis :
1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

2: In log(z) : production de NaN
3: In log(z) : production de NaN
4: In log(z) : production de NaN
...

When I did what you suggested, that is

generated quantities{
  matrix[P,P] cov_lambda;
  matrix[N,P] log_lik1;
  vector[N*P] log_lik;
  
  cov_lambda = multiply_lower_tri_self_transpose(lambda); //Create the covariance matrix
  
  //Compute the likelihood for each observation
  for(n in 1:N){
    for(p in 1:P){
      log_lik1[n,p] = poisson_lpmf(Y[n,p] | mu[n]);
    }
  }
  log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}

The loo computation was impossible, and returned NA only.

Sorry, I didn’t notice that mu is a matrix since you had

Y[i,] ~ poisson(mu[i]);

instead of

Y[i,] ~ poisson(mu[i,]);

If Y has 336 observations, and mu has 336 unknowns then this is possible, and implies that your prior is weak (too flexible) or data is overdispersed compared to Poisson.

Oh, that is my mistake! Wow, I did not see it…