Integrated loo and multivariate autoregressive model

Alfonso · November 3, 2022, 11:19pm

Hi,

A bit of context first. Recently, I fitted an autoregressive model to simulated data. The MCMC converged (R-hats below 1.01, no divergent transitions, etc.). However, all the k pareto values where above 0.7 when using loo. The reason for this was:

With help, I used adaptive quadrature in the univariate version of the model. Specifically, I integrated over the parameter log_lambda, which is the log latent abundance for one species at each time.

Now, I want to expand it to a multivariate model. In this case, log_lambda is a vector of latent abundances at any time step.

This is a simple version of the multivariate model

data {
  int<lower=1> S;  // Number of Species
  int<lower=1> T;   // Number of years
  int Y[T,S];   // The outcome matrix
}
parameters {
  vector[S] alpha;   // Vector of intrinsic growth rates
  matrix[S,S] beta;   // matrix with competitive parameter
  vector[S] log_lambda[T];  // latent parameters
  cholesky_factor_corr[S] L_Rho;
  vector<lower=0>[S] sde;
}
transformed parameters{
  matrix[S,S] L_S;
  L_S = diag_pre_multiply(sde, L_Rho);
}
model {
// the priors 
  alpha ~ normal(0,1);
  to_vector(beta) ~ normal(0,1);
  L_Rho ~ lkj_corr_cholesky(2);
  sde ~ exponential(1);
  
//Latent true expected abundance
  log_lambda[1] ~ normal(0,5);
  
  for(t in 2:T){
    log_lambda[t]~multi_normal_cholesky(alpha + 
                                        beta*
                                        log_lambda[t-1], 
                                        L_S);
  }
      

//The likelihood
  for(s in 1:S){
      target += poisson_log_lpmf(Y[,s] | log_lambda[1:T,s]);
  }
}
generated quantities{
  matrix[S, S]Rho;
  matrix[S, S] Sigma;
  vector[T] log_lik;
  vector[S] log_lambda_pred[T];
  int<lower=0> Y_pred[T,S];

  Rho = multiply_lower_tri_self_transpose(L_Rho);
  Sigma = L_S*L_S';
  
  log_lambda_pred[1] = log_lambda[1];
    for(n in 2:T){
      log_lambda_pred[n] = multi_normal_cholesky_rng(alpha + beta*
                                            log_lambda[n-1], 
                                            L_S);
  }
  
  for(s in 1:S){
    for(n in 1:T){
      if(log_lambda_pred[n,s]>20){
        Y_pred[n,s] = poisson_log_rng(20);
      }else{
        Y_pred[n,s] = poisson_log_rng(log_lambda_pred[n,s]);
      }
      log_lik[n] = poisson_log_lpmf(Y[n] | log_lambda[n]);
    }
  }
}

However, I am not sure how to integrate over log_lambda. I guess I cannot use integrate_1d as it integrates over a 1 dimensional parameter, right? However, I was wondering if there is a function similar to integrate_1d for a multivariate case. If PSIS-LOO and integrated LOO does not work for this case, what could I use for model selection and predictive accuracy? Bayes factor?

This is what I have attempted so far with integrate_1d

functions{
  real integrand(real log_lambda, 
                 real notused, 
                 array[] real theta,
                 array[] real log_lambda_i, 
                 array[] int yi) {
        real alpha = theta[1] ;
        real beta = theta[2] ;
        real S = theta[3];
        matrix[S,S] L_S = theta[4];
        return exp(multi_normal_cholesky_lpdf(log_lambda|alpha+beta*to_row_vector(log_lambda_i), L_S) + poisson_log_lpmf(yi|log_lambda));
  }
}

....

generated quantities{
 ....
  log_lik[1] = poisson_log_lpmf(Y[1] | log_lambda[1]);
  for(n in 2:T){
    log_lik[n] = log(integrate_1d(integrand,
                          negative_infinity(),
                          positive_infinity(),
                          {alpha,beta,S,L_S}, {log_lambda[n-1]}, 
                          {y[n]},1e-8));
  }
....
}

With the corresponding error

Semantic error in 'C:/Users/alfon/AppData/Local/Temp/Rtmp8cWFG6/model-76801f4252b.stan', line 10, column 15 to column 16:
   -------------------------------------------------
     8:          real beta = theta[2] ;
     9:          real S = theta[3];
    10:          matrix[S,S] L_S = theta[4];
                        ^
    11:          return exp(multi_normal_cholesky_lpdf(log_lambda|alpha+beta*to_row_vector(log_lambda_i), L_S) + poisson_log_lpmf(yi|log_lambda));
    12:    }
   -------------------------------------------------

Matrix row size must be of type int. Instead found type real.

mingw32-make.exe: *** [make/program:50: C:\Users\alfon\AppData\Local\Temp\Rtmp8cWFG6\model-76801f4252b.hpp] Error 1

Error: An error occured during compilation! See the message above for more information.

avehtari · November 5, 2022, 5:37pm

Correct.

Unofrtunately, no. One person did use nested call to integrate_1d to integrate over two parameters. The nested call for more than two parameters is likely to be very slow (we would not need MCMC if nested adaptive quadrature would be fast for higher dimensional distributions).

If PSIS in PSIS-LOO fails due to the leave-one-out posteriors being very different than the full posterior then estimation of Bayes factor is even more difficult (e.g. commonly used bridge sampling is likely to fail, but the package doesn’t have diagnostics to indicate that).

Options

You could leave out just one observation per year, to get back to need to integrate only over one parameter
You could do brute force cross-validation by doing the refits
You could try implicitly adaptive IS with moment matching, see Avoiding model refits in leave-one-out cross-validation with moment matching • loo
You could try mixture-IS approach, see [2209.09190] Robust leave-one-out cross-validation for high-dimensional Bayesian models

Topic		Replies	Views
Model converges but high k pareto values Modeling loo	11	972	November 2, 2022
Log likelihood and loo for multinomial regression Modeling loo	10	1185	June 30, 2018
How to solve pareto k warnings for a multivariate normal model? Modeling loo	3	684	July 25, 2018
LOO for Multivariate Probit Modeling loo	5	1024	February 21, 2022
Calculating pareto-k-diagnostic for datasets that doesn't fit into RAM Modeling	2	258	January 30, 2024

Integrated loo and multivariate autoregressive model

Related topics