Model converges but high k pareto values

Alfonso · October 4, 2022, 8:44pm

Hi,

Recently, I fitted a multivariate autoregressive model to times series data. The effective sample size was big enough, R-hat values were below 1.01, no divergent transitions. However, all the k pareto values were above 1 when using loo. I decided to check a simpler version of the model with simulated data. So here is an univariate simpler version of the model, which is a Poisson-Lognormal autoregressive model.

data {
  int<lower=0> N;
  int y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] log_lambda;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  
  log_lambda[1] ~ normal(0,5);
  
  for(n in 2:N){
    log_lambda[n] ~ normal(alpha + beta * log_lambda[n-1], sigma);
  }
  
  for(n in 1:N){
    target += poisson_log_lpmf(y[n] | log_lambda[n]);
  }
  
}
generated quantities{
  vector[N] log_lik;
  vector[N] log_lambda_pred;
  int y_pred[N];
  
  log_lambda_pred[1] = log_lambda[1];
  for(t in 2:N){
    log_lambda_pred[t] = normal_rng(alpha + beta*log_lambda[t-1], sigma);
  }
  
  for(n in 1:N){
    log_lik[n] = poisson_log_lpmf(y[n] | log_lambda[n]);
      if(log_lambda_pred[n]>20){
        y_pred[n] = poisson_log_rng(20);
    }else{
        y_pred[n] = poisson_log_rng(log_lambda_pred[n]);
      }
  }
}

And the R code to simulate data, fit the model and use loo.

rm(list = ls())
library(rstan)
library(loo)

# simulate time series
Gompertz.sim <- function(init.pop, alpha, beta, time, sd){
  community <- log_lambda <- rep(NA, time)
  log_lambda[1] <- init.pop 
  for (t in 2:time) {
    log_lambda[t] <- alpha + beta*log_lambda[t-1] + rnorm(1, 0, sd)
  }
  for (i in 1:time) {
    community[i] <- rpois(1, lambda = exp(log_lambda[i]))
  }
  return(results = list(counts = community, log_lambda = log_lambda))
}

set.seed(1234)
P0<- 2
sp <- 1
t <- 100
alpha <- rnorm(1, 0.25, 0.01)
beta <- rnorm(1, 0.92, 0.001)
sd_noise <- rnorm(1, 0.25, 0.01)
Sim.data <- Gompertz.sim(init.pop = P0, alpha = alpha, beta = beta, time = t, sd = sd_noise)


dat<-list(y = Sim.data$counts, N = length(Sim.data$counts))


PL_AR_model <- stan(file = "PLNAR1_1sp.stan",
                    data = dat,
                    chains = 4, 
                    cores = 4,
                    iter = 4000,
                    control = list(adapt_delta = 0.8, max_treedepth = 10))


log_lik_PL <- extract_log_lik(PL_AR_model, merge_chains = F)
r_eff_PL <- relative_eff(exp(log_lik_PL), cores = 4)
loo_PL <- loo(log_lik_PL, r_eff = r_eff_PL, cores = 4, save_psis = T)

The model converges and captures the parameters used in the simulation. However, even though I simulated the data from the same generating process as the model in Stan, the k pareto values are high

Computed from 8000 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -270.9  8.0
p_loo        34.1  3.2
looic       541.9 16.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     52    52.0%   902       
 (0.5, 0.7]   (ok)       36    36.0%   218       
   (0.7, 1]   (bad)      11    11.0%   28        
   (1, Inf)   (very bad)  1     1.0%   9         
See help('pareto-k-diagnostic') for details.

Also, the posterior predictive check seems alright

Rplot02

From this thread, I think p_loo << p but not sure about p. I think p is the 3 parameters (alpha, beta and sigma) plus the 100 log_lambda, right? If I am correct, the model is likely to be misspecified, but the Stan model is the same to the model simulating the data (unless I made some conceptual or coding mistake along the way).

Any idea why this is happening?

avehtari · October 5, 2022, 9:32am

Hi,

You have one parameter for each observation. When one observation is removed, the posterior for the corresponding parameter changes a lot and importance sampling is not able to handle this. This can happen also when the model is correctly specified. See a similar example in Roaches cross-validation demo. In this case you can use quadrature integration to integrate over that one parameter as shown in the integrated LOO section of the same demo.

btw. the information from that thread and more has been collected to CV-FAQ and loo package glossary.

Not related to your main question, but it’s not the model that converges, but MCMC (or at least the diagnostics don’t indicate MCMC convergence problems, which is not yet a guarantee of sufficient convergence).

Alfonso · October 5, 2022, 11:52pm

Thanks for the quick reply @avehtari .

First time doing this so I am a bit confused with some of the syntax. Reading here and here about integrate_1d and from the Roaches example, I see that the parameters are defined as theta and real z is the independent variable being integrated over, right? In my case, I should integrate over log_lambda. However, as the model is autoregressive, the data values should also be log_lambda? Thus the code may look like this for the functions block, but not too sure about it

functions{
real integrand(real log_lambda, real notused, array[] real theta,
              array[] real log_lambda, array[] int yi) {
        real alpha = theta[1] ;
        real beta = theta [2] ;
        real sigma= theta [3] ;
        int ntheta = num_elements(theta);
        vector[ntheta-3]  log_lambda_i = to_vector(theta[4:ntheta]);
        return exp(normal_lpdf(log_lambda|alpha+beta*log_lambda_i) + poisson_log_lpmf(yi|log_lambda));
 }
}

and something like this for the log_lik in the generated quantities block

generated quantities{
  vector[N] log_lik;
  ...
  for(n in 1:N) {
   log_lik[n] = log(integrate_1d(integrand,
                            negative_infinity(),
                            positive_infinity(),
                            {alpha, beta,sigma}, log_lambda[n-1], {y[n]})
}

}

Does this make sense?

Thanks for the correction :)

avehtari · October 6, 2022, 5:25pm

Data are y[n]. Should be integrating over lambda[n], but I have not tested this with latent AR model, and I don’t have time right now to think this carefully so I may be missing some complication in the case of AR. Hmm, or did you mean that log_lambda[n-1] needs to be one of the arguments for the integration as it is defining the normal distribution to integrate over? That is what needs to be there.

Does it work? :)

Alfonso · October 19, 2022, 5:14pm

I tried running the stan code but I encounter this error

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model62945d1d5cf0_PLNAR1_1sp_v2' at line 3, column 29
  -------------------------------------------------
     1: functions{
     2:   real integrand(real log_lamb, 
     3:                  real notused, 
                                    ^
     4:                  array[] real theta,
  -------------------------------------------------

PARSER EXPECTED: <argument declaration or close paren ) to end argument declarations>
Error in stanc(filename, allow_undefined = TRUE) : 
  failed to parse Stan model 'PLNAR1_1sp_v2' due to the above error.

I tried running the code from the roaches example but I get the same error. I am using rstan instead of cmdstanr and I am working with version 2.21.7

Here is the Stan code so far

functions{
  real integrand(real log_lamb, 
                 real notused, 
                 array[] real theta,
                 array[] real log_lambda, 
                 array[] int yi) {
        real alpha = theta[1] ;
        real beta = theta [2] ;
        real sigma= theta [3] ;
        int ntheta = num_elements(theta);
        vector[ntheta-3]  log_lambda_i = to_vector(theta[4:ntheta]);
        return exp(normal_lpdf(log_lambda|alpha+beta*log_lambda_i) + poisson_log_lpmf(yi|log_lambda));
}
data {
  int<lower=0> N;
  int y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] log_lambda;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  
  //Latent true expected abundance
  log_lambda[1] ~ normal(0,5);
  
  for(n in 2:N){
    log_lambda[n] ~ normal(alpha + beta * log_lambda[n-1], sigma);
  }
  
  for(n in 1:N){
    target += poisson_log_lpmf(y[n] | log_lambda[n]);
  }
  
}
generated quantities{
  vector[N] log_lik;
  vector[N] log_lambda_pred;
  int y_pred[N];
  
  log_lambda_pred[1] = log_lambda[1];
  for(t in 2:N){
    log_lambda_pred[t] = normal_rng(alpha + beta*log_lambda[t-1], sigma);
  }
  
  for(n in 1:N){
    //log_lik[n] = poisson_log_lpmf(y[n] | log_lambda[n]);
      log_lik[n] = log(integrate_1d(integrand,
                            negative_infinity(),
                            positive_infinity(),
                            {alpha, beta,sigma}, log_lambda[n-1], {y[n]})
      if(log_lambda_pred[n]>20){
        y_pred[n] = poisson_log_rng(20);
    }else{
        y_pred[n] = poisson_log_rng(log_lambda_pred[n]);
      }
  }
}

Alfonso · October 19, 2022, 10:48pm

The model seems to run when I use cmdstanr but I got the next warning when sampling from the posterior

Chain 2 Exception: integrate: error estimate of integral 3.58554e-13 exceeds the given relative tolerance times norm of integral

And I get NAs when extracting log_lik

> loo(fitpri$draws("log_lik"), r_eff=relative_eff(fitpri$draws("log_lik")))

Error in validate_ll(log_ratios) : NAs not allowed in input.

Here is the R code

library(cmdstanr)
library(rstanarm)
library(loo)

data("roaches")
roaches$sqrt_roach1 <- sqrt(roaches$roach1)

modpri <- cmdstan_model(stan_file = "int_loo.stan")
datap <- list(N=dim(roaches)[1], P=3,
              offsett=log(roaches$exposure2),
              X=roaches[,c('sqrt_roach1','treatment','senior')],
              y=roaches$y)

fitpri <- modpri$sample(data = datap, refresh = 0, chains = 4, parallel_chains = 4)

mcmc_areas(as_draws_matrix(fitpri$draws(variables=c('alpha','beta','sigmaz'))), prob_outer = .999)
loocmd <- function(fit) {
  loo(fit$draws("log_lik"), r_eff=relative_eff(fit$draws("log_lik")))
}

(loopri <- loocmd(fitpri))

and the stan code (as in the roaches example)

// Poisson regression with hierarchical intercept ("random effect")
functions {
  real integrand(real z, real notused, array[] real theta,
               array[] real Xi, array[] int yi) {
    real sigmaz = theta[1];
    real offsetti_plus_alpha = theta[2];
    int ntheta = num_elements(theta);
    vector[ntheta-2] beta = to_vector(theta[3:ntheta]);
    return exp(normal_lpdf(z|0,sigmaz)+poisson_log_glm_lpmf(yi | to_row_vector(Xi), z+offsetti_plus_alpha, beta));
  }
}
data {
  int<lower=0> N;           // number of data points
  int<lower=0> P;           // number of covariates
  matrix[N,P] X;            // covariates
  array[N] int<lower=0> y;  // target
  vector[N] offsett;        // offset (offset variable name is reserved)
}
parameters {
  real alpha;               // intercept
  vector[P] beta;           // slope
  vector[N] z;              // individual intercept ("random effect")
  real<lower=0> sigmaz;     // prior scale for z
}
model {
  // priors
  alpha ~ normal(0, 3);  
  beta ~ normal(0, 3);    
  z ~ normal(0, sigmaz);  
  sigmaz ~ normal(0, 1);
  // observation model
  y ~ poisson_log_glm(X, z+offsett+alpha, beta); 
}
generated quantities {
  // log_lik for PSIS-LOO
  vector[N] log_lik;
  for (i in 1:N) {
    // z as posterior draws, this would be challenging for PSIS-LOO (and WAIC)
    // log_lik[i] = poisson_log_glm_lpmf({y[i]} | X[i,], z[i]+offsett[i]+alpha, beta);
    // we can integrate each z[i] iut with 1D adaptive quadrature 
    log_lik[i] = log(integrate_1d(integrand,
                              negative_infinity(),
                              positive_infinity(),
                              append_array({sigmaz}, append_array({offsett[i]+alpha}, to_array_1d(beta))),
                  to_array_1d(X[i,]),
                  {y[i]}));
  }
}

avehtari · October 25, 2022, 1:21pm

Great

You could try replacing the negative_infinity() and positive_infinity() with c*sigmaz and -c*sigmaz, with c something like 8 (bigger c gives more accurate integral, but increases the probability of under/overflows)

Alfonso · October 31, 2022, 2:49pm

The roaches model works when I specify a re_tol and loo seems fine

Computed from 8000 by 262 log-likelihood matrix

         Estimate   SE
elpd_loo   -878.3 38.2
p_loo         4.8  0.5
looic      1756.7 76.4
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

But I get the next warning messages

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'C:/Users/alfon/AppData/Local/Temp/Rtmpo51Yla/model-9350324c1206.stan', line 29, column 2 to column 24)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 4

I have now changed the integral part in my model so the integrad part is

functions{
  real integrand(real log_lambda, 
                 real notused, 
                 array[] real theta,
                 array[] real log_lambda, 
                 array[] int yi) {
        real alpha = theta[1] ;
        real beta = theta [2] ;
        real sigma= theta [3] ;
        real log_lambda_i= theta [4] ;
        return exp(normal_lpdf(log_lambda|alpha+beta*log_lambda_i, sigma) + poisson_log_lpmf(yi|log_lambda));
  }
}
...

To account for the AR structure, I included the parameter log_lambda_i, which corresponds to log_lambda[n-1], and I integrate over log_lambda, which is log_lambda[n]. These are called in integrate_1d in the generated quantities block

generated quantities{
  ...
  for(n in 1:N){
    //log_lik[n] = poisson_log_lpmf(y[n] | log_lambda[n]);
      log_lik[n] = log(integrate_1d(integrand,
                            negative_infinity(),
                            positive_infinity(),
                            {alpha, beta,sigma, log_lambda[n-1]}, log_lambda[n], 
                            {y[n]},0.00001));
...
}

However, I get the next error when I try to compile the model

Semantic error in 'C:/Users/alfon/AppData/Local/Temp/RtmpSkbmZd/model-cacc43283108.stan', line 2, column 2 to line 12, column 3:
   -------------------------------------------------
     1:  functions{
     2:    real integrand(real log_lambda, 
           ^
     3:                   real notused, 
     4:                   array[] real theta,
   -------------------------------------------------

All function arguments must have distinct identifiers.

mingw32-make.exe: *** [make/program:50: C:\Users\alfon\AppData\Local\Temp\RtmpSkbmZd\model-cacc43283108.hpp] Error 1

Error: An error occured during compilation! See the message above for more information.

The whole stan model and the corresponding R code
PLNAR1_1sp_v2.stan (1.7 KB)
sim_1sp.R (990 Bytes)

avehtari · October 31, 2022, 5:24pm

In your code, you have two arguments named log_lambda. Change one of the to have a different name

Alfonso · October 31, 2022, 8:20pm

It worked!! The model compiled and the loo results are better

Computed from 8000 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -290.7 10.2
p_loo        17.3  2.5
looic       581.4 20.3
------
Monte Carlo SE of elpd_loo is 0.1.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     96    96.0%   1091      
 (0.5, 0.7]   (ok)        4     4.0%   272       
   (0.7, 1]   (bad)       0     0.0%   <NA>      
   (1, Inf)   (very bad)  0     0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

However, I get this warning

Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'C:/Users/alfon/AppData/Local/Temp/RtmpcTRCeA/model-5e3474663ead.stan', line 33, column 4 to column 66)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1

Also, I was wondering what the relative tolerance level should be?

And, why does this approach fail when using rstan instead of cmdstanr?

PL_AR_model <- stan(file = "PLNAR1_1sp_v2.stan",
                    data = dat,
                    chains = 4, 
                    cores = 4,
                    iter = 4000,
                    control = list(adapt_delta = 0.8, max_treedepth = 10))

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model5e3464f95c57_PLNAR1_1sp_v2' at line 3, column 29
  -------------------------------------------------
     1: functions{
     2:   real integrand(real log_lambda, 
     3:                  real notused, 
                                    ^
     4:                  array[] real theta,
  -------------------------------------------------

PARSER EXPECTED: <argument declaration or close paren ) to end argument declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'PLNAR1_1sp_v2' due to the above error.

avehtari · November 1, 2022, 7:46am

Great!

If you get it only in the beginning no need to worry. See Runtime warnings and convergence problems

You can only find out by testing different tolerances. The good thing is that you don’t need to rerun sampling, just rerun the generated quantities, see Run Stan's standalone generated quantities method — model-method-generate-quantities • cmdstanr

If you are using rstan from CRAN it has older stanc compiler. If you prefer rstan over cmdstanr, you can install a more recent rstan from Repository for distributing (some) stan-dev R packages | r-packages

Alfonso · November 2, 2022, 8:15pm

@avehtari thanks a lot for your help.

In case anyone is interested, here is the Stan code with the corrections

functions{
  real integrand(real log_lambda, 
                 real notused, 
                 array[] real theta,
                 array[] real log_lambda_i, 
                 array[] int yi) {
        real alpha = theta[1] ;
        real beta = theta [2] ;
        real sigma= theta [3] ;
        return exp(normal_lpdf(log_lambda|alpha+beta*to_row_vector(log_lambda_i), sigma) + poisson_log_lpmf(yi|log_lambda));
  }
}
data {
  int<lower=0> N;
  array[N] int y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] log_lambda;
}
model {
  alpha ~ normal(0, 0.5);
  beta ~ normal(0, 0.5);
  sigma ~ exponential(5);
  
  //Latent true expected abundance
  log_lambda[1] ~ normal(0,5);
  
  for(n in 2:N){
    log_lambda[n] ~ normal(alpha + beta * log_lambda[n-1], sigma);
  }
  
  for(n in 1:N){
    target += poisson_log_lpmf(y[n] | log_lambda[n]);
  }
  
}
generated quantities{
  vector[N] log_lik;
  vector[N] log_lambda_pred;
  array[N] int y_pred;
  
  log_lambda_pred[1] = log_lambda[1];
  for(t in 2:N){
    log_lambda_pred[t] = normal_rng(alpha + beta*log_lambda[t-1], sigma);
  }
  
  log_lik[1] = poisson_log_lpmf(y[1]|log_lambda[1]);
  y_pred[1] =  poisson_log_rng(log_lambda_pred[1]);
  for(n in 2:N){
    //log_lik[n] = poisson_log_lpmf(y[n] | log_lambda[n]);
      log_lik[n] = log(integrate_1d(integrand,
                            negative_infinity(),
                            positive_infinity(),
                            {alpha,beta,sigma}, {log_lambda[n-1]}, 
                            {y[n]},1e-8));
      if(log_lambda_pred[n]>20){
        y_pred[n] = poisson_log_rng(20);
    }else{
        y_pred[n] = poisson_log_rng(log_lambda_pred[n]);
      }
  }
}

Topic		Replies	Views
High Pareto-k values for an IRT model with simulated datasets Modeling loo	5	63	April 2, 2025
How can I solve the problem of high Pareto-k values? Modeling	3	1094	June 26, 2023
Pareto k warns of misspecified model on model recovery when computing PSIS-LOO-CV Modeling loo	18	897	August 10, 2022
Help with Poisson model Modeling loo	45	7609	March 3, 2018
Recommendations for what to do when k exceeds 0.5 in the loo package? Modeling loo	21	7510	March 8, 2018

Model converges but high k pareto values

Related topics