High pareto k diagnostic, overdispersed models and moment matching

bcd · January 20, 2023, 4:09pm

Hi there,

I am trying to fit a biphasic model of decay to some antibody titres using Rstan. The final model will account for measurement errors as discussed here 6.1 Bayesian measurement error model | Stan User’s Guide.

When I ran this model, all the diagnostics seemed okay (Rhat, divergent transitions, ESS etc,), however, all the pareto k values were >0.7 when using loo. I tried using an overdispersed model (t student distribution, rather than normal), however this increased the number of pareto k that were above 1.
I’ve shared a simpler model below, which just fits a biphasic model to a subset of my data, ignoring any measurement error.

data{
   int N; 
   int T ; 
   real time[T];
   real mu[N];                 // observed average titres
}


parameters {
  real<upper = 0> pi_1;      // decay rate 1
  real pi_2;                         // decay rate 2 - can be positive to allow for titre boosting 
  real<lower = 0> ts   ;       // time switch decay rate
  real<lower = 0> sigma;
}

transformed parameters {
  real neut[T] ;
  real n_N[N] ;
  
  // biphasic model 
   for(t in 1:T)
    neut[t] = mu[1] * (exp(pi_1 * time[t] + pi_2 * ts) + exp(pi_2 * time[t] + pi_1 * ts)) / (exp(pi_1 * ts) + exp(pi_2 * ts)) ;

// match titre time points to data 

   n_N[1] =    neut[1] ;
   n_N[2] =    neut[6] ;
   n_N[3] =    neut[12] ;
   n_N[4] =    neut[24] ;
   n_N[5] =    neut[36] ;
   n_N[6] =    neut[48] ;
}


model {

//   target += normal_lpdf(mu| n_N, sigma);
     target += student_t_lpdf(mu|(N-1), n_N, sigma);
     
// priors
  pi_1 ~ normal(0,0.1) ;
  pi_2 ~ normal(0,0.1) ;
  ts ~ normal(1,2) ;
  sigma ~ cauchy(0,5);
}

generated quantities{
 vector[N] log_lik ;
 // for(i in 1:N)  log_lik[i]     = normal_lpdf(mu[i]| n_N[i], sigma) ;
 for(i in 1:N)  log_lik[i] = student_t_lpdf(mu|(N-1), n_N, sigma) ;
}

The data used is:

stan_data = list(
  mu = c(2.264818, 1.944483, 1.886491, 1.863323, 1.913896, 1.869833), 
  SE = c(0.01921157, 0.02387729, 0.02568116, 0.03187213, 0.03567769, 0.03210992), 
  T = 48, 
  N = 6,
  time = 1:48)

If I use a normal likelihood, the loo output is:

Computed from 16000 by 6 log-likelihood matrix

         Estimate  SE
elpd_loo      2.7 0.9
p_loo         2.5 0.7
looic        -5.4 1.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     5     83.3%   1638      
 (0.5, 0.7]   (ok)       0      0.0%   <NA>      
   (0.7, 1]   (bad)      1     16.7%   60        
   (1, Inf)   (very bad) 0      0.0%   <NA>      
See help('pareto-k-diagnostic') for details

Whereas, if I use a student T likelihood, the loo output is:

Computed from 16000 by 6 log-likelihood matrix

         Estimate  SE
elpd_loo    -10.6 0.0
p_loo        41.6 0.0
looic        21.2 0.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     0       0.0%  <NA>      
 (0.5, 0.7]   (ok)       0       0.0%  <NA>      
   (0.7, 1]   (bad)      0       0.0%  <NA>      
   (1, Inf)   (very bad) 6     100.0%  2         
See help('pareto-k-diagnostic') for details.

I also tried using moment-matching using loo(fit, moment_matching = TRUE) - but this throws the error: Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: the model object is not created or not valid.

My questions are:

Why would using a student t distribution increase the number of influential observations?
Why does moment matching throw that error? I have found posts about moment matching not working, but they seem to be older and resolved.
If my model is simply misspecified, do you have any general suggestions about what the next steps should be?

I realise that I currently high ratio of parameters to observations, but in my full model I have 48 observations rather than 6.

Thank you in advance!

avehtari · January 23, 2023, 2:23pm

You have accidentally dropped [i] from indexing mu. Does it work better if you add that [i] there?

Topic		Replies	Views
High pareto K diagnostics when adding measurement error to a model Modeling loo	2	315	January 26, 2023
Model Comparison when high Pareto k? General rstan , loo	4	103	May 20, 2025
Interpret pareto k diagnostic Modeling rstan , fitting-issues , loo	3	1584	August 3, 2023
Loo: High Pareto k diagnostic values for beta binomial regression General loo	2	1638	March 17, 2018
Pareto diagnostics for negative binomial versus poisson with variable intercept per observation brms loo	2	406	September 19, 2019

High pareto k diagnostic, overdispersed models and moment matching

Related topics