High Pareto k diagnostics, overdispersed models and moment matching

Hi there,

I am trying to fit a biphasic model of decay to some antibody titres using RStan. The final model will account for measurement error, as discussed in 6.1 Bayesian measurement error model | Stan User's Guide.
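
For context, the measurement-error layer I plan to add follows that chapter's scheme. A minimal sketch, assuming the observed mu are noisy versions of latent true titres (mu_true is a name I'm using for this sketch, not from my final model):

parameters {
  vector[N] mu_true;              // latent "true" titres (sketch only)
}
model {
  mu ~ normal(mu_true, SE);       // measurement model: known SEs from the data
  mu_true ~ normal(n_N, sigma);   // process model: the biphasic curve below
}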

When I ran this model, all the diagnostics looked fine (Rhat, divergent transitions, ESS, etc.), but all of the Pareto k values were > 0.7 when using loo. I tried an overdispersed model (Student-t likelihood rather than normal), but this increased the number of Pareto k values that were above 1.
I’ve shared a simpler model below, which just fits a biphasic model to a subset of my data, ignoring any measurement error.

data{
   int<lower=1> N;              // number of observed titres
   int<lower=1> T;              // number of time points
   real time[T];                // times at which the curve is evaluated
   real mu[N];                  // observed average titres
}


parameters {
  real<upper = 0> pi_1;        // decay rate 1
  real pi_2;                   // decay rate 2 - can be positive to allow for titre boosting
  real<lower = 0> ts;          // time at which the decay rate switches
  real<lower = 0> sigma;       // observation noise
}

transformed parameters {
  real neut[T];                // full titre trajectory
  real n_N[N];                 // trajectory at the observed time points

  // biphasic model
  for (t in 1:T)
    neut[t] = mu[1] * (exp(pi_1 * time[t] + pi_2 * ts) + exp(pi_2 * time[t] + pi_1 * ts)) / (exp(pi_1 * ts) + exp(pi_2 * ts));

// match titre time points to data 

   n_N[1] =    neut[1] ;
   n_N[2] =    neut[6] ;
   n_N[3] =    neut[12] ;
   n_N[4] =    neut[24] ;
   n_N[5] =    neut[36] ;
   n_N[6] =    neut[48] ;
}


model {

//   target += normal_lpdf(mu | n_N, sigma);            // normal likelihood
     target += student_t_lpdf(mu | N - 1, n_N, sigma);  // Student-t likelihood with N - 1 df
     
// priors
  pi_1 ~ normal(0,0.1) ;
  pi_2 ~ normal(0,0.1) ;
  ts ~ normal(1,2) ;
  sigma ~ cauchy(0,5);
}

generated quantities{
 vector[N] log_lik ;
 // for(i in 1:N)  log_lik[i]     = normal_lpdf(mu[i]| n_N[i], sigma) ;
 for(i in 1:N)  log_lik[i] = student_t_lpdf(mu|(N-1), n_N, sigma) ;
}
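
For anyone checking the algebra, the curve in transformed parameters is (my reading of the code):

$$\mathrm{neut}(t) = \mu_1 \,\frac{e^{\pi_1 t + \pi_2 t_s} + e^{\pi_2 t + \pi_1 t_s}}{e^{\pi_1 t_s} + e^{\pi_2 t_s}}$$

so that neut(0) = mu[1], and the two exponential components cross exactly at t = ts, where the dominant decay rate switches from one of pi_1, pi_2 to the other.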

The data used are:

stan_data = list(
  mu = c(2.264818, 1.944483, 1.886491, 1.863323, 1.913896, 1.869833), 
  SE = c(0.01921157, 0.02387729, 0.02568116, 0.03187213, 0.03567769, 0.03210992),  # measurement SEs (unused by the simpler model above)
  T = 48, 
  N = 6,
  time = 1:48)
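
For completeness, the fit and loo calls look roughly like this (a sketch: "biphasic.stan" is a placeholder filename, and the chain/iteration settings are assumptions chosen to match the 16,000 by 6 log-likelihood matrix below):

library(rstan)
library(loo)

fit <- stan("biphasic.stan", data = stan_data,
            chains = 4, iter = 8000)               # 4 chains x 4000 post-warmup draws = 16000
log_lik <- extract_log_lik(fit, merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik))
loo(log_lik, r_eff = r_eff)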

If I use a normal likelihood, the loo output is:

Computed from 16000 by 6 log-likelihood matrix

         Estimate  SE
elpd_loo      2.7 0.9
p_loo         2.5 0.7
looic        -5.4 1.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     5     83.3%   1638      
 (0.5, 0.7]   (ok)       0      0.0%   <NA>      
   (0.7, 1]   (bad)      1     16.7%   60        
   (1, Inf)   (very bad) 0      0.0%   <NA>      
See help('pareto-k-diagnostic') for details

Whereas, if I use a Student-t likelihood, the loo output is:

Computed from 16000 by 6 log-likelihood matrix

         Estimate  SE
elpd_loo    -10.6 0.0
p_loo        41.6 0.0
looic        21.2 0.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     0       0.0%  <NA>      
 (0.5, 0.7]   (ok)       0       0.0%  <NA>      
   (0.7, 1]   (bad)      0       0.0%  <NA>      
   (1, Inf)   (very bad) 6     100.0%  2         
See help('pareto-k-diagnostic') for details.

I also tried moment matching via loo(fit, moment_match = TRUE), but this throws the error: Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: the model object is not created or not valid.
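
For reference, fit here is the stanfit object returned by stan() above, created in the same R session, and I assume rstan's loo method is the one being dispatched:

loo(fit, moment_match = TRUE)   # <- throws the checkForRemoteErrors error above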

My questions are:

  1. Why would using a Student-t likelihood increase the number of influential observations?
  2. Why does moment matching throw that error? I have found posts about moment matching not working, but they seem to be older and resolved.
  3. If my model is simply misspecified, do you have any general suggestions about what the next steps should be?

I realise that I currently have a high ratio of parameters to observations, but in my full model I have 48 observations rather than 6.

Thank you in advance!

You have accidentally dropped the [i] when indexing mu (and n_N) in generated quantities. Does it work better if you add the [i] there?
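
That is, the pointwise log-likelihood would be (a sketch, mirroring the commented-out normal version, which does index correctly):

generated quantities {
  vector[N] log_lik;
  for (i in 1:N)
    log_lik[i] = student_t_lpdf(mu[i] | N - 1, n_N[i], sigma);   // [i] on both mu and n_N
}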