# High pareto k diagnostic, overdispersed models and moment matching

Hi there,

I am trying to fit a biphasic model of decay to some antibody titres using Rstan. The final model will account for measurement errors as discussed here 6.1 Bayesian measurement error model | Stan User’s Guide.

When I ran this model, all the diagnostics seemed okay (Rhat, divergent transitions, ESS etc,), however, all the pareto k values were >0.7 when using loo. I tried using an overdispersed model (t student distribution, rather than normal), however this increased the number of pareto k that were above 1.
I’ve shared a simpler model below, which just fits a biphasic model to a subset of my data, ignoring any measurement error.

``````data{
int N;
int T ;
real time[T];
real mu[N];                 // observed average titres
}

parameters {
real<upper = 0> pi_1;      // decay rate 1
real pi_2;                         // decay rate 2 - can be positive to allow for titre boosting
real<lower = 0> ts   ;       // time switch decay rate
real<lower = 0> sigma;
}

transformed parameters {
real neut[T] ;
real n_N[N] ;

// biphasic model
for(t in 1:T)
neut[t] = mu[1] * (exp(pi_1 * time[t] + pi_2 * ts) + exp(pi_2 * time[t] + pi_1 * ts)) / (exp(pi_1 * ts) + exp(pi_2 * ts)) ;

// match titre time points to data

n_N[1] =    neut[1] ;
n_N[2] =    neut[6] ;
n_N[3] =    neut[12] ;
n_N[4] =    neut[24] ;
n_N[5] =    neut[36] ;
n_N[6] =    neut[48] ;
}

model {

//   target += normal_lpdf(mu| n_N, sigma);
target += student_t_lpdf(mu|(N-1), n_N, sigma);

// priors
pi_1 ~ normal(0,0.1) ;
pi_2 ~ normal(0,0.1) ;
ts ~ normal(1,2) ;
sigma ~ cauchy(0,5);
}

generated quantities{
vector[N] log_lik ;
// for(i in 1:N)  log_lik[i]     = normal_lpdf(mu[i]| n_N[i], sigma) ;
for(i in 1:N)  log_lik[i] = student_t_lpdf(mu|(N-1), n_N, sigma) ;
}
``````

The data used is:

``````stan_data = list(
mu = c(2.264818, 1.944483, 1.886491, 1.863323, 1.913896, 1.869833),
SE = c(0.01921157, 0.02387729, 0.02568116, 0.03187213, 0.03567769, 0.03210992),
T = 48,
N = 6,
time = 1:48)

``````

If I use a normal likelihood, the loo output is:

``````Computed from 16000 by 6 log-likelihood matrix

Estimate  SE
elpd_loo      2.7 0.9
p_loo         2.5 0.7
looic        -5.4 1.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     5     83.3%   1638
(0.5, 0.7]   (ok)       0      0.0%   <NA>
(0.7, 1]   (bad)      1     16.7%   60
(1, Inf)   (very bad) 0      0.0%   <NA>
See help('pareto-k-diagnostic') for details
``````

Whereas, if I use a student T likelihood, the loo output is:

``````Computed from 16000 by 6 log-likelihood matrix

Estimate  SE
elpd_loo    -10.6 0.0
p_loo        41.6 0.0
looic        21.2 0.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     0       0.0%  <NA>
(0.5, 0.7]   (ok)       0       0.0%  <NA>
(0.7, 1]   (bad)      0       0.0%  <NA>
(1, Inf)   (very bad) 6     100.0%  2
See help('pareto-k-diagnostic') for details.
``````

I also tried using moment-matching using `loo(fit, moment_matching = TRUE)` - but this throws the error: `Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: the model object is not created or not valid`.

My questions are:

1. Why would using a student t distribution increase the number of influential observations?
2. Why does moment matching throw that error? I have found posts about moment matching not working, but they seem to be older and resolved.
3. If my model is simply misspecified, do you have any general suggestions about what the next steps should be?

I realise that I currently high ratio of parameters to observations, but in my full model I have 48 observations rather than 6.

You have accidentally dropped `[i]` from indexing `mu`. Does it work better if you add that `[i]` there?