Hi there,
I am trying to fit a biphasic model of decay to some antibody titres using RStan. The final model will account for measurement error, as discussed in section 6.1 (Bayesian measurement error model) of the Stan User's Guide.
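Roughly, the structure I have in mind for the measurement-error part is the sketch below (not final code; mu_true is a placeholder name, and SE are the per-observation standard errors included in the data further down):

data {
  int N;
  real mu[N];             // observed average titres
  real<lower = 0> SE[N];  // measurement standard errors
}
parameters {
  real mu_true[N];        // latent "true" titres
}
model {
  // measurement part: observed titres are noisy versions of the latent ones
  mu ~ normal(mu_true, SE);
  // the biphasic decay model below would then apply to mu_true rather than mu
}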
When I ran this model, all the diagnostics seemed okay (Rhat, divergent transitions, ESS, etc.); however, all the Pareto k values were > 0.7 when using loo. I tried an overdispersed model (a Student-t likelihood rather than a normal one), but this increased the number of Pareto k values above 1.
I’ve shared a simpler model below, which just fits a biphasic model to a subset of my data, ignoring any measurement error.
data {
  int N;
  int T;
  real time[T];
  real mu[N]; // observed average titres
}
parameters {
  real<upper = 0> pi_1; // decay rate 1
  real pi_2;            // decay rate 2 - can be positive to allow for titre boosting
  real<lower = 0> ts;   // time switch decay rate
  real<lower = 0> sigma;
}
transformed parameters {
  real neut[T];
  real n_N[N];
  // biphasic model
  for (t in 1:T)
    neut[t] = mu[1] * (exp(pi_1 * time[t] + pi_2 * ts) + exp(pi_2 * time[t] + pi_1 * ts)) / (exp(pi_1 * ts) + exp(pi_2 * ts));
  // match titre time points to data
  n_N[1] = neut[1];
  n_N[2] = neut[6];
  n_N[3] = neut[12];
  n_N[4] = neut[24];
  n_N[5] = neut[36];
  n_N[6] = neut[48];
}
model {
  // target += normal_lpdf(mu | n_N, sigma);
  target += student_t_lpdf(mu | N - 1, n_N, sigma);
  // priors
  pi_1 ~ normal(0, 0.1);
  pi_2 ~ normal(0, 0.1);
  ts ~ normal(1, 2);
  sigma ~ cauchy(0, 5);
}
generated quantities {
  vector[N] log_lik;
  // pointwise log-likelihood for loo
  // for (i in 1:N) log_lik[i] = normal_lpdf(mu[i] | n_N[i], sigma);
  for (i in 1:N) log_lik[i] = student_t_lpdf(mu[i] | N - 1, n_N[i], sigma);
}
The data used is:
stan_data = list(
  mu = c(2.264818, 1.944483, 1.886491, 1.863323, 1.913896, 1.869833),
  SE = c(0.01921157, 0.02387729, 0.02568116, 0.03187213, 0.03567769, 0.03210992),
  T = 48,
  N = 6,
  time = 1:48
)
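For reference, I am fitting the model and computing loo roughly like this (the file name and iteration settings here are illustrative, not my exact call):

library(rstan)
library(loo)

# compile and fit the model above
fit <- stan(
  file = "biphasic.stan",   # illustrative file name
  data = stan_data,
  chains = 4, iter = 8000   # 4 x 4000 post-warmup draws = 16000 total
)

# PSIS-LOO using the log_lik generated quantity
loo_fit <- loo(fit)
print(loo_fit)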
If I use a normal likelihood, the loo output is:
Computed from 16000 by 6 log-likelihood matrix
Estimate SE
elpd_loo 2.7 0.9
p_loo 2.5 0.7
looic -5.4 1.9
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 5 83.3% 1638
(0.5, 0.7] (ok) 0 0.0% <NA>
(0.7, 1] (bad) 1 16.7% 60
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details
Whereas if I use a Student-t likelihood, the loo output is:
Computed from 16000 by 6 log-likelihood matrix
Estimate SE
elpd_loo -10.6 0.0
p_loo 41.6 0.0
looic 21.2 0.0
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 0 0.0% <NA>
(0.5, 0.7] (ok) 0 0.0% <NA>
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 6 100.0% 2
See help('pareto-k-diagnostic') for details.
I also tried moment matching with loo(fit, moment_matching = TRUE), but this throws the error: Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: the model object is not created or not valid.
My questions are:
- Why would using a Student-t distribution increase the number of influential observations?
- Why does moment matching throw that error? I have found posts about moment matching not working, but they seem to be older and resolved.
- If my model is simply misspecified, do you have any general suggestions about what the next steps should be?
I realise that I currently have a high ratio of parameters to observations, but in my full model I have 48 observations rather than 6.
Thank you in advance!