Dear Stan Community,
I am relatively new to Stan but have been struggling with the following problems for weeks.
I have data which is created here. Essentially, I’m looking at a case where someone takes 1024 rapid diagnostic tests over the course of their infection, and when their viral load (the amount of virus in their body) is above 10, there is a 90% chance that the test will be positive. When the viral load is below 10 or at 0, there is a 10% chance that the test will be positive.
# Fix the parameters and ground truth of model
t0 = 5
t1 = 8
t2 = 14
peakvl = 30
lod = 10
m1 = peakvl / (t1 - t0)
m2 = -1*peakvl / (t2 - t1)
b1 = (peakvl/(t1-t0))*t0*-1
b2 = (peakvl/(t2-t1))*t2
# Define the linear function for proliferation
line1 <- function(t, m1, b1) {
return(m1 * t + b1)
}
# Define the linear function for clearance
line2 <- function(t, m2, b2) {
return(m2 * t + b2)
}
# Generate N t values
n <- 1024
random_values <- runif(n, min = 0, max = t2*1.25)
# Generate binary data, with probability based on value of t
in_tent_pos_val = 0.9
out_tent_pos_val = 0.1
rapid_test_results <- vector()
mu_values <- vector()
for (t in random_values) {
if (t <= t1) {
mu_t <- line1(t, m1, b1)
} else if (t > t1) {
mu_t <- line2(t, m2, b2)
}
mu_values <- c(mu_values, mu_t)
}
for (mu_t in mu_values) {
if (mu_t >= 10) {
rapid_test_result <- rbinom(1, size = 1, prob = in_tent_pos_val)
}
else{
rapid_test_result <- rbinom(1, size = 1, prob = out_tent_pos_val)
}
rapid_test_results <- c(rapid_test_results, rapid_test_result)
# Create dataframe
datasets <- data.frame(t = random_values, test_result=rapid_test_results)
My Stan model is the following:
stan_model_rapid <- "
data {
int<lower=0> N; // Number of data points
vector[N] t; // Time values
vector[N] test_result; // rapid test result
real f_mistake; // False negative
real lod; // Limit of detection
}
parameters {
real<lower=0> t0; // Intercept time
real<lower=t0> t1; // Time at peak viral load
real<lower=t1> t2; // Clearance time
real<lower=1> peakvl; // Peak viral load
}
transformed parameters {
// Calculate slopes
real m1;
real m2;
real b1;
real b2;
m1 = peakvl / (t1 - t0);
m2 = -1*peakvl / (t2 - t1);
b1 = (peakvl/(t1-t0))*t0*-1;
b2 = (peakvl/(t2-t1))*t2;
}
model {
vector[N] mu; // Predicted values
vector[N] likelihood;
// priors - had to be added
t0 ~ normal(8, 10) T[0, ];
t1 ~ normal(10, 10) T[0, ];
t2 ~ normal(15, 10) T[0, ];
peakvl ~ normal(20, 20) T[0, ];
// Calculate predicted value
for (i in 1:N) {
if (t[i] <= t1){
mu[i] = (m1 * t[i]) + b1;
} else if (t[i] >= t1){
mu[i] = (m2 * t[i]) + b2;
}
}
// Likelihood of binary data
for (i in 1:N) {
// Likelihood of binary data
if (mu[i] >= lod && (t[i]>t0 && t[i]<t2)) {
if(test_result[i]==1){
likelihood[i] = log(f_mistake);
}else if(test_result[i]==0){
likelihood[i] = log(1-f_mistake);
}
}else{
if(test_result[i]==1){
likelihood[i] = log(1-f_mistake);
}else if(test_result[i]==0){
likelihood[i] = log(f_mistake);
}
}
}
for (i in 1:N) {
target += likelihood[i];
}
}
"
stan_data_rapid <- list(
N = nrow(datasets),
t = datasets$t,
test_result = datasets$test_result,
lod = 10,
f_mistake=0.9
)
# Compile the model
compiled_model_rapid <- stan_model(model_code = stan_model_rapid)
# Fit the model
fit_rapid_sim <- stan(model_code = stan_model_rapid, data = stan_data_rapid, iter = 2000)
When I don’t provide priors, I get incredibly high values (at 10^307). Priors do help, but my model is still not converging on my true parameters.
I tried to simplify my model to output only log(.9) and log(.1), but have also tried using Bernoulli_lpmf. I’m struggling to calculate the likelihood of each datapoint separately, dependent on the t value and predicted mu (viral load) value.
I get the error:
Warning: There were 4000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See Runtime warnings and convergence problems Examine the pairs() plot to diagnose sampling problems Warning: The largest R-hat is 3.6, indicating chains have not mixed.
And when I look at posterior values sampled, I get horrible looking distributions, where two very different values are chosen over and over again.
Any help on this model would be greatly appreciated!
Thank you.