Fitting a Bernoulli model conditionally to data

gg_co · May 1, 2024, 1:36am

Dear Stan Community,

I am relatively new to Stan but have been struggling with the following problems for weeks.

I have data which is created here. Essentially, I’m looking at a case where someone takes 1024 rapid diagnostic tests over the course of their infection, and when their viral load (the amount of virus in their body) is above 10, there is a 90% chance that the test will be positive. When the viral load is below 10 or at 0, there is a 10% chance that the test will be positive.

# Fix the parameters and ground truth of model
t0 = 5
t1 = 8
t2 = 14
peakvl = 30
lod = 10

m1 = peakvl / (t1 - t0)
m2 = -1*peakvl / (t2 - t1)
b1 = (peakvl/(t1-t0))*t0*-1
b2 = (peakvl/(t2-t1))*t2

# Define the linear function for proliferation
line1 <- function(t, m1, b1) {
  return(m1 * t + b1)
}
# Define the linear function for clearance
line2 <- function(t, m2, b2) {
  return(m2 * t + b2)
}

# Generate N t values
n <- 1024
random_values <- runif(n, min = 0, max = t2*1.25)

# Generate binary data, with probability based on value of t
in_tent_pos_val = 0.9
out_tent_pos_val = 0.1

rapid_test_results <- vector()
mu_values <- vector()

for (t in random_values) {
  if (t <= t1) {
    mu_t <- line1(t, m1, b1)
  } else if (t > t1) {
    mu_t <- line2(t, m2, b2)
  }
  mu_values <- c(mu_values, mu_t)
}

for (mu_t in mu_values) {
  if (mu_t >= 10) {
    rapid_test_result <- rbinom(1, size = 1, prob = in_tent_pos_val)
  }
  else{
    rapid_test_result <- rbinom(1, size = 1, prob = out_tent_pos_val)
  }
  rapid_test_results <- c(rapid_test_results, rapid_test_result)

# Create dataframe
datasets <- data.frame(t = random_values, test_result=rapid_test_results)

My Stan model is the following:


stan_model_rapid <- "
data {
  int<lower=0> N;         // Number of data points
  vector[N] t;            // Time values
  vector[N] test_result; // rapid test result
  real f_mistake;             // False negative
  real lod;               // Limit of detection
}

parameters {
  real<lower=0> t0;       // Intercept time
  real<lower=t0> t1;       // Time at peak viral load
  real<lower=t1> t2;       // Clearance time
  real<lower=1> peakvl;   // Peak viral load
}

transformed parameters {
  // Calculate slopes
  real m1;
  real m2;
  real b1;
  real b2;

  m1 = peakvl / (t1 - t0);
  m2 = -1*peakvl / (t2 - t1);
  b1 = (peakvl/(t1-t0))*t0*-1;
  b2 = (peakvl/(t2-t1))*t2;
}

model {
  vector[N] mu;           // Predicted values
  vector[N] likelihood;
  
  // priors - had to be added
  t0 ~ normal(8, 10) T[0, ];
  t1 ~ normal(10, 10) T[0, ];
  t2 ~ normal(15, 10) T[0, ];
  peakvl ~ normal(20, 20) T[0, ];


  // Calculate predicted value
  for (i in 1:N) {
    if (t[i] <= t1){
      mu[i] = (m1 * t[i]) + b1;
    } else if (t[i] >= t1){
      mu[i] = (m2 * t[i]) + b2;
      }
  }

    // Likelihood of binary data
  for (i in 1:N) {
    // Likelihood of binary data
    if (mu[i] >= lod && (t[i]>t0 && t[i]<t2)) {
      if(test_result[i]==1){
        likelihood[i] = log(f_mistake);
      }else if(test_result[i]==0){
       likelihood[i] = log(1-f_mistake);
      }
    }else{
      if(test_result[i]==1){
        likelihood[i] = log(1-f_mistake);
      }else if(test_result[i]==0){
        likelihood[i] = log(f_mistake);
    }
    }
  }

for (i in 1:N) {
  target += likelihood[i];
}

}
"
stan_data_rapid <- list(
  N = nrow(datasets),
  t = datasets$t,
  test_result = datasets$test_result,
  lod = 10,
  f_mistake=0.9
)

# Compile the model
compiled_model_rapid <- stan_model(model_code = stan_model_rapid)

# Fit the model
fit_rapid_sim <- stan(model_code = stan_model_rapid, data = stan_data_rapid, iter = 2000)

When I don’t provide priors, I get incredibly high values (at 10^307). Priors do help, but my model is still not converging on my true parameters.

I tried to simplify my model to output only log(.9) and log(.1), but have also tried using Bernoulli_lpmf. I’m struggling to calculate the likelihood of each datapoint separately, dependent on the t value and predicted mu (viral load) value.

I get the error:
Warning: There were 4000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See Runtime warnings and convergence problems Examine the pairs() plot to diagnose sampling problems Warning: The largest R-hat is 3.6, indicating chains have not mixed.

And when I look at posterior values sampled, I get horrible looking distributions, where two very different values are chosen over and over again.

Any help on this model would be greatly appreciated!

Thank you.

Topic		Replies	Views
Add an unknown discrete parameter following a Bernoulli distribution Modeling rstan , techniques	15	283	April 28, 2025
Multivariate bernoulli model Modeling specification	6	1087	February 26, 2021
Random Poisson variable is a negative but must be nonnegative! Modeling fitting-issues , poisson	6	400	February 13, 2024
Mixture of Multivariate Bernouilli Modeling	5	879	September 29, 2019
Bernoulli process with observational error Modeling rstan , techniques	4	484	December 8, 2023

Fitting a Bernoulli model conditionally to data

Related topics