Hey everyone!
I’m working on a time-to-event model. Waiting times come from a mixture of two exponential distributions, one with a low mean (e.g., 20 minutes) and one with a high mean (e.g., 5000 minutes). After a certain waiting time (e.g., 40 minutes), observations are censored.
Small R simulation:
# Number of observations
N <- 100
# Average holding times
theta <- sample(c(20, 5e3), N, replace = TRUE, prob = c(0.3, 0.7))
# True holding times
X_true <- rexp(N, 1 / theta)
# data input for Stan
d <- list()
d$N <- N
# Was the holding time censored?
d$C <- ifelse(X_true > 40, 1, 0)
# Observed holding time
d$X <- ifelse(d$C == 1, 40, X_true)
# Total time spent waiting
d$exposure <- sum(d$X)
# Number of observed events
d$count <- sum(d$C == 0)
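As a quick sanity check on the simulated data (just for inspection, not a model input), almost every draw from the 5000-minute component should be censored at 40 minutes, while most draws from the 20-minute component should be observed:
# Cross-tabulate the true component against the censoring indicator
table(component = theta, censored = d$C)
# Overall fraction of censored observations
mean(d$C)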
I want to recover the means of the two exponential distributions. To do so, I wrote a first Stan model:
data {
  int<lower = 0> N; // Number of observations
  array[N] int<lower = 0, upper = 1> C; // Censoring level
  vector[N] X; // Waiting times
}
parameters {
  real<lower = 0> theta_1; // Low mean
  real<lower = theta_1> theta_2; // High mean
  real<lower = 0, upper = 1> omega; // Mixture weight
}
model {
  // Priors
  target += normal_lpdf(theta_1 | 30, 10);
  target += normal_lpdf(theta_2 | 3e3, 3e3);
  target += beta_lpdf(omega | 2, 2);
  // Likelihood
  for (n in 1:N) {
    // Non-censored cases
    if (C[n] == 0) {
      target += log_mix(omega,
                        exponential_lpdf(X[n] | 1.0 / theta_1),
                        exponential_lpdf(X[n] | 1.0 / theta_2));
    }
    // Censored cases
    if (C[n] == 1) {
      target += log_mix(omega,
                        exponential_lccdf(X[n] | 1.0 / theta_1),
                        exponential_lccdf(X[n] | 1.0 / theta_2));
    } // if
  } // n
} // model block
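In case it helps, here is a minimal sketch of how the model can be fit with cmdstanr (the file name is just a placeholder for wherever the Stan code above is saved):
# Compile the model and sample, passing only the variables it declares
library(cmdstanr)
mod <- cmdstan_model("mixture_censored.stan")
fit <- mod$sample(data = d[c("N", "C", "X")], chains = 4, parallel_chains = 4)
fit$summary(c("theta_1", "theta_2", "omega"))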
It seems to work fine. However, I’d like, if possible, to code a more computationally efficient version of the model by aggregating the data. I know that a Poisson process can be described either by exponentially distributed waiting times:
X_{j} \sim \text{Exponential}(1 / \theta),
or by a Poisson distribution for the number of observed events:
\sum_j \mathbb{I}(C_{j} = 0) \sim \text{Poisson}(\sum_{j}X_{j} / \theta)
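For a single (non-mixture) exponential with a common censoring time, the two views agree exactly: the log-likelihood depends on the data only through the event count and the total exposure, and the maximum-likelihood estimate of \theta is exposure / count. A quick R check of this (values purely illustrative):
# Censored draws from a single exponential with mean 20
theta0 <- 20
X0 <- rexp(1e5, 1 / theta0)
X0_obs <- pmin(X0, 40) # observed (possibly censored) waiting times
count0 <- sum(X0 <= 40) # number of non-censored events
exposure0 <- sum(X0_obs) # total time spent waiting
exposure0 / count0 # should be close to theta0 = 20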
However, I wonder whether the aggregated/Poisson strategy can also be applied to the mixture described above. My naive attempt was to try the following:
data {
  int<lower = 0> count; // Number of non-censored observations
  real exposure; // Sum of all holding times
}
parameters {
  real<lower = 0> theta_1; // Low mean
  real<lower = theta_1> theta_2; // High mean
  real<lower = 0, upper = 1> omega; // Mixture weight
}
model {
  // Priors
  target += normal_lpdf(theta_1 | 30, 10);
  target += normal_lpdf(theta_2 | 3e3, 3e3);
  target += beta_lpdf(omega | 2, 2);
  // Likelihood
  target += log_mix(omega,
                    poisson_lpmf(count | exposure / theta_1),
                    poisson_lpmf(count | exposure / theta_2));
}
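For reference, the aggregated model only needs the two summary statistics (again, the file name is a placeholder):
# Compile and sample the aggregated model on (count, exposure) only
mod2 <- cmdstan_model("mixture_aggregated.stan")
fit2 <- mod2$sample(data = list(count = d$count, exposure = d$exposure),
                    chains = 4, parallel_chains = 4)
fit2$summary(c("theta_1", "theta_2", "omega"))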
However, it doesn’t seem to recover the true means. Is there a way of making this work?
Please let me know if anything is unclear. Thanks for your time!
Ben