I’m doing a really basic hazard model fitting using simulated data. Here is the Stan model:
data {
int N_subjects;
vector[N_subjects] time;
vector[N_subjects] event;
vector[N_subjects] group;
}
parameters {
real<lower=0> h_0; // baseline mean survival time
real B_h1;
}
transformed parameters {
vector[N_subjects] lambda;
lambda = (1.0 / h_0) * exp(group * B_h1);
}
model {
h_0 ~ lognormal(log(180), 0.2);
B_h1 ~ normal(0, 10);
target += event .* exponential_lpdf(time | lambda);
target += (1 - event) .* exponential_lccdf(time | lambda);
}
I’m the simulating some data using the following code (this is LLM-generated code but seems reasonable enough for the simulation data):
N_SUBJECTS = 2000
H0_TRUE = 180.0 # True baseline mean survival time
B_H1_TRUE = -0.6 # The true coefficient we want to recover
CENSOR_RATE = 0.3 # Percentage of subjects to be censored
# --- 2. Simulate Data ---
np.random.seed(42)
# Create the covariate (group)
group = np.random.randint(0, 2, size=N_SUBJECTS)
# Calculate the rate for each subject based on the PH model
# rate = (1 / baseline_mean_time) * hazard_ratio
lambda_ind = (1.0 / H0_TRUE) * np.exp(group * B_H1_TRUE)
# Generate true event times from an exponential distribution
# T = -log(U) / lambda, where U ~ Uniform(0,1)
u = np.random.uniform(0, 1, size=N_SUBJECTS)
true_event_time = -np.log(u) / lambda_ind
# Generate censoring times from a separate exponential distribution
# This creates random right-censoring
censor_time = np.random.exponential(scale=H0_TRUE / CENSOR_RATE, size=N_SUBJECTS)
# Observe the final time and event status
time = np.minimum(true_event_time, censor_time)
event = (true_event_time <= censor_time).astype(int)
# Create a DataFrame
df = pd.DataFrame({"time": time, "event": event, "group": group})
For some reason, I’m able to recover the B_h1
relative risk more closely using sksurv’s CoxPHSurvivalAnalysis
, while the B_h1
estimate produced by Stan is the following:
Mean MCSE ... ESS_tail R_hat
h_0 200.653000 0.004460 ... 1536.30 1.00487
B_h1 -0.377075 0.000034 ... 1569.86 1.00420
(For reference, the CoxPHSurvivalAnalysis
model produces an estimate of -0.54
)
I’m completely stumped here - what could possibly be going wrong? I’m new to Stan and just trying to figure out how to get through the basics, but somehow got caught up with a very simple model.
Thank you for the help!