Survival analysis with simulated data - the model doesn't recover the parameters that I used to build the data

Hi,

I built a simulated dataset of cat adoptions in an animal care facility. The goal is to determine whether black cats are less likely to be adopted than non-black cats. It is inspired by a dataset from Richard McElreath’s lectures (Statistical Rethinking Winter 2019 Lecture 13 - YouTube).

I simulated waiting times from two exponential distributions: lambda = 0.02 for black cats and lambda = 0.03 for non-black cats. I then randomly added censoring to indicate that some of the cats had not yet been adopted when the study ended.

```
black <- round(rexp(400, 0.02))   # black cats: true rate 0.02
other <- round(rexp(400, 0.03))   # non-black cats: true rate 0.03

d <- data.frame(days_to_event = c(black, other),
                color_id = c(rep(1, 400), rep(2, 400)),
                adopted = rbinom(800, 1, 0.5))   # one label per cat, 800 rows
```

This is the model that I wrote:

```
data{
    int N;
    int adopted[N];            // 1 = adopted, 0 = censored at study end
    vector[N] days_to_event;
    int color_id[N];           // 1 = black, 2 = other
}
parameters{
    vector[2] a;
}
transformed parameters{
    vector[N] lambda;
    vector[N] mu;
    for ( i in 1:N ) {
        mu[i] = exp(a[color_id[i]]);   // mean days to adoption by color
    }
    for ( i in 1:N ) {
        lambda[i] = 1 / mu[i];         // exponential rate
    }
}
model{
    a ~ normal( 0 , 1 );
    // censored cats contribute the log survival probability P(T > t)
    for ( i in 1:N )
        if ( adopted[i] == 0 ) target += exponential_lccdf(days_to_event[i] | lambda[i]);
    // adopted cats contribute the exponential density of the observed waiting time
    for ( i in 1:N )
        if ( adopted[i] == 1 ) days_to_event[i] ~ exponential( lambda[i] );
}
generated quantities{
    real pred[N];
    for ( i in 1:N )
        if ( adopted[i] == 1 ) pred[i] = exponential_rng( lambda[i] );
    // note: for censored cats this stores a log probability, not a simulated time
    for ( i in 1:N )
        if ( adopted[i] == 0 ) pred[i] = exponential_lccdf(days_to_event[i] | lambda[i]);
}
```
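For reference, the two likelihood loops in the `model` block implement the usual right-censored factorization: an adopted cat contributes the exponential density of its waiting time, while a still-unadopted cat contributes its survival probability, which is what `exponential_lccdf` returns on the log scale:

$$
L(\lambda) \;=\; \prod_{i\,:\,\text{adopted}} \lambda_{c_i}\, e^{-\lambda_{c_i} t_i}
\;\times\; \prod_{i\,:\,\text{censored}} e^{-\lambda_{c_i} t_i}
$$

where $t_i$ is `days_to_event[i]` and $c_i$ is `color_id[i]`.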

I ran the model:

```
dat <- list(
    N = nrow(d),
    days_to_event = d$days_to_event,
    color_id = d$color_id,
    adopted = d$adopted
)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
fit <- stan("model.stan", data = dat)
```

and obtained the following results:

```
print(fit, "a")
Inference for Stan model: censored_cats_gq.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

     mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a[1] 4.64       0 0.07  4.5 4.59 4.64 4.68  4.79  3271    1
a[2] 4.25       0 0.07  4.1 4.20 4.25 4.29  4.39  3617    1

Samples were drawn using NUTS(diag_e) at Mon Apr 12 12:12:12 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

The problem is that:

```
lambda black cats     = 1/exp(4.64) = 0.009657698
lambda non-black cats = 1/exp(4.25) = 0.01426423
```

which are quite different from the lambdas that I used to simulate the data (0.02 and 0.03).

What am I doing wrong? Feels like I’m missing something really obvious.
Thanks!


I was generating censored observations randomly, when I should have been censoring the observations beyond a specific threshold:

```
library(dplyr)   # for mutate() / if_else(); black and other re-drawn with n = 1000

d <- data.frame(days_to_event = c(black, other),
                color_id = c(rep(1, 1000), rep(2, 1000)))
d <- d %>% mutate(adopted = if_else(days_to_event >= 200, 0, 1))
```

Results here:

```
Inference for Stan model: model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

     mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a[1] 4.01       0 0.03 3.95 3.99 4.01 4.03  4.08  3027    1
a[2] 3.50       0 0.03 3.43 3.47 3.49 3.52  3.56  3368    1

Samples were drawn using NUTS(diag_e) at Wed Apr 14 15:46:05 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

> 1/exp(4.01)
[1] 0.0181334
> 1/exp(3.50)
[1] 0.03019738
```
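As a sanity check on why the first fit landed near half the true rates: when the adoption labels are drawn independently of the waiting times, every cat still contributes its full waiting time to the likelihood, but only about half of them count as events. The censored exponential MLE is (number of events) / (total time at risk), so it ends up near lambda/2, consistent with the 0.0097 and 0.0143 above. A quick sketch of that arithmetic (in Python just to keep it self-contained; the same calculation works in R):

```python
import random

random.seed(2021)

lam = 0.02   # true adoption rate for black cats
n = 400

times = [random.expovariate(lam) for _ in range(n)]
# random "adopted" labels, independent of the waiting times,
# mimicking the original rbinom(..., 1, 0.5) censoring
adopted = [random.random() < 0.5 for _ in range(n)]

events = sum(adopted)           # cats counted as adopted
total_time = sum(times)         # every cat contributes its full time
lam_hat = events / total_time   # exponential MLE under censoring

print(round(lam_hat, 4))        # close to lam / 2 = 0.01, not 0.02
```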

I tried to delete this post because the problem was not related to the model specification or the Stan code, but it seems only an admin can do that.
