Hi,
I built this simulated dataset of cat adoptions in an animal care facility. The goal is to determine if black cats are less likely to be adopted than non-black cats. This is inspired by dataset that I found on Richard McElreath’s lectures (Statistical Rethinking Winter 2019 Lecture 13 - YouTube).
I simulated data from two exponential distributions. For black cats I chose lambda= 0.02 and for non-black cats lambda = 0.03. I randomly added censoring indicating that some of the cats had not been adopted when the study ended.
black<-round(rexp(400,0.02))
other<-round(rexp(400,0.03))
d<-data.frame(days_to_event=c(black,other),color_id=c(rep(1,400),rep(2,400)),
adopted=rbinom(400, 1, 0.5))
This is the model that I wrote:
data{
int N;
int adopted[N];
vector[N] days_to_event;
int color_id[N];
}
parameters{
vector[2] a;
}
transformed parameters{
vector[N] lambda;
vector[N] mu;
for ( i in 1:N ) {
mu[i] = a[color_id[i]];
mu[i] = exp(mu[i]);
}
for ( i in 1:N ) {
lambda[i] = 1/mu[i];
}
}
model{
a ~ normal( 0 , 1 );
for ( i in 1:N )
if ( adopted[i] == 0 ) target += exponential_lccdf(days_to_event[i] | lambda[i]);
for ( i in 1:N )
if ( adopted[i] == 1 ) days_to_event[i] ~ exponential( lambda[i] );
}
generated quantities{
real pred[N];
for ( i in 1:N )
if ( adopted[i] == 1 ) pred[i] = exponential_rng( lambda[i] );
for ( i in 1:N )
if ( adopted[i] == 0 ) pred[i] = exponential_lccdf(days_to_event[i] | lambda[i]);
}
I run the model:
dat <- list(
N=nrow(d),
days_to_event = d$days_to_event,
color_id = d$color_id ,
adopted = d$adopted
)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
fit<-stan("model.stan",data=dat)
and obtained the following results:
print(fit,"a")
Inference for Stan model: censored_cats_gq.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a[1] 4.64 0 0.07 4.5 4.59 4.64 4.68 4.79 3271 1
a[2] 4.25 0 0.07 4.1 4.20 4.25 4.29 4.39 3617 1
Samples were drawn using NUTS(diag_e) at Mon Apr 12 12:12:12 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The is problem is that:
lambda black cats = 1/exp(4.64) = 0.009657698
lambda non-black cats = 1/exp(4.25) = 0.01426423
which are quite different from the lambdas that I used to simulate the data (0.02 and 0.03).
What am I doing wrong? Feels like I’m missing something really obvious.
Tks