Hi all!
I’m new to Stan and I’m trying to model data that is generated from binomial distribution, but has outliers. For example, each hour I have 1000 attempts and the number of failures is around 1%, but rarely (every ~40 hours), it jumps to 10%.
Here is the code, from R:
model_string <- "
data{
int<lower=0> attempts[200];
int<lower=0> drops[200];
}
parameters{
real<lower=0,upper=1> p;
real<lower=0,upper=1> spike_chance;
real<lower=0,upper=1> spike_height;
}
model {
real real_p;
int is_spike;
spike_height ~ uniform(0, 1);
spike_chance ~ uniform(0, 1);
p ~ uniform(0, 1);
is_spike ~ binomial(1, spike_chance);
real_p = p + is_spike * spike_height;
drops ~ binomial( attempts , real_p );
}
"
my_attempts = rep(1000,200)
my_drops = rbinom(200,1000,0.01)
# Add some outliers
my_drops[seq(0,200,40)] = 100
data_list <- list(attempts = my_attempts, drops = my_drops)
# Compiling and producing posterior samples from the model.
stan_samples <- stan(model_code = model_string, data = data_list, chains = 1)
The approach I’ve taken is to have a drop rate as normal drop rate + is_spike * spike_drop_rate, where is_spike is integer that is 1 approximately 1% of the time. However, this model fails with the message:
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: binomial_lpmf: Successes variable is -2147483648, but must be in the interval [0, 1] (in 'model1208741e5057_11e3fb205845c7ea7b9ea0b3acfc1cce' at line 19)
Is my approach correct? If so, how can I fix my model? Thanks in advance!