Modeling outliers with multi-level model fails

clockwise · October 2, 2020, 4:40pm

Hi all!

I’m new to Stan and I’m trying to model data that is generated from a binomial distribution but has outliers. For example, each hour I have 1000 attempts and the failure rate is around 1%, but rarely (every ~40 hours) it jumps to 10%.

Here is the code, from R:

library(rstan)

model_string <- "
data{
    int<lower=0> attempts[200];
    int<lower=0> drops[200];
}
parameters{
  
    real<lower=0,upper=1> p;
    real<lower=0,upper=1> spike_chance;
    real<lower=0,upper=1> spike_height;
}
model {
    real real_p;
    int is_spike;
    
    spike_height ~ uniform(0, 1);
    spike_chance ~ uniform(0, 1);
    p ~ uniform(0, 1);
    is_spike ~ binomial(1, spike_chance);

    real_p = p + is_spike * spike_height;
    drops ~ binomial( attempts , real_p );
}
"
my_attempts = rep(1000,200)
my_drops = rbinom(200,1000,0.01)

# Add some outliers
my_drops[seq(40,200,40)] = 100

data_list <- list(attempts = my_attempts, drops = my_drops)

# Compiling and producing posterior samples from the model.
stan_samples <- stan(model_code = model_string, data = data_list, chains = 1)

The approach I’ve taken is to model the drop rate as the normal drop rate + is_spike * spike_height, where is_spike is an integer that is 1 only rarely (about once every 40 hours). However, this model fails with the message:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: binomial_lpmf: Successes variable is -2147483648, but must be in the interval [0, 1]  (in 'model1208741e5057_11e3fb205845c7ea7b9ea0b3acfc1cce' at line 19)

Is my approach correct? If so, how can I fix my model? Thanks in advance!

bbbales2 · October 2, 2020, 9:03pm

is_spike ~ binomial(1, spike_chance);

The ~ syntax isn’t actually sampling a distribution in Stan, so this isn’t doing what you expected.

The ~ statement translates to:

target += binomial_lpmf(is_spike | 1, spike_chance);

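and since is_spike is a local variable that is never assigned a value, it still holds its placeholder initial value, the -2147483648 in your error message, which is outside the allowed interval [0, 1].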

Stan doesn’t sample discrete parameters, but if you can integrate them out you can still work with the model. I believe what you’re describing can be fit as a mixture model (at every time point, there is a probability of the outcome coming from distribution 1 or distribution 2). There are some examples of mixture models in the manual that you might be able to adapt: https://mc-stan.org/docs/2_24/stan-users-guide/zero-inflated-section.html (and the previous sections).
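As a rough sketch of what integrating out is_spike could look like for this data, the model below sums over both values of the indicator for every hour using log_mix. It is not taken from the manual: N and spike_p are names introduced here for illustration, and spike_p replaces p + spike_height with a spike rate constrained to lie above p, which is an assumption made to keep the rate inside [0, 1] and to keep the two components distinguishable.

```stan
data {
  int<lower=0> N;            // number of hours (hypothetical name; 200 in the example above)
  int<lower=0> attempts[N];  // same array syntax as the original model;
  int<lower=0> drops[N];     // newer Stan releases write: array[N] int<lower=0> drops;
}
parameters {
  real<lower=0, upper=1> p;             // baseline drop rate
  real<lower=0, upper=1> spike_chance;  // probability that a given hour is a spike
  real<lower=p, upper=1> spike_p;       // drop rate during a spike (assumed to exceed p)
}
model {
  // No explicit priors: the declared bounds imply uniform priors on each parameter.
  for (n in 1:N) {
    // Marginal likelihood of hour n: a spike with probability spike_chance,
    // otherwise the baseline rate. The discrete indicator never appears.
    target += log_mix(spike_chance,
                      binomial_lpmf(drops[n] | attempts[n], spike_p),
                      binomial_lpmf(drops[n] | attempts[n], p));
  }
}
generated quantities {
  // Posterior probability that each hour was a spike, recovered from the
  // two mixture terms for the current draw.
  vector[N] prob_spike;
  for (n in 1:N) {
    real lp_spike = log(spike_chance)
                    + binomial_lpmf(drops[n] | attempts[n], spike_p);
    real lp_base = log1m(spike_chance)
                   + binomial_lpmf(drops[n] | attempts[n], p);
    prob_spike[n] = exp(lp_spike - log_sum_exp(lp_spike, lp_base));
  }
}
```

To run this from the R code in the question, the data list would also need an N entry, for example list(N = 200, attempts = my_attempts, drops = my_drops).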
