Help with Hypothesis Testing

Hello. I have ran a simple uni-variate model and would like to do the Bayesian equivalent to a hypothesis test.

The model is below but at a high level, I have a dependent variable ‘claim count’ and predictor ‘repair_letter’. I have fit the model. I am asking guidance on showing the difference in claim count based on if an observation gets a repair letter (variable denoted by a 1 or 0)


#model code
model_cod = """
data {
int<lower=0> N;
int<lower=0> clmcnt[N];
vector[N] repair_flag;
}
parameters {
real beta;
}
model {
beta ~ normal(0,1);
clmcnt~poisson_log(repair_flag*beta);
}
"""
sm = StanModel(model_code = model_cod)

data1 = {'N': df_join.shape[0], 'repair_flag':df_join['repair_flag'].values
         , 'clmcnt':df_join['clmcnt'].values}
sm_fit = sm.sampling(data = data1, init = .3, chains = 3, iter = 5000, warmup = 1000)

print(sm_fit)

Print

       mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
beta  -2.81  3.1e-4   0.02  -2.85  -2.82  -2.81   -2.8  -2.77   4067    1.0
lp__ -5.5e6    0.01   0.72 -5.5e6 -5.5e6 -5.5e6 -5.5e6 -5.5e6   4742    1.0

Any guidance on analyzing and visualizing would be great.

1 Like

Code your repair_flag variable as -.5 and +.5, then you can interpret the beta parameter as the difference. See this lecture.

1 Like

Actually, are you sure you have your model coded as you intend? with repair_flag as a 0/1 coded variable, you’re conveying perfect certainty that the entries coded with 0 have an alpha of 1 (log(0)). I suspect that you want an intercept in your model as well.

ooo. Didn’t think about that. Could you point me to sampling from a posterior as well?

Sorry, I don’t know what you mean.

Here’s your model with an intercept:

data {
    int<lower=0> N;
    int<lower=0> clmcnt[N];
    vector[N] repair_flag;
}
parameters {
    real intercept;
    real beta;
}
model {
    intercept ~ normal(0,1);
    beta ~ normal(0,1);
    clmcnt ~ poisson_log( intercept + repair_flag*beta );
}

If you code your repair flag with -.5/+.5, beta will be the difference between conditions and intercept will be the average across conditions. Be sure to modify the priors to convey your actual beliefs.

Thank you. I think i’m trying to do what generated quantities does. Does this look correct?

generated quantities {
vector[N_new] y_new;
for (n in 1:N_new)
    y_new[n] = poisson_rng(alpha[n] + repair_flag*beta[n])
}

This will generate new count observations given the posterior.

Thanks!