Code for Adolescent smoking example and posterior predictive checking

I am trying to find code to support the Adolescent smoking example in Chapter 6 of BDA3. It doesn’t appear to exist in the Git repo for that chapter and the original publication fits the model in SAS. Any help would be greatly appreciated.

Hi,
To which git repo do you refer to? I’m not aware of git repo having other BDA3 Stan models than 8 schools. It would be nice to have Stan models and notebooks for all the examples in BDA3, but we haven’t had time to do that… The model is a logistic regression model, so you could check the models described in Stan reference manual, Chapter 8, section “Logistic and Probit Regression”. If you use R, it would be really use to make the model with rstanarm https://cran.r-project.org/web/packages/rstanarm/vignettes/binomial.html.

Aki

I’m guessing example models: https://github.com/stan-dev/example-models

Thanks for the responses and apologies for not being more clear. The github repo I was referring to is the one associated with the R examples from the book. That link has the speed of light examples from Chapter 6 (on posterior predictive checking), but not the smoking example. I can code a logistic regression in Stan (or rstanarm), but this model actually seems to be a mixture model (or a zero-inflated binomial model) which is more like the application I’m actually analyzing. I essentially have two problems: 1) I’m uncertain how to code the mixture; I’ve found examples using zero-inflated Poisson, but am unsure how to extend those to situations where both outcomes are binomial and 2) I’m not sure how to construct the posterior predictive assessment given the mixture model. Any guidance would be much appreciated and I’d be happy to contribute a model or notebook with my actual analysis once I’ve figured out how to get this working…
Best,
Matt

Thanks for the clarification. Sorry, I forgot that there’s a second model in the smoking example, which is a mixture model. I think I know how to write this, but need some time to check it.

Aki

Excellent. Thank you for any help you may be able to give.
M

Hi,
The model 2 is a zero inflated model. See Stan reference manual Section 12.7 “Zero-Inflated and Hurdle Models”. It has an example of zero inflated Poisson model. Just replace the Poisson with model 1.

Aki

To repeat the posterior predictive checking in the BDA3, you need to add generated quantities block, where you simulate from the posterior predictive distribution of the model. First sample from the susceptibility model using bernoulli_logit_rng and conditionally on S_j=1 sample from the model 1 (again with bernoulli_logit_rng). You can then compute the indicator variables for never/always/incident-smokers in generated quantities or in R.

Aki

I’m sorry - I’m not sure what you mean by models 1 and 2? Is this code that exists somewhere else? Also, to implement the model in 12.7, I just replace the poisson language with a second binomial (or bernoulli) statement?

Model 1: BDA3. p. 148 “Our first model is a hierarchical logistic regression, in which the probability of smoking depends on sex, parental smoking, the wave of the study, and an individual parameter for the person.”

Model 2: BDA3, p. 149 "The second model is an expansion of the first, in which each person j has an unobserved ‘susceptibility’ status S j that equals 1 if the person might possibly smoke or 0 if he or she is ‘immune’ from smoking (that is, has no chance of becoming a smoker).

Stan reference manual 2.15.0 p. 195 has equation for zero inflation Poisson model. In that model the first line is for the y=0, but in the smoking example the corresponding part is for individuals who didn’t smoke during the study. Replace theta with model Pr(S_j = 1) (BDA 3 p. 149) and replace Poisson part with Pr(y_{jt} = 1) (BDA 3 p. 148). The reference manual has also the code which shows the basic principle for writing the mixture model in Stan language. Before writing zero inflated model, it’s good to start by writing separately models for Pr(y_{jt} = 1) and Pr(S_j = 1), as they are easier to get right. When you have those working it should be easy to combine them. I hope this helps you forward.

Aki

2 Likes