How to code correlations between Bernoulli trials?

Let’s say that $P$ people are throwing a basketball at a set of $B$ baskets with different radii, and the outcome $S_{pb}$ is binary, indicating whether the shot was scored or not. Let $a_p$ be person $p$’s throwing accuracy and $r_b$ the radius of the $b$-th basket. The model could be formulated in Stan as follows:

S[p,b] ~ bernoulli_logit(a[p] + r[b]);

However, what if we wanted to include correlations between trials? Let’s say that scoring a basket gives you a confidence boost, which in turn increases your probability of scoring on the next shots. Likewise, missing a basket makes you more anxious and decreases your chances of scoring on subsequent shots. So if $S_{p1} = 1$, the chances that $S_{p2}$ or $S_{p3}$ will also be 1 are increased.

How could we code these correlations in Stan? If I were dealing with continuous variables, I guess I would use the multi_normal_lpdf function and pass the covariances between trials via the Sigma argument, but I don’t see a way to do that with binary variables, since bernoulli_logit and related functions have no arguments for a covariance structure.

Furthermore, I don’t think that simply using the binomial distribution would estimate these correlations, though I may be wrong about this. Any help would be appreciated.

The easiest approach is a multilevel model with a random intercept for each person. This will introduce correlation between shots taken by the same person.
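A minimal sketch of that suggestion, assuming the data arrive as a P × B array of 0/1 outcomes and that the person intercepts get a hierarchical normal prior (non-centered). The names mu_a, sigma_a, z_a and beta, and all prior scales, are illustrative choices, not something from the thread; beta gives the radius its own slope. Note what kind of correlation this produces: after integrating over a[p], shots by the same person are positively correlated because they share a skill level, but conditional on a[p] they are still independent, so this captures stable differences between people rather than a shot-to-shot confidence effect.

```stan
data {
  int<lower=1> P;                         // number of people
  int<lower=1> B;                         // number of baskets
  vector[B] r;                            // basket radii
  array[P, B] int<lower=0, upper=1> S;    // 1 = scored, 0 = missed
}
parameters {
  real mu_a;                              // population mean accuracy (logit scale)
  real<lower=0> sigma_a;                  // between-person SD of accuracy
  vector[P] z_a;                          // standardized person effects
  real beta;                              // slope on basket radius (illustrative)
}
transformed parameters {
  // non-centered person intercepts: a ~ normal(mu_a, sigma_a)
  vector[P] a = mu_a + sigma_a * z_a;
}
model {
  // placeholder priors, chosen only for illustration
  mu_a ~ normal(0, 1.5);
  sigma_a ~ normal(0, 1);
  z_a ~ std_normal();
  beta ~ normal(0, 1);
  for (p in 1:P)
    S[p] ~ bernoulli_logit(a[p] + beta * r);   // vectorized over baskets
}
```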

It sounds like you want some sort of autoregressive structure on the outcome. In most GLMs, we can introduce autoregressive structure on the link scale by estimating observation-specific random effects that follow an autoregressive process. The challenge is that in Bernoulli models, the standard deviation of the observation-specific residuals (where the autoregressive structure would show up) is not identified. There are at least two options:

  • One option is to fix the standard deviation a priori, but this choice will likely seem quite arbitrary.
  • Another option is to fit an auto-logistic model, where the outcome of the previous shot is included as a covariate in the model for the current shot.
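A minimal sketch of the auto-logistic option, assuming the columns of S are in the order each person took the shots, that every person shoots once at every basket, and that only the immediately preceding shot matters (lag-1 dependence). gamma is a hypothetical name for the carry-over effect; coding the previous outcome as +1 for a hit and -1 for a miss makes a positive gamma raise the chance of scoring after a hit and lower it after a miss, which matches the confidence/anxiety story in the question. The priors are placeholders.

```stan
data {
  int<lower=1> P;                         // number of people
  int<lower=2> B;                         // number of shots per person
  vector[B] r;                            // basket radii
  array[P, B] int<lower=0, upper=1> S;    // assumed to be in shooting order
}
parameters {
  vector[P] a;                            // person accuracy (logit scale)
  real beta;                              // slope on basket radius
  real gamma;                             // effect of the previous shot (hypothetical name)
}
model {
  // placeholder priors, chosen only for illustration
  a ~ normal(0, 1.5);
  beta ~ normal(0, 1);
  gamma ~ normal(0, 1);
  for (p in 1:P) {
    // first shot: no previous outcome to condition on
    S[p, 1] ~ bernoulli_logit(a[p] + beta * r[1]);
    for (b in 2:B) {
      // previous outcome recoded as +1 (hit) or -1 (miss)
      S[p, b] ~ bernoulli_logit(a[p] + beta * r[b]
                                + gamma * (2 * S[p, b - 1] - 1));
    }
  }
}
```

In this lag-1 chain, $S_{p1}$ influences $S_{p3}$ only through $S_{p2}$; adding a second term for S[p, b - 2] would let the shot two back act directly.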

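Also note that in your formulation of the model,

S[p,b] ~ bernoulli_logit(a[p] + r[b]);

you presumably want to include a coefficient (i.e., a slope term) that multiplies the radius covariate r[b]. Without it, the effect of radius on the log-odds is fixed at exactly 1 per unit of radius.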