How can one generate binary data with a random effects term that can be verified with stan_glmer under a logit link?

In R, suppose I have N data points classified into m groups, equally. Normally, if our response is a continuous Y, for a predictor X, we might generate random effects data as

N <- 100
m <- 5
RE_each <- rnorm(m, 0, 1)
RE <- rep(1:m, each=N)
Y <- X + rnorm(N, 2,2)

then we can use rstanarm to estimate this model:

rstanarm::stan_glmer(Y ~ X + (1|RE), data =...)

If instead I wanted to generate binary Y, can I do:

Y <- rbinom(N, 1, sigmoid(X+RE))

instead? Would this technially be correct to have the terms inside a sigmoid function? And then estimating with a logit link?

rstanarm::stan_glmer(Y ~ X + (1|RE), family = binomial(link = "logit"), data =...)

Yeah you’re basically right but I think there are a few issues with the code:

I don’t think this quite does what you want, but it’s close. For one, RE_each and RE aren’t used anywhere after they’re created, but they should factor into the calculation of Y. Also RE <- rep(1:m, each=N) will create m*N = 500 values of RE, so that doesn’t seem right. To generate so-called “random effects” data, I think you’d want something like this:

N <- 100
X <- rnorm(N)
m <- 5
group_id <- rep(1:m, each = N/m) # not each = N

# some parameter values to use (could sample them from prior or just pick them)
sigma_e <- 1     # error sd
alpha <- 1       # global intercept
beta <- 1        # coefficient on X

sigma_alpha <- 1 # hierarchical sd (for 'random effects')
alpha_group <- rnorm(m, 0, sigma_alpha) # 'random effects'

y <- alpha + alpha_group[group_id] + beta * X + rnorm(N, 0, sigma_e)

To do this for binary data, you’re right that you switch to rbinom and use a sigmoidal function. Something like this:

y <- rbinom(N, 1, plogis(alpha + alpha_group[group_id] + beta * X))

where in this case I chose plogis (inverse-logit) transformation but you could use other ones of course.

And then yeah, this could be fit with stan_glmer() with family=binomial(link = "logit")

1 Like

thank you, this is very helpful, I generated mock data and was able to estimate the parameters from the generated model using stan_glmer()!

1 Like