In R, suppose I have N data points split equally into m groups. Normally, if our response is a continuous Y, for a predictor X, we might generate random effects data as

N <- 100
m <- 5
RE_each <- rnorm(m, 0, 1)
RE <- rep(1:m, each=N)
Y <- X + rnorm(N, 2,2)

then we can use rstanarm to estimate this model:

rstanarm::stan_glmer(Y ~ X + (1|RE), data =...)

If instead I wanted to generate binary Y, can I do:

Y <- rbinom(N, 1, sigmoid(X+RE))

instead? Would this technically be correct, with the terms inside a sigmoid function, and then estimating with a logit link?

rstanarm::stan_glmer(Y ~ X + (1|RE), family = binomial(link = "logit"), data =...)

Yeah, you're basically right, but I think there are a few issues with the code:

I don't think this quite does what you want, but it's close. For one, RE_each and RE aren't used anywhere after they're created, but they should factor into the calculation of Y. Also, RE <- rep(1:m, each=N) will create m*N = 500 values of RE, so that doesn't seem right. To generate so-called "random effects" data, I think you'd want something like this:

N <- 100
X <- rnorm(N)
m <- 5
group_id <- rep(1:m, each = N/m) # not each = N
# some parameter values to use (could sample them from prior or just pick them)
sigma_e <- 1 # error sd
alpha <- 1 # global intercept
beta <- 1 # coefficient on X
sigma_alpha <- 1 # hierarchical sd (for 'random effects')
alpha_group <- rnorm(m, 0, sigma_alpha) # 'random effects'
y <- alpha + alpha_group[group_id] + beta * X + rnorm(N, 0, sigma_e)

To do this for binary data, youâ€™re right that you switch to rbinom and use a sigmoidal function. Something like this:
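A sketch of one way to do it, reusing the setup above. I'm using base R's plogis() as the sigmoid (it's the logistic CDF, 1 / (1 + exp(-x))); note there's no sigma_e term here, since a Bernoulli outcome has no separate residual standard deviation:

```r
set.seed(1)
N <- 100
m <- 5
X <- rnorm(N)
group_id <- rep(1:m, each = N / m)

alpha <- 1                               # global intercept
beta <- 1                                # coefficient on X
sigma_alpha <- 1                         # hierarchical sd (for 'random effects')
alpha_group <- rnorm(m, 0, sigma_alpha)  # 'random effects'

# linear predictor on the logit scale
eta <- alpha + alpha_group[group_id] + beta * X

# map to probabilities with the sigmoid, then draw 0/1 outcomes
y <- rbinom(N, 1, plogis(eta))
```

Then, as you wrote, you'd fit it with something like rstanarm::stan_glmer(y ~ X + (1 | group_id), family = binomial(link = "logit"), data = ...) after collecting y, X, and group_id into a data frame.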