# How can one generate binary data with a random effects term that can be verified with stan_glmer under a logit link?

In `R`, suppose I have N data points classified into m groups, equally. Normally, if our response is a continuous Y, for a predictor X, we might generate random effects data as

``````N <- 100
m <- 5
RE_each <- rnorm(m, 0, 1)
RE <- rep(1:m, each=N)
Y <- X + rnorm(N, 2,2)
``````

then we can use `rstanarm` to estimate this model:

``````rstanarm::stan_glmer(Y ~ X + (1|RE), data =...)
``````

If instead I wanted to generate binary Y, can I do:

``````Y <- rbinom(N, 1, sigmoid(X+RE))
``````

instead? Would this technially be correct to have the terms inside a sigmoid function? And then estimating with a logit link?

``rstanarm::stan_glmer(Y ~ X + (1|RE), family = binomial(link = "logit"), data =...)``

Yeah youâ€™re basically right but I think there are a few issues with the code:

I donâ€™t think this quite does what you want, but itâ€™s close. For one, `RE_each` and `RE` arenâ€™t used anywhere after theyâ€™re created, but they should factor into the calculation of `Y`. Also `RE <- rep(1:m, each=N)` will create `m*N = 500` values of `RE`, so that doesnâ€™t seem right. To generate so-called â€śrandom effectsâ€ť data, I think youâ€™d want something like this:

``````N <- 100
X <- rnorm(N)
m <- 5
group_id <- rep(1:m, each = N/m) # not each = N

# some parameter values to use (could sample them from prior or just pick them)
sigma_e <- 1     # error sd
alpha <- 1       # global intercept
beta <- 1        # coefficient on X

sigma_alpha <- 1 # hierarchical sd (for 'random effects')
alpha_group <- rnorm(m, 0, sigma_alpha) # 'random effects'

y <- alpha + alpha_group[group_id] + beta * X + rnorm(N, 0, sigma_e)
``````

To do this for binary data, youâ€™re right that you switch to `rbinom` and use a sigmoidal function. Something like this:

``````y <- rbinom(N, 1, plogis(alpha + alpha_group[group_id] + beta * X))
``````

where in this case I chose `plogis` (inverse-logit) transformation but you could use other ones of course.

And then yeah, this could be fit with `stan_glmer()` with `family=binomial(link = "logit")`

1 Like

thank you, this is very helpful, I generated mock data and was able to estimate the parameters from the generated model using `stan_glmer()`!

1 Like