In R, suppose I have N data points classified equally into m groups. Normally, if our response is a continuous Y with a predictor X, we might generate random effects data as
N <- 100
m <- 5
RE_each <- rnorm(m, 0, 1)
RE <- rep(1:m, each=N)
Y <- X + rnorm(N, 2,2)
then we can use rstanarm to estimate this model:
rstanarm::stan_glmer(Y ~ X + (1|RE), data =...)
If instead I wanted to generate binary Y, can I do:
Y <- rbinom(N, 1, sigmoid(X+RE))
instead? Would it technically be correct to put the terms inside a sigmoid function, and then estimate with a logit link?
rstanarm::stan_glmer(Y ~ X + (1|RE), family = binomial(link = "logit"), data =...)
Yeah, you’re basically right, but there are a few issues with the code, so it doesn’t quite do what you want. For one, RE_each and RE aren’t used anywhere after they’re created, but they should factor into the calculation of Y. Also, RE <- rep(1:m, each = N) creates m*N = 500 values rather than N, and X is never defined before it’s used. To generate so-called “random effects” data, I think you’d want something like this:
N <- 100
X <- rnorm(N)
m <- 5
group_id <- rep(1:m, each = N/m) # not each = N
# some parameter values to use (could sample them from prior or just pick them)
sigma_e <- 1 # error sd
alpha <- 1 # global intercept
beta <- 1 # coefficient on X
sigma_alpha <- 1 # hierarchical sd (for 'random effects')
alpha_group <- rnorm(m, 0, sigma_alpha) # 'random effects'
y <- alpha + alpha_group[group_id] + beta * X + rnorm(N, 0, sigma_e)
To do this for binary data, you’re right that you’d switch to rbinom and pass the linear predictor through a sigmoid (inverse-logit) function. Something like this:
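Here’s a minimal sketch, reusing X, group_id, alpha, beta, and alpha_group from above (plogis() is base R’s inverse logit, i.e. the sigmoid):
eta <- alpha + alpha_group[group_id] + beta * X # linear predictor on the logit scale
p <- plogis(eta) # convert to probabilities with the inverse logit
y_bin <- rbinom(N, 1, p) # Bernoulli draws give the binary outcomes
Note there’s no separate error term here; the Bernoulli draw supplies the noise. Then your stan_glmer call with family = binomial(link = "logit") is the right way to fit it, e.g. (assuming you collect everything into a data frame with these column names):
dat <- data.frame(y = y_bin, X = X, group = factor(group_id))
rstanarm::stan_glmer(y ~ X + (1 | group), family = binomial(link = "logit"), data = dat)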