I’m trying to write a model that could be a fairly straightforward logistic regression with a grouping variable, but I’m considering using a Beta prior for the distribution of group effects, and am wondering if that seems like a really bad idea to anyone here.
Here’s what the ordinary logistic regression would look like:
data {
  int<lower=0> N;
  // number of groups
  int<lower=0> G;
  vector[N] x;
  // group index; must be a valid index into group_effect
  int<lower=1, upper=G> group[N];
  int<lower=0,upper=1> y[N];
}
parameters {
  real alpha;
  real beta;
  real group_effect[G];
  real<lower=0> group_sd;
}
model {
  vector[N] ghat;
  group_sd ~ student_t(3, 0, 2.5);
  group_effect ~ normal(0, group_sd);
  // gather each observation's group effect into a vector
  for (n in 1:N) {
    ghat[n] = group_effect[group[n]];
  }
  y ~ bernoulli_logit(alpha + beta * x + ghat);
}
But an open question for this problem is whether the distribution of groups is unimodal or could perhaps be bimodal (i.e., doers and non-doers). A symmetric Beta distribution with a small \phi parameter (the sum of its two shape parameters) could approximate that, so I’m considering this alternative model.
parameters {
  real alpha;
  real beta;
  // group_effect, now with beta prior, needs bounds
  real<lower=0, upper=1> group_effect[G];
  real<lower=0> phi;
}
model {
  vector[N] ghat;
  // prior for phi should have heavier right tail
  phi ~ student_t(3, 0, 20);
  group_effect ~ beta(0.5 * phi, 0.5 * phi);
  for (n in 1:N) {
    // logit transform to add into final formula
    ghat[n] = logit(group_effect[group[n]]);
  }
  y ~ bernoulli_logit(alpha + beta * x + ghat);
}
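For anyone who wants to poke at this, the generative process implied by the model above can be sketched in plain Python (standard library only). This is a rough sketch under assumptions, not the exact code I used: the function name `simulate`, the standard-normal `x`, the equal group sizes, the seed, and the clamping constant are all illustrative choices that are not part of the Stan model.

```python
import math
import random


def inv_logit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(n_groups, n_per_group, alpha, beta, phi, seed=1):
    """Draw (group, x, y) rows from the Beta-group-effect model.

    Assumptions (hypothetical, for illustration): x ~ Normal(0, 1) and
    every group has the same number of observations.
    """
    rng = random.Random(seed)
    rows = []
    for g in range(1, n_groups + 1):
        # group_effect ~ beta(0.5 * phi, 0.5 * phi), as in the model block
        effect = rng.betavariate(0.5 * phi, 0.5 * phi)
        # clamp away from exactly 0 or 1 so the logit stays finite
        effect = min(max(effect, 1e-12), 1.0 - 1e-12)
        ghat = logit(effect)
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)
            p = inv_logit(alpha + beta * x + ghat)
            y = 1 if rng.random() < p else 0
            rows.append((g, x, y))
    return rows


rows = simulate(n_groups=20, n_per_group=50, alpha=0.0, beta=1.0, phi=1.0)
print(len(rows), "rows simulated")
```

Running this for a small and a large phi and comparing the per-group means of y is a cheap way to see how strongly the groups separate before fitting anything.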
Now, when phi is relatively large, the distribution of groups ought to be unimodal around alpha + beta * x, but as it gets smaller, below about 2, the distribution of groups will start bifurcating (both Beta shape parameters, 0.5 * phi, drop below 1, so the density piles up near 0 and 1).
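As a quick sanity check on that threshold, here is a minimal Python sketch (standard library only) that compares the Beta(0.5 * phi, 0.5 * phi) density near the edge of the unit interval with the density at 0.5. The helper names and the evaluation point 0.05 are arbitrary choices for illustration, not anything taken from the model itself.

```python
import math


def beta_pdf(g, a, b):
    """Density of Beta(a, b) at g in (0, 1), via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(g) + (b - 1) * math.log(1 - g))


def edge_to_center_ratio(phi, edge=0.05):
    """Density near the edge divided by density at 0.5 for Beta(phi/2, phi/2)."""
    a = 0.5 * phi
    return beta_pdf(edge, a, a) / beta_pdf(0.5, a, a)


for phi in (0.5, 1.0, 2.0, 4.0, 20.0):
    print(f"phi={phi:5.1f}  edge/center ratio={edge_to_center_ratio(phi):.3f}")
```

A ratio above 1 means more mass near the edges than in the middle (U-shaped), and a ratio below 1 means a single central mode. Note that this describes group_effect on the (0, 1) scale; how the distribution looks after the logit transform is a separate question that is worth checking on its own.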
This seems like what I want, and testing the model on simulated data hasn’t raised any problems, but if there are any concerns about the principle of what I’m doing here, I’d definitely like to know!