Hi everybody,
I am trying to pass specific priors for all predictors to stan_glm.
But I don't know the exact set of predictors before sampling, especially when the formula contains interactions between factors.
So I am trying to understand how the predictors are generated from the formula, both with and without an intercept.
Examples:
set.seed(1)
x1 <- rep(c("A","B","C"),each=10)
x2 <- rep(c("A","B","C"),each=10)
y <- rnorm(30)
dat <- data.frame(y=y,x1=x1,x2=x2)
ft1 <- terms(formula("y ~ -1 + x1 + x2 + x1:x2"), keep.order = TRUE)
samp <- rstanarm::stan_glm(ft1, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)
# x1A -0.2371194 0.7345714
# x1B -1.4384623 0.7472358
# x1C -4.5361466 0.6962040
# x2B -2.6472157 1.1710367
# x2C -2.8038436 2.6611150
# x1B:x2B -1.5118930 1.0769750
# x1C:x2C 0.7257141 3.2930123
ft2 <- terms(formula("y ~ x1 + x2 + x1:x2"), keep.order = TRUE)
samp <- rstanarm::stan_glm(ft2, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)
# (Intercept) -0.2334736 0.6627297
# x1B -3.2783914 2.6099968
# x1C -2.4423757 -0.1935649
# x2B -2.1160110 2.5948198
# x2C -1.8653931 3.6096912
# x1B:x2B -3.1701715 3.1074863
# x1C:x2C -2.5162467 1.8924621
# Like ft1, but with the terms in a different order
ft3 <- terms(formula("y ~ -1 + x1:x2 + x1 + x2"), keep.order = TRUE)
samp <- rstanarm::stan_glm(ft3, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)
# x1A:x2A -0.0007368622 0.4966567
# x1B:x2B -2.8669084181 3.6741883
# x1C:x2C -2.5430166470 1.8279128
# x1B -4.3438427994 3.8054931
# x1C -1.9986098558 1.7621411
# x2B -1.4059656291 2.0838909
# x2C -1.5965105519 1.0129761
How can I get the predictor names (x1A, x1B, …, x1C:x2C) before sampling, so that I know their number and order, which I need in order to pass the right priors?
I also wonder (even though I don't strictly need to know) why the predictors change when the terms of the formula are given in a different order.
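One thing that may help here, as a minimal sketch: the names reported above seem to match the columns of the design matrix that base R's model.matrix() builds from the same terms object, so they can be inspected without sampling. The helper predictor_names() below is hypothetical (it is not part of rstanarm), and the step that drops the intercept and any constant columns is an assumption inferred from the output above, where all-zero cells such as x1C:x2B never show up.

```r
# Same toy data as in the examples above
set.seed(1)
x1 <- rep(c("A", "B", "C"), each = 10)
x2 <- rep(c("A", "B", "C"), each = 10)
dat <- data.frame(y = rnorm(30), x1 = x1, x2 = x2)

# Hypothetical helper: guess the coefficient names before sampling.
# Assumption: the intercept and constant columns (e.g. empty interaction
# cells) do not get their own entry in the coefficient prior.
predictor_names <- function(trms, data) {
  X <- model.matrix(trms, data = data)
  # A column with a single unique value carries no information
  # (this also catches the intercept column of ones).
  is_const <- apply(X, 2, function(col) length(unique(col)) == 1)
  setdiff(colnames(X)[!is_const], "(Intercept)")
}

ft1 <- terms(formula("y ~ -1 + x1 + x2 + x1:x2"), keep.order = TRUE)
nm <- predictor_names(ft1, dat)
print(nm)          # names, in design-matrix order
print(length(nm))  # number of coefficients that would need a prior
```

If these names agree with what a short test fit reports, length(nm) gives the number of coefficients, and a vector-valued prior such as rstanarm::normal(location = rep(0, length(nm)), scale = rep(2.5, length(nm))) could be lined up with nm. The scale of 2.5 is only a placeholder, and it is worth confirming the order against a short fit before relying on it.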
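To narrow down where the change comes from, the raw design-matrix columns can be compared for the two term orders. This is only a sketch using base R. My understanding, which I have not checked against the R sources, is that model.matrix() codes a factor inside a term by contrasts only when the rest of that term is already covered by an earlier term, and otherwise uses full indicator coding. If that is right, putting x1:x2 first would expand the interaction into all level combinations and leave the later main effects contrast-coded.

```r
# Same toy data as in the examples above
set.seed(1)
x1 <- rep(c("A", "B", "C"), each = 10)
x2 <- rep(c("A", "B", "C"), each = 10)
dat <- data.frame(y = rnorm(30), x1 = x1, x2 = x2)

# Main effects first vs. interaction first, both without an intercept
ft1 <- terms(formula("y ~ -1 + x1 + x2 + x1:x2"), keep.order = TRUE)
ft3 <- terms(formula("y ~ -1 + x1:x2 + x1 + x2"), keep.order = TRUE)

# Column names of the design matrices, before any columns are dropped
cols1 <- colnames(model.matrix(ft1, data = dat))
cols3 <- colnames(model.matrix(ft3, data = dat))

print(cols1)
print(cols3)

# Columns that exist under one ordering but not the other
print(setdiff(cols1, cols3))
print(setdiff(cols3, cols1))
```

Without keep.order = TRUE, terms() sorts the terms by interaction order, so both formulas would presumably give the same columns. The difference appears only because the order is kept as written.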