# How to know the exact predictors

Hi everybody,

I am trying to pass specific priors for all predictors to stan_glm.
To do that, I need to know the exact predictors (their number and order) before sampling, which is hard to work out when there are interactions between factors.
So what I am really trying to understand is how the predictors are generated from the formula, both with and without an intercept.

Examples:

``````
set.seed(1)
x1 <- rep(c("A","B","C"),each=10)
x2 <- rep(c("A","B","C"),each=10)
y <- rnorm(30)
dat <- data.frame(y=y,x1=x1,x2=x2)

ft1 <- terms(formula("y ~ -1 + x1 + x2 + x1:x2"), keep.order=TRUE)
samp <- rstanarm::stan_glm(ft1, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)

# x1A     -0.2371194 0.7345714
# x1B     -1.4384623 0.7472358
# x1C     -4.5361466 0.6962040
# x2B     -2.6472157 1.1710367
# x2C     -2.8038436 2.6611150
# x1B:x2B -1.5118930 1.0769750
# x1C:x2C  0.7257141 3.2930123

ft2 <- terms(formula("y ~ x1 + x2 + x1:x2"), keep.order=TRUE)
samp <- rstanarm::stan_glm(ft2, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)

# (Intercept) -0.2334736  0.6627297
# x1B         -3.2783914  2.6099968
# x1C         -2.4423757 -0.1935649
# x2B         -2.1160110  2.5948198
# x2C         -1.8653931  3.6096912
# x1B:x2B     -3.1701715  3.1074863
# x1C:x2C     -2.5162467  1.8924621

# Like ft1, but with the terms in a different order
ft3 <- terms(formula("y ~ -1 + x1:x2 + x1 + x2"), keep.order=TRUE)
samp <- rstanarm::stan_glm(ft3, data=dat, chains=1,iter=20)
rstanarm::posterior_interval(samp)

# x1A:x2A -0.0007368622 0.4966567
# x1B:x2B -2.8669084181 3.6741883
# x1C:x2C -2.5430166470 1.8279128
# x1B     -4.3438427994 3.8054931
# x1C     -1.9986098558 1.7621411
# x2B     -1.4059656291 2.0838909
# x2C     -1.5965105519 1.0129761
``````

How can I get the predictor names (x1A, x1B, …, x1C:x2C) before sampling, so that I know their number and order and can pass the right priors?
I also wonder (even though I don’t strictly need it) why the predictors change when the terms of the formula are given in a different order.

I think the easiest way to do this is

``````
colnames(model.matrix(ft2, dat))
``````

because rstanarm is just using R’s formula handling. Why R does what it does requires some experimentation and a detailed reading of `?formula`.
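
To make that concrete, here is a rough sketch using the toy data from the question. Two things in it are my own assumptions rather than anything rstanarm documents here. First, `predictor_names()` is a hypothetical helper, not an rstanarm function. Second, it drops design-matrix columns that are entirely zero. In this data `x1` and `x2` are identical, so cross-level interactions such as `x1C:x2B` never occur, and the intervals posted in the question do not list them. The filter is only meant to reproduce that posted output. The last two lines print the `"factors"` attribute of the terms objects, which `?terms.object` describes as 1 for a variable coded by contrasts and 2 for a variable coded by dummy variables for all levels. That is a handy starting point for the experimentation mentioned above.

```r
# Same toy data as in the question
set.seed(1)
x1 <- rep(c("A", "B", "C"), each = 10)
x2 <- rep(c("A", "B", "C"), each = 10)
dat <- data.frame(y = rnorm(30), x1 = x1, x2 = x2)

ft1 <- terms(y ~ -1 + x1 + x2 + x1:x2, keep.order = TRUE)
ft2 <- terms(y ~ x1 + x2 + x1:x2, keep.order = TRUE)
ft3 <- terms(y ~ -1 + x1:x2 + x1 + x2, keep.order = TRUE)

# Hypothetical helper (not part of rstanarm): column names of the design
# matrix in the order R generates them, minus columns that are all zero.
# Dropping the all-zero columns is an assumption made to match the
# coefficients shown in the question's output.
predictor_names <- function(trms, data) {
  mm <- model.matrix(trms, data)
  keep <- colSums(mm != 0) > 0
  colnames(mm)[keep]
}

nm1 <- predictor_names(ft1, dat)
nm2 <- predictor_names(ft2, dat)
nm3 <- predictor_names(ft3, dat)
print(nm1)
print(nm2)
print(nm3)

# One prior location and scale per non-intercept predictor, keyed by name
# so the order can be checked by eye. The values are placeholders.
coef_names <- setdiff(nm2, "(Intercept)")
loc <- setNames(rep(0, length(coef_names)), coef_names)
scl <- setNames(rep(2.5, length(coef_names)), coef_names)
scl[grepl(":", coef_names, fixed = TRUE)] <- 1  # e.g. tighter on interactions

# How each variable is coded within each term (1 = contrasts, 2 = all levels)
print(attr(ft1, "factors"))
print(attr(ft3, "factors"))
```

If the names line up with what you expect, the vectors could then be passed along the lines of `prior = rstanarm::normal(location = unname(loc), scale = unname(scl))`, with the intercept handled separately through `prior_intercept`. It is worth running a short fit first and comparing its coefficient names against `coef_names` before trusting the order.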