# Posterior distribution for a between-subjects variable

I have a dataset `dat` with the following structure:

str(dat)
‘data.frame’: 450 obs. of 4 variables:
\$ Subject : Factor w/ 23 levels “S1”,“S2”,“S3”,…: 1 1 1 1 1 1 1 1 1 1 …
\$ Condition : Factor w/ 20levels “A”, “B”, “C”, …
\$ Value: num 0.679 0.5819 0.2531 0.0469 1.2375 …
\$ X : num 0.62 0.62 …

The variable `Condition` is a repeated-measures (or within-subject) factor, and `X` is between-subjects quantitative variable such as age (i.e., `X` varies across subjects, but does not vary across the levels of `Condition`).

I would like to obtain the posterior distribution for `X` at each level of the `Condition` factor. So, I’m thinking to perform the following Bayesian model:

library(‘rstanarm’)
options(mc.cores = 4)
fm <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1+X | Condition), data=dat, chain=4)

Then I can pull out the posterior distribution for `X` at each `Condition` level based on the output of the above model. However, as `X` does not vary across the levels, I’m not sure if the above multilevel model is appropriate for the part of `(1+X | Condition)`. For example, In the conventional statistics, the following linear mixed-effects model does not seem to make sense to me:

library(lme4)
fm2 <- lmer(Value ~ X + (1 | Subject) + (1 + X | Condition), data = dat)

Sometimes the `fm2` model may fail to converge. Even if it converges, in the output of summary(fm2), the problem will show up in the correlation being -1 between the random effects of intercept and `X` for the part of `(1 + X | Condition)`.

So, here are my questions:

1. With the following Bayesian model,

fm0 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1 | Condition), data=dat, chain=4)

Is there a way to obtain the posterior distribution for `X` at each ‘Condition’ level?

1. Is the following Bayesian model meaningful even though the corresponding lmer() model is not?

fm <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1+X | Condition), data=dat, chain=4)

1. When `X` is a between-subject factor such as gender (instead of quantitative variable), would the answers for 1) and 2) above remain the same?

(1) The `coef` method will return the sum of the posterior medians for the common parameters and the group-specific parameters. To get the entire posterior distribution, you would have to do some stuff yourself involving `as.matrix` or `as.data.frame`.
(2) The fact that `lmer` does not converge or converges to implausible values does not mean that there is anything wrong with the model. However, you might also try `Value ~ -1 + X * condition + (1 | Subject), data = dat, QR = TRUE` if you are not interested in making inferences about conditions that are not present in your dataset.
(3) Yes

2 Likes

Thanks for the quick help. I really appreciate it!

The coef method will return the sum of the posterior medians for the common parameters and the group-specific parameters. To get the entire posterior distribution, you would have to do some stuff yourself involving as.matrix or as.data.frame.

Is there any example or weblink that illustrates a way to obtain the posterior distribution for X at each `Condition` level for a model like the following:

fm0 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1 | Condition), data=dat, chain=4)

Thanks again for the great help!

Do you mean `fm0 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (X | Condition), data = dat)`? If so, it would be something like

``````df <- as.data.frame(fm0)
pX <- sapply(levels(dat\$Condition), FUN = function(j) {
df\$X + df[,paste0("b[X Condition:", j, "])]
})
``````
1 Like

Thanks a lot for the code!

Do you mean `fm0 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (X | Condition), data = dat)`?

My goal is to obtain the posterior distribution for X at each `Condition` level. However, I was not so sure if I could achieve that goal with the simpler model:

`fm0 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1 | Condition), data = dat)`

Your most recent response does seem to indicate that I would have to go with the following model to obtain the posterior distribution of X at each `Condition` level:

`fm1 <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (X | Condition), data = dat)`

Please correct me if I misunderstand the situation.

In this model,

the posterior distribution of the coefficient on `X` is the same for all levels of `Condition`.

1 Like

The fact that lmer does not converge or converges to implausible values does not mean that there is anything wrong with the model.

Again my goal is to obtain the posterior distribution for the effect of X at each `Condition` level. One statistician I consulted with argues that the Bayesian model below does not make sense because `X` only varies across subjects, but it does not vary across the `Condition` levels:

`fm <- stan_lmer(Value ~ 1 + X + (1 | Subject) + (1+X | Condition), data=dat)`

When I run the above model, I do notice that the standard error for the posterior distribution of `X` is virtually the same across all the `Condition` levels, which is probably not surprising because `X` is a constant for each `Subject`.

So, my main concern is this: is `fm` a legitimate model that can be formulated to obtain the posterior distribution for the effect of X at each `Condition` level, even though `X` (e.g., age) is a constant for each `Subject` across the `Condition` levels (and the posterior standard error seems to be the same)?

However, you might also try Value ~ -1 + X * condition + (1 | Subject), data = dat, QR = TRUE if you are not interested in making inferences about conditions that are not present in your dataset.

Another suggestion is to run the following model

`fm2 <- stan_lmer(Value ~ 0 + Condition + X:Condition + (1 | Subject), data=dat)`

and obtain the posterior distribution of X at each `Condition` level.

I prefer the original model `fm` because of partial pooling involved. Any reasons I should choose the second model `fm2` instead of `fm`?

I don’t see why it would be inconceivable that the coefficient on `X` could vary according to the level of `Condition` even if `X` does not vary. It is more than a bit weird to estimate a model like `lmer(Value ~ 1 + X + (1 | Subject) + (1+X | Condition), data=dat)` via (restricted) maximum likelihood because that requires that the levels of `Condition` be a random sample from some population of `Condition`s. From a Bayesian perspective `stan_lmer(Value ~ 1 + X + (1 | Subject) + (1+X | Condition), data = dat` and `stan_lmer(Value ~ 0 + Condition + X:Condition + (1 | Subject), data = dat)` are pretty much the same model (especially if you use sum-to-zero contrasts) except the former estimates the standard deviation in the coefficient on `X` across levels of `Condition` and the latter does not.

2 Likes