ANOVA - random effects on intercept and/or slope?

Hi all,

the following seemed like a trivial problem to me, but now I’m not sure anymore: I want to code an ANOVA in BRMS. This is a study with 2 groups of participants, and they each get measured at baseline and then after an intervention (in one group). Some participants miss data for the second measurement.

If I had no missing data, I would have coded the model non-hierarchically - where Ydiff is the difference between the two measurements:

brm(Ydiff ~ 1+ group)

And if I had had more than 2 measurements per person, I would have coded it to allow for individual differences in the intercept and also in the effect of measurement (i.e. ‘time’):

brm(Y ~ 1+ time*group + (1+time | subject))

However, now I’m unsure what to do for the ANOVA case (i.e. exactly two measurements), should I not have the random slopes and/or no random intercept?

[1] brm(Y~ 1+ time*group,data=myData)
[2] brm(Y~ 1+ time*group+(1 | subject),data=myData)
[3] brm(Y~ 1+ time*group+(1+time | subject),data=myData)
[4] brm(Y~ 1+ time*group+(-1+time | subject),data=myData) //i.e. only time varies

In a normal linear model, I can only have either intercept or the slope (time) as varying between subjects, not both. But in BMRS I can fit a model where both are varying. Results are:

  • model with intercept and slope varying gives results for group * time that are very similar
  • model with only slope varying gives similar results, but much larger credible intervals.

I guess now that I’ve written down all options, it seems like a random decision whether I fix intercept only or slope (time) only - even though asking google it seems that the most common would be to have a random intercept (though then I was not sure it’s just because non-Bayesian models don’t converge otherwise). But also it seems maybe over-parameterized to allow both intercept and slope to vary given that for each person I only have 1 or 2 measurements…

I would be very grateful for any pointers, also to papers/tutorials that explain this. I’ve tried to look this up, but not gotten very far.

Many thanks


I don’t know if it’s possible to do this in brms, but I worked out how to do this in raw Stan:

1 Like

Oh, re-reading your post, I realize you’re actually asking a more conceptual question on how complex to make the model. As far as I understand, if you only have one observation per subject per within-subject condition, then you can’t estimate measurement error as something separate from the across-Subjects variability in whatever you specify as random effects. That is, a non-hierarchical model (as shown in the video/code I linked) will estimate a mean effect of time as well as a variance in how that effect of time will manifest from participant to participant, and it will also estimate a mean intercept as well as a variance in how that intercept varies from person to person. When you have more data per person such that there are multiple measurements per condition, a hierarchical model will additionally estimate measurement noise, possibly as something that can vary from person-to-person and/or condition-to-condition. I don’t actually know brms well, so I don’t know if it might try to automatically detect if it has enough data to try estimating measurement noise, but if it doesn’t do this and you feed it data where you don’t have enough, it might still try to give you results that will be nonsensical. You could check by looking at the parameters brms estimates; if there’s a measurement noise parameter (usually called sigma, though I don’t know if brms follows that convention), then there’s trouble. Unless you actually have good prior information on the magnitude of measurement noise, in which case you probably do want a full hierarchical model so long as you include that information in your prior on the measurement noise parameter.

1 Like

Thanks for your thoughts on this!

From your logic, I would have thought it would imply that I should fit a model with fixed effects only/ non-hierarchical. However, from the results I get, I am unsure this is right here.

My data is organised like this:
rows = measurements (some people have 2, others only 1 (always the first one, i.e. time=1))
columns: y, time (1 or 2) and group (1 of 2)
and the models I tried either only had ‘fixed’/‘constant’ effects or also had some or all possible ‘random’/‘varying’ effects.

To compare the answers that I get I have also done a standard t-test, for which I computed for each person y for time 2 minus y for time 1 and left out the people with only the first measurement. This is highly significant, so I would think that the ‘correct’ regression models should also show me the same…

[1] Y ~ 1+ time + group + group x time [with not random effects]
The effect of group x time is estimated to be between -10.8 and 0.5
-> Very different from the t-test. So this suggests to me that even though each person does at most 2 conditions, I need to add some random effects.

[2] Y ~ 1+ time + group + group x time + (1|subject) [i.e. random effects of intercept only]
This estimates the group x time effect to be between -9.5 and -1.3. So that (in terms of being ‘significant’) is more like what I would have expected. I also get an estimate for ‘sd(Intercept)’ = 6

[3] Y ~ 1+ time + group + group x time + (1+time|subject) [i.e. random effects of intercept and slope]
This model was hard to fit (needed many samples and high adapt_delta). The value of the group x time effect is about the same: -9.5 to -1.2. And I now have also variances: sd(Intercept)=7 and sd(time) = 1.9

[4] Y ~ 1+ time + group + group x time + (-1+time|subject) [i.e. random effects of slope only]
The value of group x time now has again a much wider credible interval: -10 to -0.02 (though same mean). And I get only sd(time)-2.8
-> Choosing either to make the intercept or the slope random seems like an arbitrary choice to me, so I am not happy that this gives very different results. And there is hopefully something I’m misunderstanding about the model. The only thing I can think of is that each person has an intercept, but the ‘time’ regressor exists as only 1 value for some people. But then I’m not sure why the model that has both as random effects ‘works’ (i.e. gives the same results for the factor of interest as the model with only varying intercept)

1 Like

Yes. But I don’t think any of the formulae you wrote express that. Let’s call @paul.buerkner in here and ask: How would you express a model in brms that one would otherwise express with stats::aov as outcome ~ group*time + Error(subject/time). I suspect that the way Jacquie is expressing it lmer-style as outcome ~ group*time + (1|subject) is erroneously (for this kind of data) attempting to model measurement noise when her data don’t support its estimation, right?

Oh, and to address one of your comments:

Yes, this is expected because this model does not “see” that each person contributes (if no missing data) two time points, so it is treating your data as a completely between-Ss design and thereby losing power. You need to inform the model that it has two datapoints from each person, permitting it to compute an intercept and a time effect per person. You do not want to express the model as hierarchical, however, as this would attempt to additionally estimate measurement noise, making the model unidentifiable from the data. If you try to use lmer on the data, for example, it will fail outright because of this non-identifiability. brms will seem to “succeed” only thanks to whatever default priors it is using for the measurement noise term. Unless you have good prior information to use instead of those defaults, you want to figure out how to tell brms that the data are not hierarchical and it shouldn’t attempt inference on measurement noise. The video I linked shows how to do this without brms; if you go a couple lectures earlier you’ll get the full set up of the problem, and some of the later lectures talk about the extension to truly hierarchical data.

What model parameters are implied by Error(subject/time)?

A mean and across-subjects variance for the intercept and a mean and across-subjects variance for the effect of time. aov will impose the assumption of zero covariance between those two variances, but the model from my lectures relaxes this and estimates the covariance. Critically, though, the aov formulation does not estimate measurement noise.

Okay. (1 | Subject/time) will estimate a variance for Subject and a variance for time:Subject.


To put things more concretely, and ignoring the relatively trivial complication of the between-groups variable, the data are Y_{s,c} where s is a subject index and c is a condition index. Assuming X is a vector of sum contrasts encoding the condition level for each Y as -.5 or +.5, I propose they model it as:

Y_{s,c} = intercept_s + effect_s*X

Where intercept_s and effect_s are samples from a multivariate normal:

[intercept_s,effect_s] = MVN(mu,sigma)

Such that mu is a vector of length 2 conveying the mean intercept and mean time effect and sigma is the covariance matrix.

Iff OP had more than one observation per condition, they would then have Y_{s,c,r} where r indexes repetition number, permitting modelling the data as

Y_{s,c,r} = normal( intercept_s + effect_s*X , noise)

Where intercept_s and effect_s are estimated as before but we additionally estimate the measurement noise parameter noise

So that could also be a multivariate model with a residual correlation too?

m <- brm(mvbind(Time1, Time2) ~ 1, data = mv_demo)
#>  Family: MV(gaussian, gaussian) 
#>   Links: mu = identity; sigma = identity
#>          mu = identity; sigma = identity 
#> Formula: Time1 ~ 1 
#>          Time2 ~ 1 
#>    Data: mv_demo (Number of observations: 5) 
#> Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup samples = 1000
#> Population-Level Effects: 
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Time1_Intercept    27.89      2.44    22.90    32.79 1.00      665      453
#> Time2_Intercept    40.01      2.50    35.09    44.88 1.01      448      446
#> Family Specific Parameters: 
#>             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_Time1     5.70      2.42     2.51    12.17 1.00      673      698
#> sigma_Time2     5.70      2.63     2.75    12.91 1.00      535      813
#> Residual Correlations: 
#>                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> rescor(Time1,Time2)     0.21      0.41    -0.61     0.84 1.00      487      467
#> Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
#> is a crude measure of effective sample size, and Rhat is the potential 
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Then the group or time effects would have to be based a bit of post-processing on the samples.

Ah, yes, that looks right to me, though I prefer the intercept/effect parameterization as I have a lot easier time coming up with weakly informed priors there than I do for where each condition is going to be.

@tjmahr does brms automatically detect and infer missing data?

This discussion has been very helpful!

I think, it made me realise that,

  1. Even though a regression would be the best framework for a longitudinal data set, it is not right for the case of just 2 time points (pre and post); so for longitudinal data, I would do:
Y ~ 1+ time * group + (1+time | subject)

But I cannot just take this formula and use it for just the pre-post case.

  1. Actually, I think a hierarchical model, i.e. using the data from all participants, is not useful for estimating the difference pre vs. post on the group level. I noticed this by actually writing down the Stan models in the simplest form I could think of. And doing this I realised that estimation of the baseline value does not actually affect estimation of the pre minus post value…
  int nrow_baseline;
  int nrow_diff;
  real baseline[nrow_baseline];
  real diff[nrow_diff];
  int group[nrow_diff];
  real mu_baseline;
  real mu_diff[2];
  real<lower=0> sigma_baseline;
  real<lower=0> sigma_diff;

transformed parameters{
real mu_groupDiff;
mu_groupDiff = mu_diff[1]-mu_diff[2];
  mu_baseline ~ cauchy(0,5);
  mu_diff ~ cauchy(0,5);
  sigma_baseline ~ cauchy(0,3);
  sigma_diff ~ cauchy(0,3);
  baseline ~ normal(mu_baseline,sigma_baseline);
  for (irow in 1:nrow_diff){
    diff[irow] ~ normal(mu_diff[group[irow]],sigma_diff);

I hope what I’ve just said is not wrong?

Again, thank you so much for your help with this; this has really been extremely useful for me!!!

1 Like

Indeed, this is why a repeated-measures anova of a factor with two levels will give you the same p-value as a one-sample t-test on the difference score.

However, when you have missing data, as you mentioned in your original post, you will gain benefits from modelling the raw conditions (compared to collapsing to a difference score) in that those participants missing data for one or the other condition can still help inform things.

I am now confused about participants with missing second condition help with the estimate of the difference score (given that all people who have missing data miss condition 2)? From my model it does not as the missing participants only contribute to the estimate of the baseline, not to the difference. I guess it would potentially be different if some people missed condition 1 and others condition 2?

Two conceptual things you IMHO could consider:

  1. When you have multiple defensible models, it is IMHO best to report the results from all of them (e.g. in an appendix) and discuss how sensitive are your conclusions to model choice (this still forces you to say which model is defensible, but that is IMHO easier than choosing one “best” or “correct” model). See Andrew’s paper on multiverse analysis where he discusses the concept more fully (in a frequentist setting), I tried to do something like this in a Bayesian context in a recent meta analysis (look for Part 3).
  2. If you have a within-subject design, it might be beneficial to consider the baseline value as a covariate: y_at_time_2 ~ 1 + group + y_at_time_1, according to Data Colada this is always better than using the difference as paired t-test does (this idea is somewhat hidden as an “Aside” in the middle of the post). I’ve never used this myself neither have I read about it elsewhere than on Data Colada. This moves away from estimating all the data to estimating just the change between timepoints for each individual and might thus be better use of your precious data :-)

Some other observations:

Brms will generally not automatically detect almost anything :-) But it also mostly doesn’t produce non-sensical data - if the model is not well identified, you (mostly) either get divergences or very wide posteriors (telling you that you have learned little about the data).

Hope that clarifies more than confuses! And thanks @mike-lawrence and @tjmahr for being helpful!

It doesn’t help model the difference but would help if your outcome because it provides information about the variability of whatever you are measuring. I personally would explore the data to get a handle on the variance of the measurement for different groups, (one can imagine and I think there is a vignette using treatment groups, that the control group has a different variability than the treatment group.)

I think modeling either as a MV response or as Martin suggested

The later is particularly defensible (and perhaps MOST) if there is any prior knowledge that suggests the value at baseline will affect the value post treatment. And this also allows you to use all your data. I might even estimate it y_at_time_2 ~ 1 + y_at_time_1 + (1 | group) because that formulation has been making more sense to me lately and if I’m not wrong (I could DEFINITELY be) could help provide partial pooling?

All of this becomes slightly less critical as your number of subjects increases. Although in this context we should really be trying to figure out how to build a model that replicates the data generating process and clearly other than group and baseline level we know nothing about the subjects, who may have other associated variables that are of consequence (and therefore could explain the variability between them).

As an aside there are ways you can use brms to estimate missing data in multilevel models, but in this context you would probably be asking too much of the data you have.


+1 to what @MegBallard said. Some minor expansion on the topics:

IMHO this model is not dependent on any causal assumptions. It IMHO only assumes that the two measurements of the same subject are (imperfectly) correlated.

In my (limited) experience 1 || binary_variable and just binary_variable yield very similar inferences unless your dataset is very small, but the former can make some visualisations easier. (the 1 | binary_variable requires a bit more data as it also estimates correlations which you might not be interested in - see manual for further details )

Hope that clarifies more than confuses!

I think I cannot do :

y_at_time_2 ~ 1 + group + y_at_time_1

Because the main reason I wanted to use a hierarchical model (rather than just a t-test on time 2 minus time 1) is that for some participants y_at_time_2 is missing and originally I thought that by finding a formulation that also includes y_at_time_1 in the model it would help the estimation overall. But I no longer think that’s the case (see above).

1 Like