Modeling population and group-level variance separately

Short summary of the problem

  • Operating System: Arch Linux
  • brms Version: github current as of 2021/12/25

I am analyzing data from an experiment on drumming precision. Each of the 20 subjects performed 75 trials in 3 sessions, and we also recorded a measure of musicality for each subject. The response variable is the timing of each subject’s drumming relative to the correct timing (negative values indicate responses that were too early, positive values responses that were too late).

A small subset of my sample data is attached.
data_subset.csv (4.9 KB)

Now, I want to model both location and scale of the response variable “deviation”.
Concerning the scale, I am interested in both within-subject and between-subject variance at the population level.

The statistical model is described here:

Regressors for the within-subject variance indicate how much the individual variance depends on a regressor at the population level (e.g., at the start of the experiment [trial=0] subjects might perform more consistently at the individual level than at the end [trial=75], an effect possibly attributable to fatigue).
In the above-cited model, the within-subject variance is the variance of the residuals, and, as far as I understand, this can be modeled with the sigma term of brms.

Regressors for the between-subject variance indicate how much the variance at the group level depends on a regressor at the population level (e.g., at the start of the experiment [trial=0] the average performance of subjects has larger variance, while at the end of the experiment [trial=75] subjects perform very similarly, an effect possibly attributable to training effects in unmusical subjects).
In the above-cited model, the between-subject variance is modeled as the standard deviation of the group-level intercept. I cannot figure out if and how this could be done.

Thank you very much in advance for your help!

model <- (
    # model the intercept including a group-level intercept
        bf(deviation ~ 1 + musicality + trial + session + (1|subject)) +
    # model the within-subject variance term and a group-level intercept
        lf(sigma ~ 1 + musicality + trial + session + (1|subject))
    # now, can I also model the variance of the group-level intercept with a population-level variable?
    # + lf(sigma ~ 1 + musicality + trial + session + (1 + musicality2 | subject))
    # gives a musicality2 variable at the group level, while I want it at the population level,
    # governing the variance of the group-level intercept.
    )

I don’t have time to read the linked article, but I will take a stab at answering your question. @Solomon may have better answers, as he wrote this helpful section that deals with similar models here.
Let’s say you take your example data and run the following 2 models:

library(brms)

dat <- read.csv(file.choose())  # e.g., the attached data_subset.csv
dat$subject <- factor(dat$subject)
str(dat)

model1 <- brm(
        deviation ~ 1 + musicality + trial + session + (1|subject),
        data=dat, family=gaussian, cores=4
    )


model2 <- brm(
        bf(deviation ~ 1 + musicality + trial + session + (1|i|subject),
           sigma ~ 1 + musicality + trial + session + (1|i|subject)),
        data=dat, family=gaussian, cores=4
    )

(First off, you would need more data and would need to think about priors, as the second model doesn’t converge.) But just using this as an example, in model1 the between-subject variability is in the sd(Intercept) term of the group-level effects. The within-subject variability is in sigma, and in this model, sigma is assumed to be the same across subjects, musicality, trials, and sessions.
Now in model2 the between-subject variability in the outcome is still in the sd(Intercept) term. However, now sigma is not assumed constant across subjects, etc. So, the mean within-subject variability changes with musicality, trials, and sessions, and each subject now has their own mean sigma via the offsets from the varying slopes for subject. In this model, sd(sigma_Intercept) is the between-subject variability in the within-subject variability. And sigma_musicality, for example, is the effect of a 1-unit increase in musicality on the mean within-subject variability, holding other predictors constant, on the log scale (note that sigma is modeled with the log link by default in brms). The |i| part of the group-level terms in the model indicates that location and scale are modeled as correlated for each subject.
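If it helps with interpretation, here is a small sketch (not part of the original reply) of how one could back-transform that coefficient to the multiplicative scale; the name b_sigma_musicality follows the usual brms naming convention for distributional-parameter coefficients:

# sigma coefficients are on the log scale, so exponentiating the posterior draws
# gives the multiplicative change in sigma per 1-unit increase in musicality.
library(posterior)
post <- as_draws_df(model2)
quantile(exp(post$b_sigma_musicality), probs = c(0.025, 0.5, 0.975))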
Hopefully this is the correct interpretation! Take a look at the link I mentioned, as I think it will be helpful, and hopefully @Solomon or @martinmodrak can chime in if I’m off.


@jd_c’s answer seems sound to me.

Thank you so much for taking the time to help!

@jd_c’s explanations match very well with the model in the linked article, and @Solomon’s explanatory section explains the syntax very well.

Unfortunately, both of your explanations match the way I understood things before, and I still haven’t figured out how to make the model do what I want:

@jd_c
Now in model2 the between-subject variability in the outcome is still in the sd(Intercept) term.

Yes, the between-subject variability is in sd(Intercept), but can I let sd(Intercept) vary with respect to a regressor without including “random slopes”?

The problem is that deviation ~ (1 + coef | subject) models a coef effect on the mean that may vary across subjects (a “random slope”), and sigma ~ (1 + coef | subject) models an effect of coef on the within-subject variance that may vary across subjects.

But I would like to give a regressor for the standard deviation of the intercept at the subject level: sd(Intercept) ~ 1 + musicality + trial + session.

Am I missing something, or is there an option to specify the model in such a way?

I think I see what you are asking now. Unfortunately, I don’t think I’m any help here…
As I understand it, in a multilevel model with varying intercepts, you have a prior for those intercepts (or for their deviation from the population-level intercept), and that ‘adaptive prior’ is assumed to be Gaussian. The mean and standard deviation (hyperparameters) of that ‘adaptive prior’ each have their own priors (hyperpriors), which is what makes this a multilevel model. But what it seems like you are wanting is for that standard deviation of the ‘adaptive prior’, that hyperparameter, to be predicted by your population-level effects, with each of those parameters having their own hyperpriors. I’ve never thought about a model like this before, and I don’t think you can implement that in brms.

brms does have the syntax y ~ x + (1 | gr(subject, by = x)), where x is a factor and you estimate different hyperparameters for the varying intercept for each of the levels of x (see the sketch below). But what you are wanting is much different. Maybe @paul.buerkner could help.
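To illustrate that by-factor syntax, here is a minimal sketch (musicality_group is a hypothetical between-subject factor that is constant within each subject, not a variable from the posted data):

library(brms)

# Estimates a separate sd(Intercept), i.e. separate between-subject variability,
# for each level of musicality_group.
fit_by <- brm(
  deviation ~ 1 + trial + session + (1 | gr(subject, by = musicality_group)),
  data = dat, family = gaussian, cores = 4
)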


@jd_c Thanks for your input!
That’s satisfying and unfortunate at the same time. Satisfying because I seem to have understood the syntax and capabilities quite well, and unfortunate because there doesn’t seem to be an easy solution.

I wonder whether there is a dirty workaround like this:

model <- (
    bf(deviation ~ muPop + muGrp, nl=TRUE) +
    lf(muPop ~ 1 + musicality + trial + session) +
    lf(muGrp ~ 0 + (1|id|subject)) +
    nlf(sigma ~ sigmaPop + sigmaGrp*muGrp^2) +
    lf(sigmaPop ~ 1 + musicality + trial + session) +
    lf(sigmaGrp ~ 0 + musicality + trial + session)
    )

The idea is that muGrp varies per subject but has zero mean, so muGrp^2 is the squared deviation from the mean. And if this is averaged/summed across subjects, it is something very similar to the variance of the group-level mean (I just don’t know whether muGrp^2 will do it in this way…).

What do you think?

Forgive me if I am missing something here (it’s not clear to me at what level your data is – per tap, per rhythmic cue, per performance of multiple such cues?), but wouldn’t it be more straightforward to calculate the SD of asynchrony as a new column in your data, and then fit a multivariate regression to both (signed) asynchrony and SD of asynchrony (or possibly the log of SD asynchrony to allow residual correlations to be calculated)? I suspect you are overcomplicating things by attempting to formulate it in the way you are suggesting but I may be missing something.

I am unsure about what you have in mind.

Here more info about the concept of the study:

There are N=20 subjects. We played a short auditory stimulus to them, and afterwards the subjects had to keep tracking the rhythm internally. After a cue tone, they had to drum at a specific time in relation to the internally tracked rhythm.

Subjects performed 75 of these trials in 6 sessions. For each trial, we measured the deviation of their drum-hit from the correct timing.
Additionally, for every subject, we obtained a measure of musicality.

In total, the complete sample contains about 1500 trials.

There is a rather large heterogeneity between subjects both regarding their average timing and variance of the response.

So, how could I calculate “SD of asynchrony” on a trial-wise level? I could, in principle, measure the population-level sd, the sd of the responses from individual subjects, or the average deviation from the optimal point for each subject.
The problem is that there might be (and it appears that there is) an effect of the regressors on both location and scale, but the measurement of scale depends on the estimate of location…

Or at the session level.

I am not sure what you are referring to by “scale” here – do you just mean the SD of asynchrony? Below, I am assuming by “trial” you are referring to each row of your data, which is each cue (time when a participant should tap).

How about the following models where the mean asynchrony and SD of asynchrony are calculated over each session?:

mean_deviation ~ 1 + musicality + session + (1 | subject)
sd_deviation ~ 1 + musicality + session + (1 | subject)

or

mean_deviation ~ 1 + musicality + (1 | session) + (1 | subject)
sd_deviation ~ 1 + musicality + (1 | session) + (1 | subject)

(These two DVs could be combined into a single multivariate model.) This seems like a straightforward way to analyse the data, particularly given that sd of asynchrony (sd_deviation) is a commonly used DV in tapping studies.
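For concreteness, here is a hedged sketch of how the two session-level DVs could be combined into one multivariate brms model (the aggregation step and the column names mean_deviation, sd_deviation, and log_sd_deviation are assumptions, not part of the posted data):

library(brms)
library(dplyr)

# Aggregate to one row per subject x session.
session_dat <- dat %>%
  group_by(subject, session, musicality) %>%
  summarise(mean_deviation = mean(deviation),
            sd_deviation = sd(deviation),
            .groups = "drop") %>%
  mutate(log_sd_deviation = log(sd_deviation))

# Multivariate model for mean asynchrony and (log) SD of asynchrony,
# allowing a residual correlation between the two responses.
mv_fit <- brm(
  bf(mean_deviation ~ 1 + musicality + session + (1 | subject)) +
    bf(log_sd_deviation ~ 1 + musicality + session + (1 | subject)) +
    set_rescor(TRUE),
  data = session_dat, cores = 4
)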

But maybe there is some aspect of your design, which means this needs to be modelled at the trial (cue) level? In which case, you could model deviation at the trial-level and sd_deviation at the session-level, or model both deviation (signed asynchrony) and absolute deviation at the trial level.

Would the LMMELSM package help you? It can do mixed effects location scale models with between-group variance models. Note: I’m the package author of LMMELSM, so if you have any questions about it I can answer them.
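As a rough sketch of what that could look like for this design (the list of formulas mirrors the lmmelsm() interface, with separate location, scale, and between submodels; the column names are the ones used earlier in this thread):

library(LMMELSM)

fit <- lmmelsm(
  list(observed ~ deviation,                      # the response
       location ~ musicality + trial + session,   # predicts the mean
       scale    ~ musicality + trial + session,   # predicts the within-subject SD
       between  ~ musicality + trial + session),  # predicts the between-subject SD
  group = subject, data = dat, cores = 4
)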


Dear @Stephen_Martin,
that sounds and looks (I had a short look at the documentation) very promising. I’ll get back to you if I have questions.

Dear @Stephen_Martin ,
our project evolved slowly, but we have really started using the LMMELSM package, and it turned out to be very helpful in fitting the model that we had in mind. So first of all, I need to say thank you for the help and support!

LMMELSM returns meaningful results, but the interpretation turns out to be somewhat difficult, which led to another question I hope you can answer.

In our model, for one regressor, we obtain the result that subjects tend to perform earlier and with lower standard deviation (negative location and negative scale coefficients). But the question arises whether that also reduces the mean absolute error, mean(abs(deviation)), which depends on both scale and location simultaneously.
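As a side note on that dependence: under a Gaussian model for deviation, the expected absolute error has a closed form involving both parameters. A small sketch with made-up example values:

# For X ~ Normal(mu, sigma^2), the mean of |X| (folded normal) is
#   E|X| = sigma * sqrt(2/pi) * exp(-mu^2 / (2*sigma^2)) + mu * (1 - 2*pnorm(-mu/sigma))
mean_abs_error <- function(mu, sigma) {
  sigma * sqrt(2 / pi) * exp(-mu^2 / (2 * sigma^2)) +
    mu * (1 - 2 * pnorm(-mu / sigma))
}
mean_abs_error(mu = -10, sigma = 20)  # illustrative values in ms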

My idea would have been to use the predict function to calculate marginal effects on the transformed deviation variable mean(abs(deviation)), but I obtain the error

Error in eigen(Sigma, symmetric = TRUE) : 
  infinite or missing values in 'x'

for some subjects or for some newdata when applying the prediction, despite providing a finite and complete row of data.

Could you possibly point me toward a solution?

I can tell, after trying to debug, that the error is generated in the mvrnorm function of the MASS package. It seems that the generated covariance matrix Sigma contains some infinite or NA values.

As far as I can see, when using include_error=TRUE some extra variability is added to the predicted variables, but I am not sure whether I need this for the calculation of marginal effects.

After some more investigation, I found out that it is due to outliers in eta_logsd ranging up to 2500, giving Inf for exp(eta_logsd).
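For reference, a quick check of why values that large overflow; these are just double-precision limits, not model output:

log(.Machine$double.xmax)  # about 709.78; exp() of anything larger is Inf
exp(2500)                  # Inf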

Does that indicate a bad fit or bad input data, or should these values simply be ignored?

Hey @gwaterst ,
sorry for just now seeing this.

I imagine that, yes, having eta_logsd up at 2500 would be a problem. Could you provide the model specification for LMMELSM here, so I can see what type of model we’re dealing with?

Dear Stephen, thanks for taking the time again to look into this. I really appreciate your help.

As a reminder, we have an experimental paradigm where subjects listen to a polyrhythm and then need to tap a single beat of either of these rhythms (which one is randomly cued to the subjects after a silent break). We measure the deviation from the correct timing in ms (with negative values indicating too-early and positive values too-late hits) and have counters for trial index and session to account for training and fatigue. Additionally, we have a questionnaire-based value for each subject’s musicality and recorded EEG during the experiment.
EEG components indicating entrainment at a specific frequency (that of a sounding snare drum) are prefixed with the term Snare.

This said, our model is:

fit_snareAll25k <- lmmelsm(
  list(
    observed ~ deviation,
    location ~ musicality + trial + session + 
      Snare1_between + Snare2_between + 
      Snare1_within + Snare2_within ,
    scale ~ musicality + trial + session + 
      Snare1_between + Snare2_between + 
      Snare1_within + Snare2_within,
    between ~ musicality + trial + session + 
      Snare1_between + Snare2_between + 
      Snare1_within + Snare2_within),
  group = subject, data = snare_data, cores=8, iter=25000, warmup=5000,
  control = list(adapt_delta = 0.99, stepsize = 1, max_treedepth = 10))