Improving mixed effects model specification: complexity metrics

Hi everyone,

BRMS novice here – neuroscience PhD student investigating oral ketamine treatment for mental health and EEG-derived information theoretic metrics as prognostic biomarkers.

I’m hoping to get some guidance on my mixed-effects model specification and validation for EEG-derived metrics of signal complexity. I’m about two months into the self-teaching journey and my brain is teetering on mental anguish haha! Foolishly, I left asking for advice this long.

Bit of relevant background: a 6-week treatment of low-dose ketamine was given to participants with suicidality. Participants were classified as responders or non-responders at post-treatment and follow-up. EEG was collected in two conditions (eyes closed and open) across three timepoints (baseline, 6 weeks (post-treatment), and 10 weeks (follow-up)). For each task and timepoint, two complexity (signal irregularity/diversity) metrics were obtained: Lempel-Ziv complexity and multiscale entropy (MSE). My Lempel-Ziv brms model is straightforward; however, the unique (and difficult) thing with MSE is its scale factor, which effectively coarse-grains the original time series to obtain (sample) entropy estimates that (supposedly) index dynamics occurring across faster and slower timescales (see this paper for the methodology).

Multiscale entropy has a lower bound of 0 but no upper bound (although literature values rarely exceed 2). The value range depends on the specification of the tolerance, essentially the threshold for evaluating signal pattern similarity.
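For readers unfamiliar with the coarse-graining step: at scale factor s, the original series is split into non-overlapping windows of length s and each window is averaged, and sample entropy is then computed on each coarse-grained series. A toy R illustration (my own sketch, not the analysis code):

```r
# Toy illustration of MSE coarse-graining: at scale `scale`, average
# non-overlapping windows of that length from the original series.
coarse_grain <- function(x, scale) {
  n <- floor(length(x) / scale)
  colMeans(matrix(x[1:(n * scale)], nrow = scale))
}

x <- 0:11
coarse_grain(x, 1)   # scale 1: the original series
coarse_grain(x, 3)   # scale 3: window means 1 4 7 10
```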

Replicating the inclusion of scale as a fixed factor in this paper (Neural complexity EEG biomarkers of rapid and post-rapid ketamine effects in late-life treatment-resistant depression: a randomized control trial | Neuropsychopharmacology), I specified a mixed-effects model with random intercepts and slopes, and interactions between the factors timepoint, task, response status, and the scale factor (1-10).

  • Timepoints: ses-01 (PRE), ses-02 (POST), ses-03 (FUP)
  • Task: Eyes closed (EC) and eyes open (EO)
  • Response: Non-responder (0) and Responder (1)
  • Scale: MSE 1-10

Script for creating reproducible example in long or wide format:
mockData.R (1.9 KB)

Please note I’ve kept the iterations low while I review model specification; I originally had 10,000, but this made the whole process cumbersome. I understand that increasing the iterations (and hence ESS) will improve the model estimates, but I want to address model specification before progressing to this.

MSE distribution:

Formula (Note: I’m not interested in random slope correlations):

data <- # read in long-format data

model <- brm(MSE | trunc(lb = 0) ~ 1 + Responder * Timepoint * Task * Scale +
               (1 + Responder + Timepoint + Task + Scale || Subject),
             data = data,
             family = gaussian(),
             chains = 4,
             cores = 4,
             iter = iterations,
             warmup = warmup,
             prior = prior_random,
             init = init,
             sample_prior = 'yes',
             save_pars = save_pars(all = TRUE),
             seed = 22,
             # file = "MSE_model2.rda",
             control = control)

With the following priors:

prior_random = c(prior(normal(1.2, 0.05), class = 'Intercept', lb = 0),
                 prior(normal(0, 0.10), class = 'b'),
                 prior(student_t(3, 0.05, 0.05), class = 'sigma'),
                 prior(cauchy(0, 0.01), class = 'sd'))

And model parameters (iterations low just for the model selection step):

iterations <- 2000      
warmup <- iterations / 2 
init <- 0
control <- list(
  adapt_engaged = TRUE,
  adapt_delta = 0.95, 
  stepsize = 0.05,
  max_treedepth = 15)

The pp_check of the prior-only model shows reasonable estimates:

Chain mixing is okay (no convergence warnings).

Summary (snippet for brevity):

Posterior predictive checks look as follows (stat estimates for other factors similar):



My questions:

  1. Is it statistically/logically sound to include the scale factor in the model, or should the different scale values of the MSE be treated as separate DVs in a multivariate model (or as separate models per scale value)? My use of MSE comes from this paper Correction: Neural complexity EEG biomarkers of rapid and post-rapid ketamine effects in late-life treatment-resistant depression: a randomized control trial - PMC, which applied this approach in a frequentist LME. As I’ve iterated through the model-building process, the feeling that this isn’t ‘right’ has crept in. To be honest, if a multivariate model were appropriate this would be great, as the inclusion of scale makes the population effects somewhat unwieldy.
  2. Whilst the posterior predictive checks indicate the MSE model fit is decent, the min and max stat predictions are somewhat off (this is consistent across factors). Is this related to my model specification, and are there ways to improve it?
  3. Would a mixture model be best if I keep scale as a population effect? I originally dabbled with a mixture model to capture the bimodal nature of the MSE values; however, the gaussian model produced the posterior predictive checks above (which I thought were reasonable).

Please scrutinise and point out any errors, omissions, or imperfections as I wish to learn!

If any further info is needed, let me know.

Thank you!

Hello Jules,

First of all, the caveat that I personally don’t work with the MSE, so excuse me if some of my questions are completely off.

Is it a given that the actual data you will have are normally distributed?

  1. From what I see, and from my understanding of how you talked about the scales, I assume that they are correlated, right? It might make sense to include this as information (an AR term, or maybe a GP). In your simulated data set this correlation does not exist, but it might be a crucial piece of information in real data (Fig. 3 in Costa et al.). Similarly, it could be said that the timepoints, depending on the interval, are correlated as well; you might have a better intuition for such a correlation, though. At the moment your model assumes independence of scales when estimating the mean, but also the sd. Accounting for this could definitely improve your performance and might help you get better estimates.

  2. For the min/max it is always a bit tricky to get perfect estimates for the extremes. Maybe there is an inflation of the sd which leads to such a shift (see your table).

  3. To the mixture model: this would assume that there is a bimodal outcome at every scale. One could model a change in the proportion by using scale as a covariate to shift the mixture proportions. Would that be correct, though? Would you expect bimodal distributions for the same subject at a given scale?
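A rough sketch of what I mean by the AR idea (I use brms rarely, so treat the ar() usage and the column names from your mock data as assumptions, and I dropped the truncation as I’m not sure it combines with ar()):

```r
# Rough sketch (untested): replace independent scale effects with an AR(1)
# structure over the scale index, so neighbouring scales inform each other.
library(brms)

ar_model <- brm(
  MSE ~ Responder * Timepoint * Task +
    ar(time = Scale, gr = Subject) +          # correlation between adjacent scales
    (1 + Responder + Timepoint + Task || Subject),
  data = data, family = gaussian()
)
```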

Hi Daniel,

Thanks for the speedy reply and for your advice/questions.

  1. You are absolutely right - the MSE scales are highly correlated (apologies for not including this in the simulated data). I haven’t checked whether the timepoint values are correlated too, but I imagine so (it was 6 weeks between baseline and post, and 4 weeks from post to follow-up).

My intuition (not grounded in stats) on the GP or AR approach is that it might not be appropriate: whilst the EEG activity is a (quasi-)stochastic process, the MSE scales aren’t indexing changes over time. That said, I am not versed in this area, so my intuitions may be moot.

Regardless, I briefly ventured into the GP implementation last night and learnt the following:

  1. Predictors of Gaussian processes should be numeric vectors.
  2. Syntax of the form ‘Responder:gp(Scale)’ is invalid in brms.

If I pursued this, I’m unsure how scale could then be treated as a numeric vector rather than a factor, especially with respect to the levels of the other factors. Also, would I have to do away with the interactions between scale and the other factors, or is there a specific syntax I need to follow? I’m sorry, I’m quite inexperienced in this regard.
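My tentative reading of the gp() documentation is that its `by` argument might play the role of an interaction with a grouping factor. A sketch of what I mean (hedged; I haven’t verified this runs on my real data):

```r
# Tentative sketch: gp()'s `by` argument may replace the invalid
# 'Responder:gp(Scale)' syntax by fitting one GP per group level.
data$ScaleNum <- as.numeric(as.character(data$Scale))  # gp() needs a numeric input

gp_model <- brm(
  MSE | trunc(lb = 0) ~ Responder * Timepoint * Task +
    gp(ScaleNum, by = Responder) +
    (1 + Responder + Timepoint + Task || Subject),
  data = data, family = gaussian()
)
```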

Could the correlations instead be accounted for with a multivariate model of the following form (I’ve removed the random slopes for simplicity for now)?

bform1 <-
  bf(mvbind(MSE..scale.1., MSE..scale.2., .... MSE..scale.10) ~ Timepoint * Task * Responder + (1 | p | Subject))

MVmodel <- brm(bform1, data = data, chains = 2, cores = 2)

From reading this vignette (Estimating Multivariate Models with brms), this seemed a possible approach - again, my inexperience may be leading me astray here.
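If I’ve read the vignette correctly, residual correlations between the responses are estimated by default for gaussian mvbind() models and can be made explicit (or switched off) with set_rescor(); a minimal fragment, assuming my column names:

```r
# For gaussian mvbind() models, brms estimates residual correlations by
# default; set_rescor() makes this explicit (TRUE) or disables it (FALSE).
bform_rescor <- bf(mvbind(MSE..scale.1., MSE..scale.2.) ~
                     Timepoint * Task * Responder + (1 | p | Subject)) +
  set_rescor(TRUE)
```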

In doing so, my posterior pp_checks showed a better fit (scales 1 and 10 as examples; the others were similar).

The (snipped) output was as follows; I note the residual correlations are quite high.

Am I barking up the wrong tree here? Sorry for the information dump. I seriously appreciate your help.

  2. Thanks for pointing out the SD inflation, I missed this. I will circle back to it once I nail down the model.

  3. You are absolutely right, I would not expect bimodal distributions at a given scale (I was incorrectly thinking about the overall distribution). In fact, at all scale levels the distribution is approximately gaussian (a few with a slight skew). Going back to the multivariate model suggestion: if this were the right option, I figured the various MSE levels could be modelled with the appropriate family, as in the vignette above.



From what I read about how the MSE is computed (and that really just happened today), it seems to construct these scales based on the samples in time (I just looked at Humeau-Heurtier, 2015 and a quick summary of the method here). So in my opinion (and that’s what it is, not more) the scales are artificially factorized. I would argue they are an extension of the time domain (in a weird way) in which the scales depend on each other. I have to admit that I don’t completely understand the interpretation of scales, given that they depend on the sampling rate, which might not be the same across experiments/labs. No units, and somehow similar to the application of a filter. Coming from analysing local field potentials, whole-cell recordings and spike data, we like to use wavelet transforms, Hilbert transforms and cross-correlations or decompositions. Having no frequency/magnitude domain at all seems a bit foreign and weird :) Not to say it is not valid!

GPs don’t need to be related to time, as also shown in the “Statistical Rethinking” lecture series by McElreath (a very strong recommendation from my side). There he uses a GP in a model of distances between islands to estimate tool occurrence/development. All to show that a GP doesn’t care about the underlying variable; it just needs some spacing (not necessarily equal) and enough data points (it can struggle with too many, but there are solutions, e.g. by Aki Vehtari’s group).

One could also use the scales as a hierarchical factor with slope correlations, meaning that the scales can be correlated with each other. This approach would again allow the use of truncated distributions. I might be wrong, but I think a truncated multivariate gaussian is not implemented in brms (@paul.buerkner discussed that in a previous post here on the discourse).
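A sketch of that hierarchical-factor idea (again hedged; I’m assuming the column names from your mock data and treating Scale as a factor):

```r
# Sketch: Scale as a factor with correlated random slopes per subject, so the
# subject-level scale effects can be correlated across scales, while keeping
# the single-response truncated gaussian.
hier_model <- brm(
  MSE | trunc(lb = 0) ~ Responder * Timepoint * Task * factor(Scale) +
    (0 + factor(Scale) | Subject),   # single |: correlations between scale levels
  data = data, family = gaussian()
)
```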

Again, the interpretation of scales is still confusing to me, but as I mentioned before, you could use them as correlated slopes. Hmm, I think I use brms too rarely (I use Stan directly) to give advice there.

Is there an established way of handling such data sets? I assume you get them with the scales already, right? It is still fair to say that previous analysis pipelines can be improved and a Bayesian approach offers new possibilities.

I hadn’t previously thought of the coarse-graining in that way (as artificially factorised), but this makes sense. You are not alone in not completely understanding the MSE interpretation; whilst it has been claimed that the lower scales reflect faster and the higher scales slower frequencies, issues with specifying the tolerance, the number of timepoints used, and the effective low-pass filtering of coarse-graining have all been raised (Standard multiscale entropy reflects neural dynamics at mismatched temporal scales: What’s signal irregularity got to do with it?), and I agree with the criticisms. Whilst I included MSE in my paper to validate previous findings, these limitations are certainly going to weigh heavily on my interpretations (or lack thereof).

In case it is of interest, the application of information-theoretic metrics to EEG/MEG time series is ‘hot’ right now in the ketamine/psychedelics/consciousness space (good review here). I’ve also seen some work in the multi-electrode space: Entropy | Free Full-Text | Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition.

In terms of the issue at hand, it seems I have a ways to go before I can put this to bed haha! What is a PhD for if not serious self-questioning? I really appreciate your advice/recommendations; I will dive into the Statistical Rethinking resources and the use of scales as a hierarchical factor with slope correlations, and see where this takes me.

Previous papers I’ve come across simply incorporate scale as a population-level (fixed) effect in frequentist lme4 models, with no consideration of correlations between levels… so no help there. Hopefully I can make some headway.

I’ll update on where I get to in the coming weeks. Thanks again!
