Model won't converge under its nested (embedded) formulation

SuLin · August 17, 2023, 11:22am

I am trying to fit a lapsing psychometric mixture model for binary responses under two formulations but while one converges the other does not even though they are both the same model.

Model
Participant responses are predicted by the scaled VOT (VOT_gs), exposure condition (Condition.Exposure), and exposure block (Block) and its interactions. By participant random intercepts and slopes, and by-item random intercepts and slopes were included in the model. An intercept was fit for the lapse rate (theta1). The same model is reformulated with slopes nested in each block and condition so that I can easily obtain each block’s and condition’s intercept and slope estimates (see code that follows).

I’ve used the recommended weakly regularising priors for all predictors (student_t(3, 0, 2.5)) except for VOT_gs where I’ve widened the standard deviation from 2.5 to 10 – the SD prior was increased because 2.5 was insufficient to achieve convergence, and posterior plots showed a slight skew in the distribution of VOT_gs . This has enabled the model to converge under the standard formulation but I get 3 divergent transitions under the nested (embedded) formulation. I have adjusted adapt_delta to as high as 0.999 but it hasn’t helped. I wonder if anyone here might have a better suggestion. The data, R script and the fitted model object of both formulations can be accessed here: Dropbox - troubleshooting_brms

I am a novice wrt modelling and have not posted in a forum like this before but I hope this provides enough clarity of the problem. I’d appreciate any advice. Thank you.

Data prep and model formulation

# load required libraries
library(tidyverse)
library(brms)
library(magrittr)
library(MASS)

# global settings
chains <- 4
options(
  width = 1000,
  mc.cores = min(chains, parallel::detectCores()))


# prepare variables for modelling
prepVars <- function(d, levels.Condition = NULL, contrast_type) {
  d %<>%
    drop_na(Condition.Exposure, Phase, Block, Item.MinimalPair, ParticipantID, Item.VOT, Response)
  
  print(paste("VOT mean:", signif(mean(d$Item.VOT, na.rm = T))))
  print(paste("VOT sd:", signif(sd(d$Item.VOT, na.rm = T))))
  
  d %<>% 
    ungroup() %>% 
    mutate(
      Block_n = as.numeric(as.character(Block)),  
      across(c(Condition.Exposure, Block, Item.MinimalPair), factor),
      
      Condition.Exposure = factor(Condition.Exposure, levels = levels.Condition)) %>% 
    
    drop_na(Block, Response, Item.VOT) %>% 
    mutate(VOT_gs = (Item.VOT - mean(Item.VOT, na.rm = TRUE)) / (2 * sd(Item.VOT, na.rm = TRUE))) %>% 
    droplevels()
  
  print(paste("mean VOT is", mean(d$Item.VOT), "and SD is", sd(d$Item.VOT)))
  
  contrasts(d$Condition.Exposure) = cbind("_Shift10 vs. Shift0" = c(-2/3, 1/3, 1/3),
                                          "_Shift40 vs. Shift10" = c(-1/3,-1/3, 2/3))
  require(MASS)
  if (all(d$Phase == "test") & n_distinct(d$Block) > 1 & contrast_type == "difference") {
    contrasts(d$Block) <- fractions(contr.sdif(6))  
    dimnames(contrasts(d$Block))[[2]] <- c("_Test2 vs. Test1", "_Test3 vs. Test2", "_Test4 vs. Test3", "_Test5 vs. Test4", "_Test6 vs. Test5")
    
    print(contrasts(d$Condition.Exposure))
    print(contrasts(d$Block)) } else if (all(d$Phase == "test") & n_distinct(d$Block) > 1 & contrast_type == "helmert"){
      contrasts(d$Block) <- cbind("_Test2 vs. Test1" = c(-1/2, 1/2, 0, 0, 0, 0),
                                  "_Test3 vs. Test2_1" = c(-1/3, -1/3, 2/3, 0, 0, 0), 
                                  "Test4 vs. Test3_2_1" = c(-1/4, -1/4, -1/4, 3/4, 0, 0), 
                                  "_Test5 vs. Test4_3_2_1" = c(-1/5, -1/5, -1/5, -1/5, 4/5, 0), 
                                  "_Test6 vs. Test5_4_3_2_1" = c(-1/6, -1/6, -1/6, -1/6, -1/6, 5/6))
      print(contrasts(d$Condition.Exposure))
      print(contrasts(d$Block))} else {
        print(contrasts(d$Condition.Exposure))
      }
  
  return(d)
}


# fit model function
fit_model <- function(data, phase, formulation = "standard", priorSD = 2.5, adapt_delta = .99) {
  require(tidyverse)
  require(magrittr)
  require(brms)
  
  levels_Condition.Exposure <- c("Shift0", "Shift10", "Shift40")
  contrast_type <- "difference"
  chains = 4

  data %<>% 
    filter(Phase == phase & Item.Labeled == F) %>% 
    prepVars(levels.Condition = levels_Condition.Exposure, contrast_type = contrast_type)
  
# overwrite prior for slope estimates
  prior_overwrite <- if (phase == "exposure" & formulation == "nested_slope") {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x6:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x6:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x6:VOT_gs", dpar = "mu2"))
  } else if (phase == "test" & formulation == "nested_slope") {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x9:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x9:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x9:VOT_gs", dpar = "mu2"))
  } else {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "VOT_gs", dpar = "mu2"))
  }
  
  my_priors <- 
      c(
        prior(student_t(3, 0, 2.5), class = "b", dpar = "mu2"),
        prior_overwrite,
        prior(cauchy(0, 2.5), class = "sd", dpar = "mu2"), 
        prior(lkj(1), class = "cor"))
  
  brm(
    formula = if (formulation == "nested_slope") {
      bf(Response.Voiceless ~ 1, 
         mu1 ~ 0 + offset(0),
         mu2 ~ 0 + I(paste(Condition.Exposure, Block, sep = "x")) / VOT_gs + 
           (0 + Block / VOT_gs | ParticipantID) + 
           (0 + I(paste(Condition.Exposure, Block, sep = "x")) / VOT_gs | Item.MinimalPair),
         theta1 ~ 1)
    } else {
      bf(Response.Voiceless ~ 1,
         mu1 ~ 0 + offset(0),
         mu2 ~ 1 + VOT_gs * Condition.Exposure * Block + (1 + VOT_gs * Block | ParticipantID) + (1 + VOT_gs * Condition.Exposure * Block | Item.MinimalPair),
         theta1 ~ 1)},
    data = data,
    prior = my_priors,
    cores = 4,
    chains = chains,
    init = 0,
    iter = 4000, 
    warmup = 2000, 
    family = mixture(bernoulli("logit"), bernoulli("logit"), order = F),
    control = list(adapt_delta = adapt_delta),
    file = paste0("models/", phase, "-", formulation, "-priorSD", priorSD, "-", adapt_delta, ".rds")
  )
}

Fitting with standard formulation

d.for_analysis <- read_csv(file = "d.for_analysis.csv", show_col_types = F) %>% 
  mutate(Response.Voiceless = ifelse(Response.Voicing == "voiceless", 1, 0))

fit_model(data = d.for_analysis,
          phase = "exposure",
          formulation = "standard",
          priorSD = 10, 
          adapt_delta = .999)

Fitting with nested formulation

d.for_analysis <- read_csv(file = "d.for_analysis.csv", show_col_types = F) %>% 
  mutate(Response.Voiceless = ifelse(Response.Voicing == "voiceless", 1, 0))

fit_model(data = d.for_analysis,
          phase = "exposure",
          formulation = "nested_slope",
          priorSD = 10, 
          adapt_delta = .999)

jd_c · August 18, 2023, 5:32pm

Howdy!

I had never seen “/” used before for interactions in regression formula for the formula of fixed or random effects. I had only encountered it for use in nested group terms in the brms formula (like as in (1|country/state). But I think that in a regression formula in brms, “/” and “*” do not do the same thing, so that your model formulas are not the same. You can check this by using the example kidney dataset that comes with brms and comparing the value of K in the Stan data called for the Stan code behind brms, where K is the number of population-level effects:

make_standata(age ~ 0 + time * disease, data = kidney)$K
make_standata(age ~ 0 + time / disease, data = kidney)$K

The first line has K=8 and the second has K=5.
Upon inspecting the full Stan data created for each use case, using the “/” notation is equivalent to writing out:

0 + time + time:disease

And using the “*” notation is equivalent to writing out:

0 + time + disease + time:disease

So if I am correctly following your code, your two models that you call “nested” and “standard” are not the same formula.

tiflo · August 19, 2023, 11:28am

Hi @jd_c,

that mismatch is the result of time being a continuous predictor. “A / B” nests effect B under effect A. I believe that, if you do the same but for a categorical predictor as the nesting variable, the two models should be prediction equivalent (as in @SuLin’s example, where I(paste(Condition.Exposure, Block, sep = "x")) is a categorical predictor). For example:

make_standata(age ~ 0 + disease / time, data = kidney)$K
make_standata(age ~ 0 + disease / time, data = kidney)$K

both have K = 8.

jd_c · August 21, 2023, 1:48pm

Gotcha. Good catch.
@SuLin have you double checked to make sure that the Stan data and Stan code are the same for both models? That would probably be the first thing that I would do.

Topic		Replies	Views
Convergence issues with brms mixture models Modeling brms	6	1803	May 18, 2022
Properly fitting a model with nested effects brms fitting-issues , specification	2	759	May 4, 2020
Convergence issues with gamma mixture models in brms: intercepts keep "swapping" Modeling mixture , brms	6	1046	April 27, 2022
Divergent transitions after warmup using brms Modeling fitting-issues	3	4519	February 18, 2019
Example growth mixture model won't converge brms fitting-issues	20	1985	October 29, 2020

Model won't converge under its nested (embedded) formulation

Related topics