Model won't converge under its nested (embedded) formulation

I am trying to fit a lapsing psychometric mixture model for binary responses under two formulations but while one converges the other does not even though they are both the same model.

Model
Participant responses are predicted by the scaled VOT (VOT_gs), exposure condition (Condition.Exposure), and exposure block (Block) and its interactions. By participant random intercepts and slopes, and by-item random intercepts and slopes were included in the model. An intercept was fit for the lapse rate (theta1). The same model is reformulated with slopes nested in each block and condition so that I can easily obtain each block’s and condition’s intercept and slope estimates (see code that follows).

I’ve used the recommended weakly regularising priors for all predictors (student_t(3, 0, 2.5)) except for VOT_gs where I’ve widened the standard deviation from 2.5 to 10 – the SD prior was increased because 2.5 was insufficient to achieve convergence, and posterior plots showed a slight skew in the distribution of VOT_gs . This has enabled the model to converge under the standard formulation but I get 3 divergent transitions under the nested (embedded) formulation. I have adjusted adapt_delta to as high as 0.999 but it hasn’t helped. I wonder if anyone here might have a better suggestion. The data, R script and the fitted model object of both formulations can be accessed here: Dropbox - troubleshooting_brms

I am a novice wrt modelling and have not posted in a forum like this before but I hope this provides enough clarity of the problem. I’d appreciate any advice. Thank you.

Data prep and model formulation

# load required libraries
library(tidyverse)
library(brms)
library(magrittr)
library(MASS)

# global settings
chains <- 4
options(
  width = 1000,
  mc.cores = min(chains, parallel::detectCores()))


# prepare variables for modelling
prepVars <- function(d, levels.Condition = NULL, contrast_type) {
  d %<>%
    drop_na(Condition.Exposure, Phase, Block, Item.MinimalPair, ParticipantID, Item.VOT, Response)
  
  print(paste("VOT mean:", signif(mean(d$Item.VOT, na.rm = T))))
  print(paste("VOT sd:", signif(sd(d$Item.VOT, na.rm = T))))
  
  d %<>% 
    ungroup() %>% 
    mutate(
      Block_n = as.numeric(as.character(Block)),  
      across(c(Condition.Exposure, Block, Item.MinimalPair), factor),
      
      Condition.Exposure = factor(Condition.Exposure, levels = levels.Condition)) %>% 
    
    drop_na(Block, Response, Item.VOT) %>% 
    mutate(VOT_gs = (Item.VOT - mean(Item.VOT, na.rm = TRUE)) / (2 * sd(Item.VOT, na.rm = TRUE))) %>% 
    droplevels()
  
  print(paste("mean VOT is", mean(d$Item.VOT), "and SD is", sd(d$Item.VOT)))
  
  contrasts(d$Condition.Exposure) = cbind("_Shift10 vs. Shift0" = c(-2/3, 1/3, 1/3),
                                          "_Shift40 vs. Shift10" = c(-1/3,-1/3, 2/3))
  require(MASS)
  if (all(d$Phase == "test") & n_distinct(d$Block) > 1 & contrast_type == "difference") {
    contrasts(d$Block) <- fractions(contr.sdif(6))  
    dimnames(contrasts(d$Block))[[2]] <- c("_Test2 vs. Test1", "_Test3 vs. Test2", "_Test4 vs. Test3", "_Test5 vs. Test4", "_Test6 vs. Test5")
    
    print(contrasts(d$Condition.Exposure))
    print(contrasts(d$Block)) } else if (all(d$Phase == "test") & n_distinct(d$Block) > 1 & contrast_type == "helmert"){
      contrasts(d$Block) <- cbind("_Test2 vs. Test1" = c(-1/2, 1/2, 0, 0, 0, 0),
                                  "_Test3 vs. Test2_1" = c(-1/3, -1/3, 2/3, 0, 0, 0), 
                                  "Test4 vs. Test3_2_1" = c(-1/4, -1/4, -1/4, 3/4, 0, 0), 
                                  "_Test5 vs. Test4_3_2_1" = c(-1/5, -1/5, -1/5, -1/5, 4/5, 0), 
                                  "_Test6 vs. Test5_4_3_2_1" = c(-1/6, -1/6, -1/6, -1/6, -1/6, 5/6))
      print(contrasts(d$Condition.Exposure))
      print(contrasts(d$Block))} else {
        print(contrasts(d$Condition.Exposure))
      }
  
  return(d)
}


# fit model function
fit_model <- function(data, phase, formulation = "standard", priorSD = 2.5, adapt_delta = .99) {
  require(tidyverse)
  require(magrittr)
  require(brms)
  
  levels_Condition.Exposure <- c("Shift0", "Shift10", "Shift40")
  contrast_type <- "difference"
  chains = 4

  data %<>% 
    filter(Phase == phase & Item.Labeled == F) %>% 
    prepVars(levels.Condition = levels_Condition.Exposure, contrast_type = contrast_type)
  
# overwrite prior for slope estimates
  prior_overwrite <- if (phase == "exposure" & formulation == "nested_slope") {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x6:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x6:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x2:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x4:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x6:VOT_gs", dpar = "mu2"))
  } else if (phase == "test" & formulation == "nested_slope") {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift0x9:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift10x9:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x1:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x3:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x5:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x7:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x8:VOT_gs", dpar = "mu2"),
      set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "IpasteCondition.ExposureBlocksepEQxShift40x9:VOT_gs", dpar = "mu2"))
  } else {
    c(set_prior(paste0("student_t(3, 0, ", priorSD, ")"), coef = "VOT_gs", dpar = "mu2"))
  }
  
  my_priors <- 
      c(
        prior(student_t(3, 0, 2.5), class = "b", dpar = "mu2"),
        prior_overwrite,
        prior(cauchy(0, 2.5), class = "sd", dpar = "mu2"), 
        prior(lkj(1), class = "cor"))
  
  brm(
    formula = if (formulation == "nested_slope") {
      bf(Response.Voiceless ~ 1, 
         mu1 ~ 0 + offset(0),
         mu2 ~ 0 + I(paste(Condition.Exposure, Block, sep = "x")) / VOT_gs + 
           (0 + Block / VOT_gs | ParticipantID) + 
           (0 + I(paste(Condition.Exposure, Block, sep = "x")) / VOT_gs | Item.MinimalPair),
         theta1 ~ 1)
    } else {
      bf(Response.Voiceless ~ 1,
         mu1 ~ 0 + offset(0),
         mu2 ~ 1 + VOT_gs * Condition.Exposure * Block + (1 + VOT_gs * Block | ParticipantID) + (1 + VOT_gs * Condition.Exposure * Block | Item.MinimalPair),
         theta1 ~ 1)},
    data = data,
    prior = my_priors,
    cores = 4,
    chains = chains,
    init = 0,
    iter = 4000, 
    warmup = 2000, 
    family = mixture(bernoulli("logit"), bernoulli("logit"), order = F),
    control = list(adapt_delta = adapt_delta),
    file = paste0("models/", phase, "-", formulation, "-priorSD", priorSD, "-", adapt_delta, ".rds")
  )
}

Fitting with standard formulation

d.for_analysis <- read_csv(file = "d.for_analysis.csv", show_col_types = F) %>% 
  mutate(Response.Voiceless = ifelse(Response.Voicing == "voiceless", 1, 0))

fit_model(data = d.for_analysis,
          phase = "exposure",
          formulation = "standard",
          priorSD = 10, 
          adapt_delta = .999)

Fitting with nested formulation

d.for_analysis <- read_csv(file = "d.for_analysis.csv", show_col_types = F) %>% 
  mutate(Response.Voiceless = ifelse(Response.Voicing == "voiceless", 1, 0))

fit_model(data = d.for_analysis,
          phase = "exposure",
          formulation = "nested_slope",
          priorSD = 10, 
          adapt_delta = .999)
1 Like

Howdy!

I had never seen “/” used before for interactions in regression formula for the formula of fixed or random effects. I had only encountered it for use in nested group terms in the brms formula (like as in (1|country/state). But I think that in a regression formula in brms, “/” and “*” do not do the same thing, so that your model formulas are not the same. You can check this by using the example kidney dataset that comes with brms and comparing the value of K in the Stan data called for the Stan code behind brms, where K is the number of population-level effects:

make_standata(age ~ 0 + time * disease, data = kidney)$K
make_standata(age ~ 0 + time / disease, data = kidney)$K

The first line has K=8 and the second has K=5.
Upon inspecting the full Stan data created for each use case, using the “/” notation is equivalent to writing out:

0 + time + time:disease

And using the “*” notation is equivalent to writing out:

0 + time + disease + time:disease

So if I am correctly following your code, your two models that you call “nested” and “standard” are not the same formula.

2 Likes

Hi @jd_c,

that mismatch is the result of time being a continuous predictor. “A / B” nests effect B under effect A. I believe that, if you do the same but for a categorical predictor as the nesting variable, the two models should be prediction equivalent (as in @SuLin’s example, where I(paste(Condition.Exposure, Block, sep = "x")) is a categorical predictor). For example:

make_standata(age ~ 0 + disease / time, data = kidney)$K
make_standata(age ~ 0 + disease / time, data = kidney)$K

both have K = 8.

2 Likes

Gotcha. Good catch.
@SuLin have you double checked to make sure that the Stan data and Stan code are the same for both models? That would probably be the first thing that I would do.