Seeking examples of using brms to estimate treatment effects in experiment with multiple respondents per cluster

I want to simulate data and estimate the effect of a family-based intervention where multiple members of the family complete a 30-item instrument that can yield an overall score measuring family functioning. Families would be randomly assigned to treatment or wait-list control.

My preliminary search pointed me to work by Bauer et al. on trifactor models and IRT. I’m seeking examples of this or competing approaches that would let me incorporate data from multiple family members in the estimation of the effect of the intervention on the family functioning score. Any tips?

Not quite what you’re looking for, but Wagenmakers et al. (2016; https://doi.org/10.1177/1745691616674458) did a multisite experiment with a Likert-type DV. I have a content sketch here where I fit a few models to the data of one of the sites. If you used that as a starting point, you could fit a fuller model on the data from all the sites, which would be somewhat analogous to persons nested in families (participants nested in labs).


Interesting problem! I’ve done work using Stan on multi-informant data with IRT models, but not with brms.

If you haven’t already, I’d recommend looking into the literature on informant discrepancies. Multi-informant assessment of family variables is not theoretically trivial: assuming that all differences among informants can be interpreted as measurement error requires strong assumptions and justification.

There was a special issue on the topic in the Journal of Clinical Child & Adolescent Psychology last year, which might be relevant: https://www.tandfonline.com/doi/full/10.1080/15374416.2022.2158843


Thanks @Solomon and @erognli. I appreciate the pointers. Exploring them both.

I wonder if either of you might have better ideas about how to simulate this data. I’ve been poking around with simstudy but I’m finding it to be a complex challenge.

Trying to simulate:

  • 100 families with between 2 and 6 members each
  • Families assigned 1:1 to treatment or control
  • Responses to 30 Likert-style items, 0-3
    a) Person-level: items correlated to measure one latent construct
    b) Family-level: scale scores correlated within families
    c) Group-level: scale scores higher in treatment arm

I’d like to simulate datasets that vary on the number of families and the details for a, b, and c.

Here’s my attempt at the first part to create the family structure:

library(simstudy)
library(data.table)
set.seed(8675309)

# family level
  gen.family <- defData(varname = "icc", 
                        dist = "uniform", formula = '0;.5', 
                        id = "idFamily")
  gen.family <- defData(gen.family, varname = "nMembers", 
                        dist = "uniformInt", formula = '2;6')
  
  dtFamily <- genData(100, gen.family)
  dtFamily <- trtAssign(dtFamily, n = 2)

  dtFamily

Yields:

     idFamily          icc nMembers trtGrp
  1:        1 0.0797418174        4      0
  2:        2 0.2390941635        2      1
  3:        3 0.3823993483        4      1
  4:        4 0.3848438379        3      0
  5:        5 0.1342742486        3      1
 ---
 95:       95 0.0120131961        4      0
 96:       96 0.1154628171        2      1
 97:       97 0.2969644184        3      0
 98:       98 0.1177376050        4      1
 99:       99 0.2571975326        6      0
100:      100 0.3215165243        6      1
     idFamily          icc nMembers trtGrp

And the individual members:

# individuals
  dtMembers <- genData(sum(dtFamily$nMembers))
  
  idFamily <- data.table(idFamily = rep(dtFamily$idFamily, dtFamily$nMembers))
  trtGrp <- data.table(trtGrp = rep(dtFamily$trtGrp, dtFamily$nMembers))
  
  dtMembers <- cbind(dtMembers, idFamily)
  dtMembers <- cbind(dtMembers, trtGrp)
  
  dtMembers

Yields:

      id idFamily trtGrp
  1:   1        1      0
  2:   2        1      0
  3:   3        1      0
  4:   4        1      0
  5:   5        2      1
 ---                    
394: 394      100      1
395: 395      100      1
396: 396      100      1
397: 397      100      1
398: 398      100      1

I haven’t used {simstudy} before, and I haven’t simulated ordinal data of that complexity before, so others will have to chime in. My best recommendation is to start small and build (e.g., first with one item and no nesting, then with 30 items and no other nesting, and so on).
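For that very first step, a single ordinal item with no nesting takes only a few lines of base R. This is just a sketch: the baseline probabilities (.60/.25/.10/.05) mirror the ones in the simstudy ordinal vignette, and `shift` is an illustrative stand-in for a treatment effect.

```r
set.seed(8675309)

n <- 1000
baseprobs <- c(0.60, 0.25, 0.10, 0.05)

# thresholds on the log-odds scale implied by the cumulative probabilities
cuts <- qlogis(cumsum(baseprobs)[1:3])

# cumulative-logit draw: a logistic latent variable compared to the thresholds;
# shift moves the latent location, so a positive shift yields higher scores
shift <- 0
item <- findInterval(rlogis(n, location = shift), cuts)  # scores 0-3

table(item) / n  # roughly .60/.25/.10/.05 when shift = 0
```

Once this looks right, the same threshold logic extends to 30 items and to nonzero shifts by arm.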


OK, here’s where I got.

  • Generate nested members (2-6) in 100 families, assign families to treatment or control
  • Generate 30 correlated ordinal items with a shift (effect) to make the treatment group’s sum scores lower on average
  • Simulate this 100 times

library(tidyverse)
library(simstudy)
library(data.table)
library(brms)
set.seed(8675309)

#' Simulate data
#' @param seed  simulation seed
#' @param N_fam total families
#' @param shift logistic shift applied to the treatment group's cumulative log-odds (passed to genOrdCat via adjVar)
#' @param items number of items in scale
#' @param rho correlation coefficient, -1 < rho < 1.

  sim_d <- function(seed, N_fam, shift, items, rho) {
    
    set.seed(seed)
    
    # family level
      def_family <- simstudy::defData(varname = "nMembers", 
                                      dist = "uniformInt", formula = '2;6',
                                      id = "idFamily")
      
      dtFamily <- simstudy::genData(N_fam, def_family)
      dtFamily <- simstudy::trtAssign(dtFamily, n = 2)
      
      dtFamily <- dtFamily %>% 
        dplyr::mutate(z = if_else(trtGrp==1, shift, 0))
    
    # individuals
      dtMembers <- simstudy::genData(sum(dtFamily$nMembers))
      
      idFamily <- data.table(idFamily = rep(dtFamily$idFamily, dtFamily$nMembers))
      trtGrp <- data.table(trtGrp = rep(dtFamily$trtGrp, dtFamily$nMembers))
      z <- data.table(z = rep(dtFamily$z, dtFamily$nMembers))
      
      dtMembers <- cbind(dtMembers, idFamily)
      dtMembers <- cbind(dtMembers, trtGrp)
      dtMembers <- cbind(dtMembers, z)
      
      
    # items
    # https://kgoldfeld.github.io/simstudy/articles/ordinal.html
      baseprobs <- matrix(rep(c(0.6, 0.25, 0.1, 0.05), items), 
                          nrow = items, byrow = TRUE)
      
      # z defined in family step above
      dtMembers <- simstudy::genOrdCat(dtMembers, 
                                       adjVar = "z", 
                                       baseprobs = baseprobs, 
                                       prefix = "q", 
                                       rho = rho, 
                                       corstr = "cs", 
                                       asFactor = FALSE) %>%
      # make possible item scores 0-3 (from 1-4)
        dplyr::mutate(dplyr::across(starts_with("q"), ~ .x - 1)) %>%
        dplyr::rowwise() %>% 
        dplyr::mutate(sum_score = sum(dplyr::c_across(starts_with("q")), na.rm = TRUE)) %>%
        dplyr::ungroup()
  }

# simulate and fit
  n_sim <- 100
  s <- tibble(seed = 1:n_sim) %>% 
    mutate(d = map(seed, sim_d, 
                   N_fam = 100,
                   shift = -0.3,
                   items = 30,
                   rho = 0.15)) %>% 
    # note: this refits (and recompiles) the model for every dataset;
    # fitting once and looping with update() would be much faster
    mutate(fit = map2(d, seed, ~ brm(
      bf(sum_score ~ 1 + trtGrp + (1 | idFamily)),
      backend = "cmdstanr",
      data = .x,
      seed = .y,
      cores = parallel::detectCores()
    )))
          
  s %>%
    mutate(sd  = map_dbl(d, ~ sd(.x$sum_score)),
           est = map_dbl(fit, ~ fixef(.x)["trtGrp", 1]),
           cd  = est / sd) %>%
    ggplot(aes(x = cd)) +
      geom_histogram()

The data generating mechanism does not make items (and thus scores) correlated at the family level. The items themselves are correlated for each individual, but the family nesting is not represented.

Anyone have other ideas about how to simulate?

Of course the original question is still lurking: how to model the data when they come from multiple reporters. In the simulation I just represent the nesting by family and use the sum scores as the DV.

I think I’d take a step back and think carefully about what you have measured and what effect you are expecting to find. You could simply generate parameters for a normal distribution of latent variables per family (with a shift of means for intervention or separate distributions for intervention and control groups), draw the latent variables for each individual from that, and then generate the item level data given the latent variable of each member and an ordinal model for each item, but whether that is a good idea depends on your assumptions about what you are measuring.

“Family functioning” is a very broad construct, and I don’t know what measure you have used, but looking hard at the items and thinking about the response process might be useful. Is this a reflective or a formative measure? And what are the indicators used in the items? If the different members of the family are responding to generalised questions about the ways the family interact (“We usually apologize after quarrels”), differences among them might reflect different recall of examples, different experiences in the subsystems/dyads composing the family, or different perceptions of the same situations. If they are responding to questions about more verifiable or objective markers of family functioning (“We usually eat dinner together four days of the week or more”), differences might to a greater degree represent measurement error from a common truth.

These issues may have implications for the simulations, and will definitely impact the interpretation of any analysis, so I think it’s important to think carefully about them (and maybe you have!).


Thanks for the thoughtful and helpful reply, @erognli.

I had not come across the terms “reflective” and “formative” before. If I’ve got it right, it’s what I understand to be the distinction between a scale and an index. For instance, items on a depression scale might be thought to be caused by the latent construct of depression (and thus the intercorrelations matter), whereas stocks in the S&P 500 drive the value of the index (and we don’t care about intercorrelations between stocks). Do I have this right?

The items are like your first example. Similar to the work on child behavior assessment that incorporates ratings from the parent, teacher, and child. Previously we’ve set aside the challenge and obtained one parent’s perspective, but this simulation is an attempt to move toward a more comprehensive approach to modeling.

You could simply generate parameters for a normal distribution of latent variables per family (with a shift of means for intervention or separate distributions for intervention and control groups), draw the latent variables for each individual from that, and then generate the item level data given the latent variable of each member and an ordinal model for each item

You had me at “simply” 😆. Could you share any guidance for doing this? I think it would help me across a range of situations, including this one.
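The recipe quoted above can be sketched in base R. Everything below is illustrative, not a definitive implementation: the variance components (`sd_family`, `sd_member`), the latent treatment effect, the item thresholds, and the simple alternating treatment assignment are all assumed values you would want to vary.

```r
# 1) draw a family-level latent mean (shifted for treatment),
# 2) draw each member's latent score around the family mean,
# 3) generate ordinal items from each member's latent score
#    with a cumulative-logit model
set.seed(8675309)

n_fam     <- 100
items     <- 30
effect    <- -0.3   # treatment shift on the latent scale (assumed)
sd_family <- 0.5    # between-family SD of the latent mean (assumed)
sd_member <- 1.0    # within-family SD (assumed)

# item thresholds implying baseline probs .60/.25/.10/.05 per category
cuts <- qlogis(cumsum(c(0.60, 0.25, 0.10, 0.05))[1:3])

families <- data.frame(
  idFamily = 1:n_fam,
  nMembers = sample(2:6, n_fam, replace = TRUE),
  trtGrp   = rep(0:1, length.out = n_fam)  # simple alternating assignment
)
families$theta_fam <- rnorm(n_fam, mean = effect * families$trtGrp,
                            sd = sd_family)

# expand to members, then draw each member's latent score
d <- families[rep(seq_len(n_fam), families$nMembers), ]
d$idMember <- seq_len(nrow(d))
d$theta    <- rnorm(nrow(d), mean = d$theta_fam, sd = sd_member)

# ordinal items: a logistic latent response compared to the thresholds;
# higher theta -> higher item scores (0-3)
for (j in seq_len(items)) {
  d[[paste0("q", j)]] <- findInterval(rlogis(nrow(d), location = d$theta), cuts)
}
d$sum_score <- rowSums(d[paste0("q", 1:items)])

# quick check of the arm means
tapply(d$sum_score, d$trtGrp, mean)
```

Because every member shares `theta_fam`, sum scores are now correlated within families (item b from the wish list), and `effect` and `sd_family` map directly onto items c and b. The resulting `d` can go straight into the multilevel models earlier in the thread.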