Seeking examples of using brms to estimate treatment effects in experiment with multiple respondents per cluster

I want to simulate data and estimate the effect of a family-based intervention where multiple members of the family complete a 30-item instrument that can yield an overall score measuring family functioning. Families would be randomly assigned to treatment or wait-list control.

My preliminary search pointed me to work by Bauer et al on trifactor models and IRT. I’m seeking examples of this or competing approaches that would let me incorporate data from multiple family members in the estimation of the effect of the intervention on the family functioning score. Any tips?

1 Like

Not quite what you’re looking for, but Wagenmakers et al (2016; https://doi.org/10.1177/1745691616674458) did a multisite experiment with a Likert-type DV. I have a content sketch here where I fit a few models to the data of one of the sites. If you used that as a starting point, you could fit a fuller model on the data from all the sites, which would be somewhat analogous to persons nested in families (participants nested in labs).

1 Like

Interesting problem! I’ve done work using Stan om multi-informant data using IRT models, but not with brms.

If you haven’t already, I’d recommend looking into the literature on informant discrepancies . Multi-informant assessment of family variables is not theoretically trivial. Assuming that all differences informants can be interpreted as measurement error requires some strong assumptions and justification.

There was a special issue on the topic in the Journal of Clinical Child & Adolescent Psychology last year, which might be relevant: https://www.tandfonline.com/doi/full/10.1080/15374416.2022.2158843

1 Like

Thanks @Solomon and @erognli. I appreciate the pointers. Exploring them both.

I wonder if either of you might have better ideas about how to simulate this data. I’ve been poking around with simstudy but I’m finding it to be a complex challenge.

Trying to simulate:

  • 100 families with between 2 and 6 members each
  • Families assigned 1:1 to treatment or control
  • Responses to 30 Likert-style items, 0-3
    a) Person-level: items correlated to measure one latent construct
    b) Family-level: scale scores correlated within families
    c) Group-level: scale scores higher in treatment arm

I’d like to simulate datasets that vary on the number of families and the details for a, b, and c.

Here’s my attempt at the first part to create the family structure:

library(simstudy)
library(data.table)
set.seed(8675309)

# family level
  gen.family <- defData(varname = "icc", 
                        dist = "uniform", formula = '0;.5', 
                        id = "idFamily")
  gen.family <- defData(gen.family, varname = "nMembers", 
                        dist = "uniformInt", formula = '2;6')
  
  dtFamily <- genData(100, gen.family)
  dtFamily <- trtAssign(dtFamily, n = 2)

  dtFamily

Yields:

     idFamily          icc nMembers trtGrp
  1:        1 0.0797418174        4      0
  2:        2 0.2390941635        2      1
  3:        3 0.3823993483        4      1
  4:        4 0.3848438379        3      0
  5:        5 0.1342742486        3      1

 95:       95 0.0120131961        4      0
 96:       96 0.1154628171        2      1
 97:       97 0.2969644184        3      0
 98:       98 0.1177376050        4      1
 99:       99 0.2571975326        6      0
100:      100 0.3215165243        6      1
     idFamily          icc nMembers trtGrp

And the individual members:

# individuals
  dtMembers <- genData(sum(dtFamily$nMembers))
  
  idFamily <- data.table(idFamily = rep(dtFamily$idFamily, dtFamily$nMembers))
  trtGrp <- data.table(trtGrp = rep(dtFamily$trtGrp, dtFamily$nMembers))
  
  dtMembers <- cbind(dtMembers, idFamily)
  dtMembers <- cbind(dtMembers, trtGrp)
  
  dtMembers

Yields:

      id idFamily trtGrp
  1:   1        1      0
  2:   2        1      0
  3:   3        1      0
  4:   4        1      0
  5:   5        2      1
 ---                    
394: 394      100      1
395: 395      100      1
396: 396      100      1
397: 397      100      1
398: 398      100      1

I haven’t used {simstudy} before, and I haven’t simulated ordinal data of that complexity before. Other’s will have to chime in. My best recommendation is to start small and build (e.g., first with one item and no nesting, then with 30 items and no other nesting, and so on).

1 Like

OK, here’s where I got.

  • Generate nested members (2-6) in 100 families, assign families to treatment or control
  • Generate 30 correlated ordinal items with a shift (effect) to make the treatment group’s sum scores lower on average
  • Simulate this 100 times
library(tidyverse)
library(simstudy)
library(data.table)
library(brms)
set.seed(8675309)

#' Simulate data
#' @param seed  simulation seed
#' @param N_fam total families
#' @param shift adjustment variable name in dtName - determines logistic shift
#' @param items number of items in scale
#' @param rho correlation coefficient, -1 < rho < 1.

  sim_d <- function(seed, N_fam, shift, items, rho) {
    
    set.seed(seed)
    
    # family level
      def_family <- simstudy::defData(varname = "nMembers", 
                                      dist = "uniformInt", formula = '2;6',
                                      id = "idFamily")
      
      dtFamily <- simstudy::genData(N_fam, def_family)
      dtFamily <- simstudy::trtAssign(dtFamily, n = 2)
      
      dtFamily <- dtFamily %>% 
        dplyr::mutate(z = if_else(trtGrp==1, shift, 0))
    
    # individuals
      dtMembers <- simstudy::genData(sum(dtFamily$nMembers))
      
      idFamily <- data.table(idFamily = rep(dtFamily$idFamily, dtFamily$nMembers))
      trtGrp <- data.table(trtGrp = rep(dtFamily$trtGrp, dtFamily$nMembers))
      z <- data.table(z = rep(dtFamily$z, dtFamily$nMembers))
      
      dtMembers <- cbind(dtMembers, idFamily)
      dtMembers <- cbind(dtMembers, trtGrp)
      dtMembers <- cbind(dtMembers, z)
      
      dtMembers
      
    # items
    # https://kgoldfeld.github.io/simstudy/articles/ordinal.html
      baseprobs <- matrix(rep(c(0.6, 0.25, 0.1, 0.05), items), 
                          nrow = items, byrow = TRUE)
      
      # z defined in family step above
      dtMembers <- simstudy::genOrdCat(dtMembers, 
                                       adjVar = "z", 
                                       baseprobs = baseprobs, 
                                       prefix = "q", 
                                       rho = rho, 
                                       corstr = "cs", 
                                       asFactor = FALSE) %>%
      # make possible item scores 0-3 (from 1-4)
        dplyr::mutate_at(vars(starts_with("q")), ~.x-1) %>%
        dplyr::rowwise() %>% 
        dplyr::mutate(sum_score = sum(c_across(starts_with("q")), na.rm = T)) %>%
        ungroup()
  }

# simulate and fit
  n_sim <- 100
  s <- tibble(seed = 1:n_sim) %>% 
    mutate(d = map(seed, sim_d, 
                   N_fam = 100,
                   shift = -0.3,
                   items = 30,
                   rho = 0.15)) %>% 
    mutate(fit = map2(d, seed, ~ brm(
      bf(sum_score ~ 1 + trtGrp + (1 | idFamily)),
      backend = "cmdstanr",
      data = .x,
      cores = parallel::detectCores()
  )) 
  )
          
  s %>%
    mutate(sd = map(.x = d, ~ sd(.x$sum_score))) %>%
    unnest(sd) %>%
    mutate(est = map(.x = fit, ~fixef(.x)["trtGrp", 1])) %>%
    unnest(est) %>%
    mutate(cd = est/sd) %>%
    ggplot(aes(x=cd)) +
      geom_histogram()

The data generating mechanism does not make items (and thus scores) correlated at the family level. The items themselves are correlated for each individual, but the family nesting is not represented.

Anyone have other ideas about how to simulate?

Of course the original question is still lurking. How to model when data come from multiple reporters. In the simulation I just represent the nesting by family and use the sum scores as the dv.

I think I’d take a step back and think carefully about what you have measured and what effect you are expecting to find. You could simply generate parameters for a normal distribution of latent variables per family (with a shift of means for intervention or separate distributions for intervention and control groups), draw the latent variables for each individual from that, and then generate the item level data given the latent variable of each member and an ordinal model for each item, but whether that is a good idea depends on your assumptions about what you are measuring.

“Family functioning” is a very broad construct, and I don’t know what measure you have used, but looking hard at the items and thinking about the response process might be useful. Is this a reflective or a formative measure? And what are the indicators used in the items? If the different members of the family are responding to generalised questions about the ways the family interact (“We usually apologize after quarrels”), differences among them might reflect different recall of examples, different experiences in the subsystems/dyads composing the family, or different perceptions of the same situations. If they are responding to questions about more verifiable or objective markers of family functioning (“We usually eat dinner together four days of the week or more”), differences might to a greater degree represent measurement error from a common truth.

These issues may have implications for the simulations, and will definitely impact the interpretation of any analysis, so I think it’s important to think carefully about them (and maybe you have!).

1 Like

Thanks for the thoughtful and helpful reply, @erognli.

I had not come across the terms “reflective” and “formative” before. If I’ve got it right, it’s what I understand to be the distinction between a scale and an index. For instance, items on a depression scale might be thought to be caused by the latent construct of depression (and thus the intercorrelations matter), whereas stocks in the S&P 500 drive the value of the index (and we don’t care about intercorrelations between stocks). Do I have this right?

The items are like your first example. Similar to the work on child behavior assessment that incorporates ratings from the parent, teacher, and child. Previously we’ve set aside the challenge and obtained one parent’s perspective, but this simulation is an attempt to move toward a more comprehensive approach to modeling.

You could simply generate parameters for a normal distribution of latent variables per family (with a shift of means for intervention or separate distributions for intervention and control groups), draw the latent variables for each individual from that, and then generate the item level data given the latent variable of each member and an ordinal model for each item

You had me at “simply” 😆. Could you share any guidance for doing this? I think it would help me across a range of situations, including this one.

@Eric_Green - I read your message at a very busy time, and forgot to follow up when it no longer showed as unread (and I’ve rarely logged in to the forum lately). I’m very sorry for not replying. Hope you already solved the problem, and that it’s no longer useful, but here’s an example function anyway.

It simulates families of different sizes with latent scores for each individual drawn from a common distribution for the family, which itself has parameters drawn from a distribution across all families. It uses the clustering function of the simstudy package, which has been added somewhat recently, I think? If needed, you could add an item response style element to this, to simulate item scores instead of latent variables.

sim_data <- function(n_families = 100, 
                     range_family_size = c(3,6), 
                     dist_mu_family = c(0,1), 
                     dist_sigma_family = c(1,.2)) {
  
  require(simstudy); require(magrittr)
  
  ddef <- defData(id = 'family',
                  varname = 'n_members',
                  dist = 'uniformInt',
                  formula = paste(range_family_size[1], 
                                  ';', 
                                  range_family_size[2])) %>%
    
    defData(varname = 'mu_family',
            dist = 'normal',
            formula = dist_mu_family[1],
            variance = dist_mu_family[2]) %>%
    
    defData(varname = 'sigma_family',
            dist = 'gamma',
            formula = dist_sigma_family[1],
            variance = dist_sigma_family[2],
            link = 'identity')
  
  d <- genData(n = n_families,
               dtDefs = ddef)
  
  d <- genCluster(dtClust = d,
                  cLevelVar = 'family',
                  numIndsVar = 'n_members',
                  level1ID = 'id')
  
  ddef <- defDataAdd(varname = 'latent_score',
                     dist = 'normal',
                     formula = 'mu_family',
                     variance = 'sigma_family')
  
  d <- addColumns(dtDefs = ddef,
                  dtOld = d)
  
  return(d)
}
1 Like

This is really helpful, @erognli. Thank you for taking time to return to this request and help me get on the right track.

I tried adding a treatment group indicator d <- trtAssign(d, n = 2) and modified the ddef formula to 'mu_family + trtGrp*0.5' to modify mu based on assignment. Not sure I did it quite right, but I think this is the basic idea.

Next up on my learning goals is to figure out how to:

generate the item level data given the latent variable of each member and an ordinal model for each item

# revised attempt

set.seed(8675309)

sim_data <- function(n_families = 100, 
                     range_family_size = c(3,6), 
                     dist_mu_family = c(0,1), 
                     dist_sigma_family = c(1,.2)) {
  
  require(simstudy); require(magrittr)
  
  ddef <- defData(id = 'family',
                  varname = 'n_members',
                  dist = 'uniformInt',
                  formula = paste(range_family_size[1], 
                                  ';', 
                                  range_family_size[2])) %>%
    
  defData(varname = 'mu_family',
          dist = 'normal',
          formula = dist_mu_family[1],
          variance = dist_mu_family[2]) %>%
    
  defData(varname = 'sigma_family',
          dist = 'gamma',
          formula = dist_sigma_family[1],
          variance = dist_sigma_family[2],
          link = 'identity')
  
  d <- genData(n = n_families,
               dtDefs = ddef)
  
# ADDED
  d <- trtAssign(d, n = 2)
  
  d <- genCluster(dtClust = d,
                  cLevelVar = 'family',
                  numIndsVar = 'n_members',
                  level1ID = 'id')
  
  ddef <- defDataAdd(varname = 'latent_score',
                     dist = 'normal',
                     formula = 'mu_family + trtGrp*0.5', # ADDED
                     variance = 'sigma_family')
  
  d <- addColumns(dtDefs = ddef,
                  dtOld = d)
  
  return(d)
}

Ok, glad it was helpful!

So for simulating item level data you could use the genOrdCat-function of the simstudy package. Here is an example, with three items scored on a 1-3 scale. I think it’s a graded response IRT model in practice, where I’ve assumed independence of items conditional on the latent scores (which may not be appropriate - I’m not a psychometrician, so I’m at the edge of my competencies here).

sim_data <- function(n_families = 100, 
                     range_family_size = c(3,6), 
                     dist_mu_family = c(0,1), 
                     dist_sigma_family = c(1,.2)) {
  
  require(simstudy); require(magrittr)
  
  ddef <- defData(id = 'family',
                  varname = 'n_members',
                  dist = 'uniformInt',
                  formula = paste(range_family_size[1], 
                                  ';', 
                                  range_family_size[2])) %>%
    
    defData(varname = 'mu_family',
            dist = 'normal',
            formula = dist_mu_family[1],
            variance = dist_mu_family[2]) %>%
    
    defData(varname = 'sigma_family',
            dist = 'gamma',
            formula = dist_sigma_family[1],
            variance = dist_sigma_family[2],
            link = 'identity')
  
  d <- genData(n = n_families,
               dtDefs = ddef)
  
  d <- genCluster(dtClust = d,
                  cLevelVar = 'family',
                  numIndsVar = 'n_members',
                  level1ID = 'id')
  
  ddef <- defDataAdd(varname = 'latent_score',
                     dist = 'normal',
                     formula = 'mu_family',
                     variance = 'sigma_family')
  
  d <- addColumns(dtDefs = ddef,
                  dtOld = d)
  
  
  d <- genOrdCat(dtName = d,
                 adjVar = 'latent_score',
                 baseprobs = matrix(c(.8, .15, .05,
                                      .34, .33, .33,
                                      .05, .15, .8),
                                    nrow = 3,
                                    byrow = T),
                 asFactor = F,
                 idname = 'id',
                 prefix = 'item_',
                 rho = 0,
                 corstr = 'ind')
 
  return(d) }
2 Likes

This is very very helpful, @erognli! Mission accomplished!!

sim_data <- function(n_families = 100, 
                     range_family_size = c(2,6), 
                     dist_mu_family = c(0,1), 
                     dist_sigma_family = c(1,.2),
                     seed = NA) {
  
  require(simstudy); require(magrittr)
  
  ddef <- defData(id = 'family',
                  varname = 'n_members',
                  dist = 'uniformInt',
                  formula = paste(range_family_size[1], 
                                  ';', 
                                  range_family_size[2])) %>%
    
  defData(varname = 'mu_family',
          dist = 'normal',
          formula = dist_mu_family[1],
          variance = dist_mu_family[2]) %>%
    
  defData(varname = 'sigma_family',
          dist = 'gamma',
          formula = dist_sigma_family[1],
          variance = dist_sigma_family[2],
          link = 'identity')
  
  d <- genData(n = n_families,
               dtDefs = ddef)
  
# ADDED
  d <- trtAssign(d, n = 2)
  
  d <- genCluster(dtClust = d,
                  cLevelVar = 'family',
                  numIndsVar = 'n_members',
                  level1ID = 'id')
  
  ddef <- defDataAdd(varname = 'latent_score',
                     dist = 'normal',
                     formula = 'mu_family + trtGrp*0.5', # ADDED
                     variance = 'sigma_family')
  
  d <- addColumns(dtDefs = ddef,
                  dtOld = d)
  
# https://cran.r-project.org/web/packages/simstudy/vignettes/ordinal.html
  d <- genOrdCat(dtName = d,
                 adjVar = 'latent_score',
                 baseprobs = 
                   matrix(c(rep(c(0.05, 0.31, 0.23, 0.03, 0.03, 0.21, 0.03, 0.03, 0.03, 0.05), 10),
                            rep(c(0.38, 0.22, 0.14, 0.11, 0.02, 0.02, 0.02, 0.02, 0.05, 0.02), 10),
                            rep(c(0.08, 0.41, 0.16, 0.05, 0.05, 0.08, 0.08, 0.02, 0.02, 0.05), 10)
                            ),
                          nrow = 30,
                          byrow = T),
                 asFactor = F,
                 idname = 'id',
                 prefix = 'item_',
                 rho = 0.425,
                 corstr = 'ind')
  
  return(d)
}

This gives me the ability to simulate a dataset that meets the original goals!

  • 100 families with between 2 and 6 members each
  • Families assigned 1:1 to treatment or control
  • Responses to 30 ordinal items, 1-10 (formerly 0-3)
    a) Person-level: items correlated to measure one latent construct
    b) Family-level: scale scores correlated within families
    c) Group-level: scale scores higher in treatment arm
df <- sim_data(seed = 8675309) %>%
  dplyr::select(family, n_members, trtGrp, id, starts_with("item")) %>%
  mutate(scale_avg = rowMeans(across(starts_with("item")))) %>%
  as_tibble()

glimpse(df)
Rows: 372
Columns: 35
$ family    <int> 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8…
$ n_members <int> 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3…
$ trtGrp    <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0…
$ id        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, …
$ item_01   <int> 2, 8, 2, 2, 6, 1, 6, 2, 6, 2, 10, 2, 3, 2, 6, 6, 10, 2, 9, 6, 7, 7,…
$ item_02   <int> 6, 9, 3, 3, 2, 2, 4, 1, 8, 9, 8, 9, 2, 1, 8, 1, 7, 3, 2, 7, 7, 2, 7…
$ item_03   <int> 6, 10, 7, 2, 6, 4, 9, 2, 3, 6, 3, 6, 6, 2, 3, 2, 6, 2, 6, 2, 7, 6, …
$ item_04   <int> 7, 9, 2, 1, 3, 2, 4, 5, 2, 3, 10, 2, 10, 2, 3, 2, 3, 2, 6, 2, 10, 2…
$ item_05   <int> 2, 6, 1, 2, 10, 2, 10, 3, 3, 6, 6, 9, 3, 2, 3, 2, 2, 1, 7, 3, 6, 10…
$ item_06   <int> 6, 6, 3, 2, 10, 2, 6, 2, 9, 6, 3, 10, 4, 2, 1, 2, 3, 2, 10, 10, 6, …
$ item_07   <int> 3, 6, 6, 1, 2, 2, 6, 2, 9, 10, 10, 10, 2, 2, 9, 2, 7, 3, 8, 7, 6, 3…
$ item_08   <int> 3, 6, 3, 3, 2, 2, 1, 2, 6, 6, 7, 6, 6, 5, 2, 6, 10, 2, 10, 6, 9, 3,…
$ item_09   <int> 6, 3, 6, 2, 7, 2, 3, 3, 3, 6, 6, 3, 3, 2, 2, 6, 3, 2, 7, 9, 9, 2, 2…
$ item_10   <int> 2, 6, 3, 2, 6, 4, 6, 4, 3, 6, 8, 10, 3, 2, 2, 5, 6, 3, 2, 2, 6, 6, …
$ item_11   <int> 2, 10, 4, 2, 4, 6, 3, 2, 3, 4, 4, 9, 8, 1, 2, 1, 9, 3, 9, 1, 8, 3, …
$ item_12   <int> 2, 10, 1, 1, 3, 1, 4, 1, 1, 1, 1, 6, 2, 1, 4, 1, 1, 1, 4, 9, 8, 6, …
$ item_13   <int> 1, 1, 3, 2, 9, 2, 4, 1, 2, 2, 6, 7, 9, 1, 2, 2, 4, 1, 4, 3, 4, 1, 1…
$ item_14   <int> 2, 1, 1, 1, 10, 3, 3, 1, 4, 2, 7, 5, 1, 2, 1, 2, 4, 9, 1, 6, 6, 1, …
$ item_15   <int> 3, 9, 1, 1, 9, 1, 1, 1, 1, 9, 4, 5, 1, 1, 1, 2, 3, 1, 1, 10, 2, 3, …
$ item_16   <int> 9, 2, 1, 1, 1, 1, 3, 2, 2, 4, 1, 8, 8, 1, 1, 4, 10, 3, 4, 4, 10, 1,…
$ item_17   <int> 2, 2, 1, 1, 4, 1, 4, 1, 1, 1, 5, 3, 1, 1, 3, 1, 1, 2, 8, 2, 5, 1, 4…
$ item_18   <int> 9, 2, 1, 1, 1, 1, 3, 1, 3, 9, 9, 3, 4, 1, 1, 1, 1, 2, 1, 3, 1, 3, 4…
$ item_19   <int> 2, 10, 1, 1, 7, 2, 6, 6, 2, 9, 3, 10, 1, 2, 1, 1, 2, 1, 4, 8, 4, 1,…
$ item_20   <int> 9, 9, 1, 1, 4, 3, 9, 2, 9, 3, 4, 3, 1, 1, 1, 3, 4, 1, 10, 3, 3, 1, …
$ item_21   <int> 2, 7, 2, 2, 2, 2, 2, 2, 3, 2, 2, 7, 1, 2, 3, 3, 10, 2, 10, 7, 2, 2,…
$ item_22   <int> 3, 5, 2, 2, 10, 2, 6, 2, 5, 10, 7, 4, 3, 1, 2, 2, 1, 4, 6, 2, 2, 2,…
$ item_23   <int> 10, 7, 2, 1, 4, 2, 10, 6, 3, 10, 4, 2, 5, 6, 2, 6, 5, 2, 2, 3, 3, 1…
$ item_24   <int> 2, 9, 1, 2, 3, 4, 8, 4, 2, 7, 10, 2, 1, 1, 2, 3, 2, 2, 10, 6, 4, 2,…
$ item_25   <int> 3, 7, 1, 2, 3, 2, 2, 6, 2, 6, 7, 7, 1, 2, 2, 2, 2, 2, 2, 10, 5, 5, …
$ item_26   <int> 2, 2, 2, 1, 10, 2, 7, 6, 3, 5, 7, 2, 10, 2, 2, 10, 4, 7, 6, 7, 3, 3…
$ item_27   <int> 7, 5, 2, 2, 2, 1, 9, 3, 1, 10, 5, 8, 1, 2, 2, 2, 7, 2, 7, 3, 4, 1, …
$ item_28   <int> 2, 7, 2, 1, 10, 2, 2, 6, 10, 6, 7, 10, 2, 1, 1, 6, 7, 5, 7, 2, 1, 2…
$ item_29   <int> 8, 8, 2, 1, 7, 1, 10, 3, 2, 2, 4, 4, 5, 8, 2, 1, 3, 2, 5, 2, 5, 1, …
$ item_30   <int> 10, 7, 3, 1, 3, 2, 3, 1, 2, 7, 5, 2, 4, 10, 4, 7, 7, 3, 4, 4, 6, 2,…
$ scale_avg <dbl> 4.433333, 6.300000, 2.333333, 1.566667, 5.333333, 2.133333, 5.13333…

Receipts…

# 100 families...
# ----------------------------------------------------------------------
df %>% 
  distinct(family) %>% 
  nrow()
# 100

# ...with between 2 and 6 members each
# ----------------------------------------------------------------------
df %>% 
  group_by(family) %>% 
  count() %>% 
  ungroup() %>% 
  summarize(min = min(n), max=max(n))

## A tibble: 1 × 2
#    min   max
#  <int> <int>
#1     2     6

# Families assigned 1:1 to treatment or control
# ----------------------------------------------------------------------
df %>% 
  distinct(family, .keep_all = TRUE) %>% 
  group_by(trtGrp) %>% 
  count()

## A tibble: 2 × 2
## Groups:   trtGrp [2]
#  trtGrp     n
#   <int> <int>
#1      0    50
#2      1    50

# Person-level: items correlated to measure one latent construct
performance::item_intercor(df %>% select(starts_with("item")), method = "pearson")

# 0.3426369

# Family-level: scale scores correlated within families
# ----------------------------------------------------------------------
fit <- lme4::lmer(scale_avg ~ trtGrp + (1 | family), data = df)
performance::icc(fit)

# Adjusted ICC: 0.420

# Group-level: scale scores higher in treatment arm
# ----------------------------------------------------------------------
parameters::model_parameters(fit, effects = "fixed", ci_method = "satterthwaite")

Parameter   | Coefficient |   SE |       95% CI |     t |    df |      p
------------------------------------------------------------------------
(Intercept) |        3.63 | 0.19 | [3.26, 4.00] | 19.51 | 93.09 | < .001
trtGrp      |        0.71 | 0.27 | [0.18, 1.24] |  2.67 | 95.72 | 0.009