How to model regression where predictor measures a mixture of domains, with domains known

This is a general modeling question, but @matti suggested it might be interesting to post it here, so I’m taking his advice. :)

I’m wondering if anyone has ideas for modeling the following: I have an IV that measures a variable from a mixture of domains and I know for each observation what domains contribute to the IV, but I don’t have a separate measurement per domain. I want to look at whether the association between the DV and the IV differ depending on the domains that contribute to the IV rating.

To construct a silly but reproducible example, once per day, the participant is asked the degree of their emotional arousal over the course of the day. They are also asked to choose which domain of emotions they felt that day, for example, happy, sad, angry. They are also asked to how much they are looking forward to the next day. We are interested in knowing whether the effect of emotional arousal on future-orientation differs depending on which emotions were experienced that day. The problem is that we do not have separate ratings of arousal per emotion, so the arousal score potentially represents a mixture from multiple emotional experiences. We also know that the lowest score for arousal corresponds to not having had any notably emotional experiences. In other words, when arousal = 1, no emotional domain is selected.

Here’s the top of a simulated data frame, where ids specifies the participant ID, and fo is the “future orientation” rating:

> head(observed_data)
  happy angry sad other arousal ids fo
1     0     0   0     0       1   1  4
2     0     0   1     1       6   1  4
3     0     0   0     0       1   1  5
4     1     0   1     1       4   1  5
5     0     0   0     1       5   1  3
6     1     1   0     0       4   1  5

I’m assuming that a person’s arousal rating is made by that person sort of averaging over all the emotional experiences they had that day.

My first thought was to specify a model (using lme4/brms syntax as shorthand) like:

fo ~ 1 + arousal*happy + arousal*angry + arousal*sad + arousal*other + 
    (1 + arousal*happy + arousal*angry + arousal*sad + arousal*other | ids)

After a few repetitions of generating data and running this model, it seems like this at least recovers the spirit of the data generating coefficients. I haven’t run a full set of simulations yet in part because I’m a little worried this is somehow obviously terribly wrong in a way that you all will easily spot.

Here is the data generating code in R, which accounts for the fact that our observations average over several distinct processes (i.e., the contribution from each emotion domain to future-orientation):

N <- 100
obs <- 30
nn <- N*obs
ids <- rep(1:N, each = obs)

#Simulate whether or not certain emotional experiences happened across several
#days for several participants (we assume they are independent):
emos <- data.frame(happy = rbinom(nn, size = 1, prob = .2),
                   angry = rbinom(nn, size = 1, prob = .2),
                   sad = rbinom(nn, size = 1, prob = .2),
                   other = rbinom(nn, size = 1, prob = .5))

#Simulate possible level of arousal experienced for each emotion. Could
#specify a more complicated dgp here than just rnorm.
arousal_scores <- function(n){
  rp <- rnorm(n)
  return(rp)
}
possible_arousal <- data.frame(happy = arousal_scores(nn),
                      angry = arousal_scores(nn),
                      sad = arousal_scores(nn),
                      other = arousal_scores(nn))
experienced_arousal <- emos * possible_arousal

###
#Create the observed scores for reported arousal:
##

#I'm going to assume that a person will report an average level of arousal
#across all experiences, with some error.
avging_error <- 0 # setting to 0 for now
arousal_reported <- apply(experienced_arousal, 1, function(arow){
  n_emos <- sum(arow != 0)
  if(n_emos == 0){
    reported_arousal <- 0
  } else {
    total_arousal <- sum(arow)
    average_arousal <- total_arousal / n_emos + rnorm(1, 0, avging_error)
    reported_arousal <- average_arousal
  }
  return(reported_arousal)
})
#put the reported arousal on the measurement scale
mn <- min(arousal_reported)
mx <- max(arousal_reported)
arousal_reported <- round((arousal_reported - mn) / (mx - mn) * 6)
arousal_reported <- ifelse(arousal_reported > 6, 5, arousal_reported)
#ensure that if folks report no emotional experiences, they report no arousal
arousal_reported <- arousal_reported * as.numeric(rowSums(emos)>0) + 1

###
#create the matrix for the data generating process
##
latent_X <- cbind(emos, experienced_arousal, ids)
names(latent_X) <- c(names(latent_X)[1:4], paste0(names(latent_X)[5:8], '_a'), names(latent_X)[9])

#id varying intercept
fo_intercept <- rnorm(N)

#assuming no effect of experiencing a particular emotion, per se.
pop_coefs <- c(0, 0, 0, 0, .5, -.2, -.6, .1)
id_coefs <- lapply(1:N, function(i, coefs){
  c <- rnorm(length(coefs), mean = coefs, sd = .5)
  return(c)
}, coefs = pop_coefs)

true_fo <- unlist(lapply(X = 1:N, FUN = function(id, Xmat, intercept, idcol){
  X <- Xmat[Xmat[,idcol] == id, -idcol] #select rows from X for id in idcol
  y <- intercept[[id]] + as.matrix(X) %*% id_coefs[[id]] + rnorm(dim(X)[1], 0, 1)
  return(y)
}, Xmat = latent_X, intercept = fo_intercept, idcol = 9))

#Put the observed FO score on the instrument scale
obs_fo <- round((true_fo - min(true_fo)) / (max(true_fo) - min(true_fo)) * 6 + 1)

#observed data frame:
observed_data <- cbind(emos, data.frame(arousal = arousal_reported), ids, fo = obs_fo)

library(lme4)
summary(lmer(fo ~ arousal*happy + arousal*angry + arousal*sad + arousal*other - arousal + 
               (1 + arousal*happy + arousal*angry + arousal*sad + arousal*other - arousal || ids), data = observed_data))

Thank you for any help and advice!

Should I be thinking about this as a mixture?

The other thing that seems like it might make sense in this situation is a mixture of regressions, although something about this doesn’t seem quite right.

So using #interfaces:brms syntax, it might be something like this:

library(brms)

mix = mixture(gaussian(), gaussian(), gaussian(), gaussian()) 
prior = c(prior(normal(0, 10), Intercept, dpar = mu1),
          prior(normal(0, 10), Intercept, dpar = mu2),
          prior(normal(0, 10), Intercept, dpar = mu3),
          prior(normal(0, 10), Intercept, dpar = mu4))

fit <- brm(bf(fo ~ 1 + (1 | ids), 
              mu1 ~ happy + happy:arousal + (1 | ids), 
              mu2 ~ angry + angry:arousal + (1 | ids),
              mu3 ~ sad + sad:arousal + (1 | ids),
              mu4 ~ other + other:arousal + (1 | ids)),
           data = observed_data,
           family = mix,
           prior = prior, 
           chains = 4, cores = 4) 
summary(fit)

Hey there!

Maybe something like a multi-outcome model would work?

# I guess this will be pretty slow if you do it in brms...
bf1 <- bf(arousal ~ happy + angry + sad + other)
bf2 <- bf(fo ~ arousal) # or maybe just bf(fo ~ 1)
mod <- brm(bf1 + bf2, data = observed_data) # with rescor == TRUE (default)

The effects of happy, angry, … will influence fo through the residual correlation.

The better approach would be to do something like a ordered Probit for bf1 and have the implied latent values be correlated with the latent bf2 outcomes (when this is also an ordered Probit). There is probably not an easy way to do this in brms (but then again, I don’t know brms that well…).

Just my 2 cents…
Cheers!
Max

Thanks @Max_Mantei. I tried this out and the results confirm that the presence or absence of the domains contribute in appropriate proportions to the arousal score and that fo is conditional on arousal, but this doesn’t recover the degree to which arousal from a particular domain contributes to fo.

Hey there!

I think you can derive the marginal effect from everything brms gives you.

Suppose

arousal ~ happy + angry + sad + other
fo ~ 1

with joint (multivariate) normality and rescor == TRUE. The model is then:

\begin{bmatrix} \text{arousal}_i \\ \text{fo}_i \end{bmatrix} \sim MVN \left( \begin{bmatrix} \alpha_1 + \beta_1\text{happy}_i + \beta_2\text{angry}_i + \beta_3\text{sad}_i + \beta_4\text{other}_i \\ \alpha_2 \end{bmatrix} , \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \right)

Now the conditional distribution of fo is univariate normal:

\text{fo}_i | \text{arousal}_i \sim N(\alpha_2 + \rho \frac{\sigma_2}{\sigma_1} (\text{arousal}_i - \alpha_1 - \beta_1\text{happy}_i - \beta_2\text{angry}_i - \beta_3\text{sad}_i - \beta_4\text{other}_i), \sigma_2^2(1- \rho^2))

and

E(\text{fo}_i | \text{arousal}_i) = \alpha_2 + \rho \frac{\sigma_2}{\sigma_1}\text{arousal}_i - \rho \frac{\sigma_2}{\sigma_1}\alpha_1 - \rho \frac{\sigma_2}{\sigma_1}\beta_1\text{happy}_i - \rho \frac{\sigma_2}{\sigma_1}\beta_2\text{angry}_i - \rho \frac{\sigma_2}{\sigma_1}\beta_3\text{sad}_i - \rho \frac{\sigma_2}{\sigma_1}\beta_4\text{other}_i

so the marginal effect of happy on fo (through arousal) is -\beta_1\rho\frac{\sigma_2}{\sigma_1}.

Cheers,
Max

1 Like

That’s fantastic, @Max_Mantei ! Thank you for the further explanation. So grateful to have support from friendly, smart, strangers like you.

1 Like

Another strategy I’ve tried that seems to work fairly well (at least it provides results that are in line with the specified parameters, though I haven’t run simulations to confirm it has all the desirable properties).

This model tries to deal with this problem by specifying the unobserved per-domain arousal data as a parameter:

data {
  int<lower=0> N;              // num obs
  int<lower=1> D;              // num domains
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for obs
  matrix[N, D] x;              // observation-varying domain dummy codes
  row_vector[L] u[J];          // group predictors
  vector[N] y;                 // outcomes (fo)
  vector[N] z;                 // outcomes (observed arousal)
}
transformed data{
  int<lower=2> K = D*2 + 1;    // effect of presence or absence and account for intercept
}
parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];           // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
  real<lower=0> epsilon;       // latent measurement error
  matrix[N, D] X_;             // unobserved arousal by domain
}
transformed parameters {
  matrix[N, D] X = X_ .* x; // latent arousal is always 0 when domain is not present.
  matrix[N, K] X_i = append_col(rep_vector(1, N), append_col(x, X));
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  epsilon ~ weibull(1, .1);    // assume very little error averaging over latent arousal
  to_vector(gamma) ~ normal(0, 5);
  to_vector(X_) ~ normal(0,5);
  {
    row_vector[D] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N){
    z[n] ~ normal(mean(X[n]), epsilon); //observed arousal should be just the mean of latent arousal when that domain is present.
    y[n] ~ normal(X_i[n] * beta[jj[n]], sigma); //fo is conditional on latent arousal
  }
}

And the R code to implement given the simulations above:

library(rstan)
domain_cols <- c("happy", "angry", "sad", "other")
N <- dim(observed_data)[1]
D <- length(domain_cols)
K <- D*2 + 1
stan_data <- list(
  N = N,
  D = D,
  J = length(unique(observed_data$ids)),
  L = 1,
  jj = observed_data$ids,
  x = observed_data[, domain_cols],
  u = matrix(rep(1, length(unique(observed_data$ids))), ncol = 1),
  y = observed_data$fo,
  z = observed_data$arousal
)

#matrix[N, D] X
#matrix[N, K] X_i
ND <- expand.grid(1:N, 1:D)
NK <- expand.grid(1:N, 1:K)
exclude_pars <- c(paste0('X[', apply(ND, 1, paste, collapse = ','), ']'),
                  paste0('X_i[', apply(NK, 1, paste, collapse = ','), ']'))

mod <- rstan::stan('~/Desktop/dgp.stan',
                   model_name = 'dgp_model',
                   data = stan_data,
                   iter = 4000,
                   warmup = 1000,
                   chains = 4,
                   cores = 4,
                   pars = exclude_pars, include = FALSE,
                   control = list(adapt_delta = .999, max_treedepth = 20))