This is a general modeling question, but @matti suggested it might be interesting to post it here, so I’m taking his advice. :)
I’m wondering if anyone has ideas for modeling the following: I have an IV that measures a variable from a mixture of domains and I know for each observation what domains contribute to the IV, but I don’t have a separate measurement per domain. I want to look at whether the association between the DV and the IV differ depending on the domains that contribute to the IV rating.
To construct a silly but reproducible example, once per day, the participant is asked the degree of their emotional arousal over the course of the day. They are also asked to choose which domain of emotions they felt that day, for example, happy, sad, angry. They are also asked to how much they are looking forward to the next day. We are interested in knowing whether the effect of emotional arousal on future-orientation differs depending on which emotions were experienced that day. The problem is that we do not have separate ratings of arousal per emotion, so the arousal score potentially represents a mixture from multiple emotional experiences. We also know that the lowest score for arousal corresponds to not having had any notably emotional experiences. In other words, when arousal = 1, no emotional domain is selected.
Here’s the top of a simulated data frame, where ids
specifies the participant ID, and fo
is the “future orientation” rating:
> head(observed_data)
happy angry sad other arousal ids fo
1 0 0 0 0 1 1 4
2 0 0 1 1 6 1 4
3 0 0 0 0 1 1 5
4 1 0 1 1 4 1 5
5 0 0 0 1 5 1 3
6 1 1 0 0 4 1 5
I’m assuming that a person’s arousal rating is made by that person sort of averaging over all the emotional experiences they had that day.
My first thought was to specify a model (using lme4/brms syntax as shorthand) like:
fo ~ 1 + arousal*happy + arousal*angry + arousal*sad + arousal*other +
(1 + arousal*happy + arousal*angry + arousal*sad + arousal*other | ids)
After a few repetitions of generating data and running this model, it seems like this at least recovers the spirit of the data generating coefficients. I haven’t run a full set of simulations yet in part because I’m a little worried this is somehow obviously terribly wrong in a way that you all will easily spot.
Here is the data generating code in R, which accounts for the fact that our observations average over several distinct processes (i.e., the contribution from each emotion domain to future-orientation):
N <- 100
obs <- 30
nn <- N*obs
ids <- rep(1:N, each = obs)
#Simulate whether or not certain emotional experiences happened across several
#days for several participants (we assume they are independent):
emos <- data.frame(happy = rbinom(nn, size = 1, prob = .2),
angry = rbinom(nn, size = 1, prob = .2),
sad = rbinom(nn, size = 1, prob = .2),
other = rbinom(nn, size = 1, prob = .5))
#Simulate possible level of arousal experienced for each emotion. Could
#specify a more complicated dgp here than just rnorm.
arousal_scores <- function(n){
rp <- rnorm(n)
return(rp)
}
possible_arousal <- data.frame(happy = arousal_scores(nn),
angry = arousal_scores(nn),
sad = arousal_scores(nn),
other = arousal_scores(nn))
experienced_arousal <- emos * possible_arousal
###
#Create the observed scores for reported arousal:
##
#I'm going to assume that a person will report an average level of arousal
#across all experiences, with some error.
avging_error <- 0 # setting to 0 for now
arousal_reported <- apply(experienced_arousal, 1, function(arow){
n_emos <- sum(arow != 0)
if(n_emos == 0){
reported_arousal <- 0
} else {
total_arousal <- sum(arow)
average_arousal <- total_arousal / n_emos + rnorm(1, 0, avging_error)
reported_arousal <- average_arousal
}
return(reported_arousal)
})
#put the reported arousal on the measurement scale
mn <- min(arousal_reported)
mx <- max(arousal_reported)
arousal_reported <- round((arousal_reported - mn) / (mx - mn) * 6)
arousal_reported <- ifelse(arousal_reported > 6, 5, arousal_reported)
#ensure that if folks report no emotional experiences, they report no arousal
arousal_reported <- arousal_reported * as.numeric(rowSums(emos)>0) + 1
###
#create the matrix for the data generating process
##
latent_X <- cbind(emos, experienced_arousal, ids)
names(latent_X) <- c(names(latent_X)[1:4], paste0(names(latent_X)[5:8], '_a'), names(latent_X)[9])
#id varying intercept
fo_intercept <- rnorm(N)
#assuming no effect of experiencing a particular emotion, per se.
pop_coefs <- c(0, 0, 0, 0, .5, -.2, -.6, .1)
id_coefs <- lapply(1:N, function(i, coefs){
c <- rnorm(length(coefs), mean = coefs, sd = .5)
return(c)
}, coefs = pop_coefs)
true_fo <- unlist(lapply(X = 1:N, FUN = function(id, Xmat, intercept, idcol){
X <- Xmat[Xmat[,idcol] == id, -idcol] #select rows from X for id in idcol
y <- intercept[[id]] + as.matrix(X) %*% id_coefs[[id]] + rnorm(dim(X)[1], 0, 1)
return(y)
}, Xmat = latent_X, intercept = fo_intercept, idcol = 9))
#Put the observed FO score on the instrument scale
obs_fo <- round((true_fo - min(true_fo)) / (max(true_fo) - min(true_fo)) * 6 + 1)
#observed data frame:
observed_data <- cbind(emos, data.frame(arousal = arousal_reported), ids, fo = obs_fo)
library(lme4)
summary(lmer(fo ~ arousal*happy + arousal*angry + arousal*sad + arousal*other - arousal +
(1 + arousal*happy + arousal*angry + arousal*sad + arousal*other - arousal || ids), data = observed_data))
Thank you for any help and advice!