Grouped latent factor and polynomial predictors

Hi brms community! I’m not sure if this is the right forum for a question like this, so if not please let me know - I’ve been stuck for a while and my brain is taxed! Any help appreciated.

I have a dataset of measurements from pairs of conversation partners.

  • One of the measures is “How much did you like the person you talked to?” (i_like_you, a 1-7 scale)
  • Another measure is “How much do you think they liked you?” (i_think_you_like_me, also 1-7)
  • There’s a third variable, you_actually_like_me, which in the real data is simply your partner’s i_like_you
  • Finally, we have a measure of how much each person enjoyed the conversation (enjoyment)

I’ve made a toy dataset in which every speaker_id has two conversations, each with a different partner.

library(tidyverse)

set.seed(1L)
n <- 40  # 20 speakers x 2 conversations each

data <- tibble(
  convo_id = 1:n,
  speaker_id = rep(LETTERS[1:20], 2),
  # every pairing is reciprocal: if A talks to K, K also talks to A
  partner_id = c(LETTERS[11:20], LETTERS[1:10], LETTERS[20:11], LETTERS[10:1]),
  i_like_you = sample(1:7, n, replace = TRUE),
  i_think_you_like_me = sample(1:7, n, replace = TRUE),
  you_actually_like_me = sample(1:7, n, replace = TRUE),
  enjoyment = sample(1:7, n, replace = TRUE)
) |> 
  arrange(speaker_id)

The data look like this:

data |> head()
# A tibble: 6 × 7
  convo_id speaker_id partner_id i_like_you i_think_you_like_me you_actually_like_me enjoyment
     <int> <chr>      <chr>           <int>               <int>                <int>     <int>
1        1 A          K                   4                   5                    4         2
2       21 A          T                   7                   3                    3         7
3        2 B          L                   7                   7                    1         6
4       22 B          S                   7                   3                    7         4
5        3 C          M                   5                   2                    7         5
6       23 C          R                   6                   1                    1         6

Note: this toy example is not representative in that i_like_you and you_actually_like_me are not mirrored between conversation partners, but for the sake of the modeling question I don’t think this matters.
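(If you did want the toy data to mirror properly, a self-join on the reversed speaker/partner key should do it; this is just a sketch, and partner_like is a throwaway helper column:)

# Overwrite you_actually_like_me with the partner's own i_like_you for the same pair
data <- data |>
  left_join(
    data |> select(speaker_id, partner_id, partner_like = i_like_you),
    by = c("speaker_id" = "partner_id", "partner_id" = "speaker_id")
  ) |>
  mutate(you_actually_like_me = partner_like) |>
  select(-partner_like)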

My goal is to model the impact of the gap between i_think_you_like_me and you_actually_like_me on enjoyment (let’s call this gap liking_gap).
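For concreteness, the raw difference score would be computed like this (though, as I explain below, I’d rather not rely on a raw difference score directly):

data <- data |>
  mutate(liking_gap = i_think_you_like_me - you_actually_like_me)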

The challenge I’m running into is that I want to account for the fact that each person has some latent overall bias toward liking people in general (let’s call this liking_bias), as well as some latent bias in how much they think people like them in general (perception_bias).

In other words, within a given conversation, I want to be able to estimate the degree to which liking_gap predicts enjoyment, above and beyond the degree to which the latent liking_bias and perception_bias influence enjoyment.

To complicate things further, it’s likely that the relationship between these predictors and enjoyment is curvilinear - there’s some kind of sweet spot (probably quadratic) that corresponds with maximum enjoyment.

I think I should be able to model this as a Bayesian hierarchical model, grouped by individual, with each individual having parameters for liking_bias and perception_bias, plus linear and squared terms for i_think_you_like_me and you_actually_like_me and their interaction (which I think would essentially capture liking_gap without resorting to a computed difference score).
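The closest I’ve gotten is something like the sketch below (I’m assuming a cumulative family for the 1-7 outcome; the single (1 | speaker_id) term is exactly what I don’t know how to split into the two separate latent biases):

library(brms)

fit <- brm(
  enjoyment ~ i_think_you_like_me * you_actually_like_me +
    I(i_think_you_like_me^2) + I(you_actually_like_me^2) +
    (1 | speaker_id),  # one per-person intercept; doesn't separate the two biases
  data = data,
  family = cumulative("logit")
)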

I started following Scott Claessen’s tutorial on how to model latent variables with brms, but I got confused trying to figure out how to translate his syntax, which specifies latent variables with mi() across multiple columns, into my structure, which treats multiple observations from the same column as emissions of the latent variable.

I think a problem like this would usually be solved with SEM, but after having a look at blavaan and reading through this brms GitHub thread on the gradual development of SEM-style support, I got a little overwhelmed, so I thought I’d just ask for help.

Thank you again for any help anyone can offer to this brms newbie!

Update: I can’t find a way to delete this post, but after thinking about it some more I revised my question, corrected the way I was generating example data, and posted it on CrossValidated instead. If you have any thoughts, please consider sharing there: multiple regression - How to model the true effect of a difference score, minus rater bias - Cross Validated

Have you had any exposure to Item Response Theory? This seems pretty straightforward in that context, and it can indeed be expressed as an SEM, but I don’t think SEM machinery is necessary to think in that framework. Pseudo-code:


latent_I_like_partner[r, p] = (
    likability[p]
    + like_bias[r]
)
I_like_partner[r, p] ~ binomial(
    latent_I_like_partner[r, p]
)

latent_partner_actually_likes_me[r, p] = (
    likability[r]
    + like_bias[p]
)
partner_actually_likes_me[r, p] ~ binomial(
    latent_partner_actually_likes_me[r, p]
)

latent_think_partner_likes_me[r, p] = (
    likability[r]
    + like_bias[p]
    + think_like_bias[r]
)
think_partner_likes_me[r, p] ~ binomial(
    latent_think_partner_likes_me[r, p]
)

enjoy[r, p] ~ ordinal(
    latent_I_like_partner[r, p] * w1[r]
    + latent_think_partner_likes_me[r, p] * w2[r]
    + …  // interactions go here; recommend building up from simpler models to larger
    + enjoy_bias[r]
    , cut_points[r]
)

Where the parameters w1, w2, … reflect the weight of a given latent variable in the enjoy equation. You should probably use partial pooling for each of those weight vectors.

Note that in my formulation above, you would not include the difference score as you express it, because I don’t think it makes sense to include information that is explicitly outside the respondent’s awareness (whether the partner actually likes them). It does make sense to model the partner’s actual liking as an outcome, since it informs other latent parameters that are pertinent to the enjoy equation.

Thank you Mike! I hadn’t thought about IRT here. Will give it a shot, much appreciated.

Oh, note that I hadn’t seen your data snippet before writing my pseudo-code, so I missed that all the items were ordinal. Replace all instances of binomial with ordinal, and add cut points for each respondent (and different cut points for different items).
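To make that concrete, here’s a rough brms translation of the first measurement equation - my own sketch, not tested. (1 | partner_id) plays the role of likability[p], (1 | speaker_id) plays the role of like_bias[r], and thres(gr = ...) gives each respondent their own cut points. With only two observations per person in the toy data the grouped thresholds would be poorly identified, so treat this as syntax only:

library(brms)

fit_like <- brm(
  # 6 thresholds for a 7-point item, with respondent-specific cut points
  i_like_you | thres(6, gr = speaker_id) ~ 1 + (1 | partner_id) + (1 | speaker_id),
  data = data,
  family = cumulative("logit")
)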
