Nesting effects using nl=TRUE

I am building a model of certain conversational phenomena comparing across languages: e.g. would people in language A backchannel (for instance, say Hmm, or nod while the other person is speaking) more than in language B.

We have 4 conversations per pair: 2 spontaneous ones and 2 aimed at solving lab tasks (the two tasks are quite different but share a task-oriented nature). So I want to model:

  • backchannel as a function of conversation (Session, factor with 4 levels), which is in turn a function of task (factor with 2 levels), with both conversation and task being a function of language (factor with 2 levels).
  • varying effects of speaker (interlocutor) nested within pair (each interlocutor appears in only one pair and speaks only one language)
  • estimates for each conversation and each task (for practical reasons)

My basic (incorrect) model would be something like
Backchannel ~ 0 + Session : Language + Task:Language + (0 + Session + Task | Pair / Interlocutor)

Using a non-linear model I could transform it to a more proper:

bf(Backchannel ~ 0 + bSess * Session,
bSess ~ 0 + (bTask * Task) : (bLangSess * Language) + (1 | Pair / Interlocutor),
bTask ~ 0 + bLangTask * Language + (1 | Pair / Interlocutor),
bLangSess + bLangTask ~ 1,
nl = TRUE)

Where the coefficient of session is generated from task and language plus some variation, and the coefficient of task is also generated from language.

Does this seem correct conceptually?

Additionally, when I try to define the priors for this model I get the following error:
“Error: The parameter ‘bTask’ is not a valid distributional or non-linear parameter. Did you forget to set ‘nl = TRUE’?”

I’m probably committing a silly error, but I can’t really see it now.

You mixed linear and non-linear formula syntax in

bSess ~ 0 + (bTask * Task) : (bLangSess * Language) + (1 | Pair / Interlocutor)

Should this be linear or non-linear syntax? You can’t have both at the same time.

It should be nonlinear, given that I’d then want to model the “effect” of task to be dependent on language.

I guess I could do something like:

bf(Backchannel ~ 0 + (bTask * Task) * (bLangSess * Language) * Session,
bTask ~ 0 + Language + (1 | Pair / Interlocutor),
bLangSess ~ 1,
nl = TRUE)

But I’m a bit unsure as to how to specify that also the “effect” of session could have a varying effect by pair and interlocutor.

Add another parameter to bSess that contains the group-level part of session.

Ok, thanks!

This seems to work:

bf(Backchannel ~ 0 + bSess * (bTask * Task) * (bLang * Language) * Session,
bTask + bSess ~ 0 + Language + (1 | Pair / Interlocutor),
bLang ~ 1,
nl = TRUE)

Looks good to me. You may also remove the 0 in the first formula, since it adds literally nothing when put into a non-linear formula.

Last thing, sorry: Session is a factor with 4 levels (4 conversations). How do I accommodate that in the non-linear part? It’s complaining that I can only have 2 levels.

Let’s take a step back, and discuss what you actually want to achieve with this mix of linear and non-linear formulas. I worry a little bit that we make things more obscure than necessary.

So you want to estimate the joint impact of task, language and session, but the actual desired interplay between these variables is still not entirely clear to me.

I am interested in the joint effect of language and task. However, the conversations have systematic and individual variance beyond task. There are 2 conversations per task, with different setups. Conversations 1 and 4 are always task 1, and conversations 2 and 3 are always task 2.

I am beginning to understand the details. Please excuse my ignorance, but what if you just used a linear formula

Backchannel ~ 0 + Session:Language + (0 + Session | Pair / Interlocutor)

and then computed the contrast of task (i.e. mean of session 1 and 4 vs. mean of session 2 and 3) after model fitting?
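As a rough illustration of that post-hoc contrast, here is a minimal base R sketch. It assumes the posterior draws of the eight Session-by-Language cell means are available as columns of a matrix. The matrix here is simulated and the column names (S1_A and so on) and the helper task_contrast are made up for the example; the coefficient names in a real fitted model will differ.

```r
# Simulated stand-in for posterior draws of the 8 Session-by-Language cell means.
# Column names are hypothetical; a real fit would supply its own draws and names.
set.seed(1)
cells <- c("S1_A", "S2_A", "S3_A", "S4_A", "S1_B", "S2_B", "S3_B", "S4_B")
draws <- matrix(rnorm(1000 * 8), ncol = 8, dimnames = list(NULL, cells))

# Task contrast within one language, computed draw by draw:
# mean of sessions 1 and 4 (task 1) minus mean of sessions 2 and 3 (task 2)
task_contrast <- function(draws, lang) {
  s <- function(i) draws[, paste0("S", i, "_", lang)]
  (s(1) + s(4)) / 2 - (s(2) + s(3)) / 2
}

task_A <- task_contrast(draws, "A")
task_B <- task_contrast(draws, "B")

# How much the task contrast differs between languages
# (the language-by-task interaction)
task_by_lang <- task_A - task_B

# Posterior summary: median and 95% interval
quantile(task_by_lang, probs = c(0.025, 0.5, 0.975))
```

Because the contrast is computed per draw, the resulting vector is itself a posterior distribution for the joint effect of language and task, so it can be summarised like any directly estimated parameter.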

That would definitely work (and it’s been run). It’s a tad dissatisfying in that the actual parameters we are interested in (joint “effect” of language and task) as well as the relation between session and task are not directly modeled.

It may seem so at first sight, but keep in mind that Session and Task are not easily estimated together, since they are linearly dependent. You would have to specify some strong priors to get this working, given that you only have four observations per pair as far as I understand it. And then I think you wouldn’t gain much by modeling both.
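The linear dependence can be checked directly on a design matrix. This is a minimal base R sketch, assuming the session-to-task mapping described earlier in the thread (sessions 1 and 4 are task 1, sessions 2 and 3 are task 2). The data frame d is a toy example for illustration, not the actual data.

```r
# Toy design: one row per session, with the task implied by the session
d <- data.frame(Session = factor(1:4))
d$Task <- factor(ifelse(d$Session %in% c(1, 4), 1, 2))

# Cell-means coding for Session plus treatment coding for Task
X <- model.matrix(~ 0 + Session + Task, data = d)

# The matrix has five columns, but only four are linearly independent:
# the Task2 column is exactly Session2 + Session3
n_cols <- ncol(X)
rank_X <- qr(X)$rank
task_is_redundant <- all(X[, "Task2"] == X[, "Session2"] + X[, "Session3"])
```

Since the Task column carries no information beyond the Session columns, the likelihood alone cannot separate the two sets of coefficients; only the priors would.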