Model comparison for choice in per-subject latent-state models with additional measurement channels (LOO vs two-stage approach)

Hi everyone,

I’m fitting per-subject Bayesian latent-state models in Stan for a gambling task (153 trials per participant). Each participant is fit independently.

At each trial, a latent emotional state E_t evolves according to past choices and outcomes (a Rescorla–Wagner-style update with linear drift). This latent state:

  • predicts choice at trial t,

  • is linked to sparse self-report emotion ratings, and

  • in an extended model, is also linked to an additional noisy observation channel derived from computer-vision-based affect decoding (MorphCast).

Schematically:

\begin{align}
E_{t+1} &= E_t + \alpha_{\text{gain}} \cdot \text{outcome}_t + \alpha_{\text{drift}} \\
\text{choice}_t &\sim \text{Bernoulli}\!\left( \operatorname{logit}^{-1}\!\left( \boldsymbol{\beta}_X^{\top} \mathbf{X}_t + \beta_E E_t \right) \right) \\
\text{rating}_t &\sim \mathcal{N}\!\left(E_t,\; \sigma_{\text{rating}}\right) \\
\text{morph}_t &\sim \mathcal{N}\!\left(E_t,\; \sigma_{\text{morph}}\right)
\end{align}

I’m comparing two models that differ only in the measurement model of E_t:

  1. Choice + ratings

  2. Choice + ratings + computer-vision-based affect

The choice model (betas) is identical across models; differences are entirely in the additional observation channel for the latent state.

My question is specifically about choice prediction:

Does adding the MorphCast-derived affective signal improve prediction of choices, via better inference of the latent emotional state?

I would like to compare the two models, focusing only on the choice component.

I am aware that comparing in-sample choice log-likelihoods may be problematic here, because adding extra measurement channels (and their parameters) can improve latent-state inference and thus indirectly improve in-sample choice fit (even though the choice model itself has the same number of parameters).

To address this, I am considering two approaches and would appreciate guidance on best practice.

Approach 1: PSIS-LOO on choice likelihood (joint models)

Models are fit per subject, so my plan is to:

  • compute PSIS-LOO per participant using only the choice likelihood,

  • sum ELPD across participants to compare models.

My understanding is that this is valid because participants are independent and ELPDs add.
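
To make the plan concrete, here is a minimal sketch of what I intend (Python + ArviZ). It assumes each per-participant fit stores a per-trial generated quantity for the choice channel, which I call `log_lik_choice` below (e.g. `log_lik_choice[t] = bernoulli_logit_lpmf(choice[t] | linpred[t]);` in the Stan generated quantities block); `fits_m1` / `fits_m2` are placeholder names for my lists of per-participant CmdStanPy fits, in matched participant order:

```python
# Sketch of Approach 1: per-participant PSIS-LOO on the choice likelihood
# only, then summed across participants. `log_lik_choice`, `fits_m1`, and
# `fits_m2` are placeholder names for this setup.
import numpy as np
import arviz as az

def choice_loo(fit):
    """PSIS-LOO for one participant, restricted to the choice likelihood."""
    idata = az.from_cmdstanpy(posterior=fit, log_likelihood="log_lik_choice")
    return az.loo(idata, pointwise=True)  # also check .pareto_k per subject

def pointwise_elpd(fits):
    """Concatenate pointwise elpd values across independent participants."""
    return np.concatenate([choice_loo(f).loo_i.values for f in fits])

lpd_m1 = pointwise_elpd(fits_m1)
lpd_m2 = pointwise_elpd(fits_m2)

# Summed ELPD and its SE, computed the same way ArviZ does for a single model
elpd_m1 = lpd_m1.sum()
se_m1 = np.sqrt(len(lpd_m1) * lpd_m1.var())

# Model comparison: pointwise differences give the SE of the ELPD difference
diff = lpd_m2 - lpd_m1
print(f"elpd_diff = {diff.sum():.1f} (SE {np.sqrt(len(diff) * diff.var()):.1f})")
```

Taking pointwise differences keeps trials aligned across the two models, so the SE of the ELPD difference accounts for the correlation between their predictions, as in the usual loo comparison.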

Approach 2: Two-stage model (emotion then choice)

As a secondary / diagnostic approach, I am also considering:

  1. Fitting an emotion-only model (ratings ± MorphCast) to infer E_t,

  2. Extracting posterior draws of the latent state from generated quantities,

  3. Using the inferred pre-choice state E_t as a predictor in a separate choice-only Stan model,

  4. Comparing choice predictive performance across models.

I am aware that:

  • naive plug-in of posterior means ignores uncertainty,

  • propagating uncertainty (e.g., via multiple draws or a measurement-error formulation) would be preferable (see the sketch after this list),

  • and this two-stage approach answers a slightly different question than the fully joint model.
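
As a concrete version of the multiple-draws idea, this is roughly what I would try. The file, data, and parameter names (`choice_only.stan`, `E_draws`, `X`, `y`, `beta_E`) are placeholders for my setup:

```python
# Sketch of the two-stage approach with uncertainty propagated by multiple
# imputation: refit the choice-only model for several posterior draws of E_t
# from the emotion-only model, then pool the draws. All names are placeholders.
import numpy as np
from cmdstanpy import CmdStanModel

choice_model = CmdStanModel(stan_file="choice_only.stan")

# E_draws: (n_posterior_draws, n_trials) draws of the latent state from the
# emotion-only model's generated quantities; X: trial covariates; y: choices.
rng = np.random.default_rng(1)
K = 20  # number of imputations; a modest K is often sufficient
draw_idx = rng.choice(E_draws.shape[0], size=K, replace=False)

pooled_beta_E = []
for k in draw_idx:
    data = {"T": len(y), "P": X.shape[1], "X": X, "E": E_draws[k], "y": y}
    fit = choice_model.sample(data=data, chains=2, show_progress=False)
    pooled_beta_E.append(fit.stan_variable("beta_E"))

# Pooling draws across the K imputed datasets approximately integrates over
# the uncertainty in E_t, unlike plugging in the posterior mean.
beta_E = np.concatenate(pooled_beta_E)
```

Comparing the pooled posterior against the plug-in version would also show directly how much the uncertainty in E_t matters for the choice coefficients.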

Questions

  1. Is PSIS-LOO on the choice log-likelihood only, aggregated across per-subject fits, the recommended way to compare these models with respect to choice prediction?

  2. Is summing per-subject ELPDs the correct aggregation strategy in this setting?

  3. Does the two-stage emotion-then-choice approach make sense as a secondary analysis?

Thanks very much for any advice. I mainly want to make sure that the model comparison I report is statistically defensible and aligned with best practice.

Summing per-participant ELPDs will only give you the same result as a joint fit if the priors are fixed constants and every parameter is participant-level. In that case you can add the ELPDs, since separate fits give the same result as fitting everyone together. You can verify this with simulation to convince yourself.
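
As a toy illustration, here is a case where LOO is available in closed form (Beta-Bernoulli; the model specifics are only for illustration). With fixed priors and only participant-level parameters, the joint posterior factorises across participants, so every leave-one-out predictive density, and hence the total ELPD, is the same whether you fit separately or jointly:

```python
# Toy illustration: with only participant-level parameters and fixed priors,
# the joint posterior factorises across participants, so per-trial LOO
# predictive densities are unchanged by fitting jointly, and ELPDs simply add.
# Beta-Bernoulli is used because its LOO is available in closed form.
import numpy as np
rng = np.random.default_rng(0)

def exact_loo_elpd(y, a=1.0, b=1.0):
    """Exact LOO elpd for a Beta(a, b)-Bernoulli model, summed over trials."""
    n, s = len(y), y.sum()
    p1 = (a + s - y) / (a + b + n - 1)  # P(y_i = 1 | all other trials)
    return np.sum(np.where(y == 1, np.log(p1), np.log1p(-p1)))

subjects = [rng.binomial(1, p, size=153) for p in (0.3, 0.7)]

# Separate fits: sum of per-participant ELPDs. A "joint" fit with only
# participant-specific parameters conditions each rate on that participant's
# data alone, so it yields exactly the same total.
print(sum(exact_loo_elpd(y) for y in subjects))
```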

When computing LOO, you have the choice of leaving out observations within groups (here participants, I think) versus leaving out entire groups. It depends on the prediction problem: are you trying to predict the next participant, or the next observation from an existing participant?

Is the computation too much to model everything jointly?

If you can’t model everything jointly, then multiple imputation is usually a better approach than plugging in point estimates.


Thank you very much! This is extremely helpful and clears up my concerns.

In my current setup, each participant is fit independently, with identical priors across participants and no shared or group-level parameters in the fitted models. All parameters are participant-specific, and participants are treated as conditionally independent. Given this, summing per-participant ELPDs from trial-level PSIS-LOO corresponds to the joint ELPD and matches my prediction target, which is within-participant prediction of new trials rather than generalization to new participants.

My primary goal at this stage is to compare models with respect to choice prediction and then extract participant-level parameters to relate them to psychiatric trait measures. A hierarchical joint model could be a natural next step for the “winning” model, but may be computationally expensive given the current model complexity.

Regarding the two-stage approach, I agree that propagating uncertainty via multiple imputation is preferable to plugging in point estimates. For now, I am focusing on PSIS-LOO from the joint model, and would consider a multiple-imputation-based sensitivity analysis if needed.

Thanks again for taking the time to clarify these points!