Hello! I’m in need of some help.
Using Stan, I’m trying to implement the simplified version of the Signal Detection Theory (SDT) mixture model described on pg. 711 of DeCarlo (2002):
$$P(Y \leq k \mid X) = \theta \cdot \Phi(c_k - d_A X) + (1 - \theta) \cdot \Phi(c_k)$$
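For anyone following along, the per-category probabilities implied by this CDF can be computed by simple differencing (a minimal R sketch; the function and argument names are my own, and `pnorm` plays the role of Φ):

```r
# P(Y = k | X) = P(Y <= k | X) - P(Y <= k-1 | X), padding the CDF with 0 and 1
sdt_mix_probs <- function(crit, d, theta, X) {
  cdf <- theta * pnorm(crit - d * X) + (1 - theta) * pnorm(crit)
  diff(c(0, cdf, 1))
}

# five thresholds -> six response categories; probabilities sum to 1
p <- sdt_mix_probs(crit = c(-1.2, -0.6, 0.5, 1.0, 1.5),
                   d = 1.2, theta = 0.6, X = 0.5)
sum(p)
```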
Applied to recognition (confidence-rating) data, this model assumes that there are two latent classes of trials for “old” items (which could be mapped to, e.g., attended trials and unattended trials) and estimates the proportion of trials in each class (theta) as well as sensitivity (d’) and the thresholds (c). I started off with a non-hierarchical version of the model to make sure I had translated the likelihood correctly. That version worked and did a good job recovering true parameter values from simulated data, so I have been incrementally adding to it. Currently, the model has subject-level random intercepts on each of the three parameters (i.e., d’, c, and theta). Here is the version of the model I am testing:
data {
  int<lower=1> N;
  int<lower=1> N_subj;
  array[N] int<lower=1,upper=6> Y;
  array[N] int<lower=1,upper=N_subj> subj;
  vector[N] stimulus;
  int<lower=2> nthres;
}
parameters {
  // d prime
  real d;
  // theta
  real theta_logit;
  // Thresholds
  ordered[nthres] mu_c;
  vector[N_subj] d_subj_z;
  vector[N_subj] theta_subj_z;
  array[N_subj] vector[nthres] c_z;
  real<lower=0> sigma_d;
  real<lower=0> sigma_theta;
  vector<lower=0>[nthres] sigma_c;
}
transformed parameters {
  vector[N_subj] d_subj = sigma_d * d_subj_z;
  vector[N_subj] theta_subj = sigma_theta * theta_subj_z;
  array[N_subj] ordered[nthres] c_subj;
  for (s in 1:N_subj) {
    c_subj[s] = mu_c + sigma_c .* c_z[s];
    for (k in 2:nthres) {
      // to prevent thresholds from collapsing
      c_subj[s, k] = fmax(c_subj[s, k], c_subj[s, k - 1] + 0.01);
    }
  }
}
model {
  d ~ normal(1, 1);
  theta_logit ~ normal(0, 1);
  mu_c ~ normal(0, 1.5);
  sigma_d ~ normal(0, 0.5);
  sigma_theta ~ normal(0, 0.5);
  sigma_c ~ normal(0, 0.5);
  d_subj_z ~ std_normal();
  theta_subj_z ~ std_normal();
  for (s in 1:N_subj) {
    c_z[s] ~ std_normal();
  }
  for (n in 1:N) {
    int s = subj[n];
    real theta = inv_logit(theta_logit + theta_subj[s]);
    real curr_d = d + d_subj[s];
    target += log_mix(
      theta,
      ordered_probit_lpmf(Y[n] | stimulus[n] * curr_d, c_subj[s]),
      ordered_probit_lpmf(Y[n] | 0, c_subj[s])
    );
  }
}
This version of the model only sort of works. The model runs, does not diverge, and even does a good job with parameter recovery - except for sigma_theta. That parameter seems to be very poorly identified: the estimates are far off the simulated true values, the CIs are incredibly wide, ESS and R-hats are abysmal, etc. The individual-level theta_subj parameters are, of course, similarly poorly identified. Additionally, including this parameter in the model seems to hinder sampling efficiency and recovery of the other parameters. The model works fairly well when I include my other hierarchical effects but exclude subject-level mixing proportions. However, if I fit such a model to data simulated with subject-level variance in theta, the population-level estimate of theta is severely biased. In the real world, I think it’s reasonable to expect that mixture proportions would vary systematically at the subject level (e.g., people probably vary in how much attention they’re paying), so I don’t think this assumption could be safely ignored for real data.
Thus far, I have not been able to find a way to improve the estimation of the subject-level theta parameters. I’m aware that a model like this poses substantial identifiability challenges - it’s difficult for the model to separate d’ and theta because of the trade-offs that can be made between those parameters. I’m thinking that perhaps this model is simply doomed by how difficult it is to learn subject-level mixing proportions from the data. However, I’m also wondering if there’s anything I should try before throwing in the towel - alternative parameterizations, constraints that could be defensible in the real world, etc.? Or perhaps I’m making mistakes somewhere that I’m not aware of? I’m totally unsure - if anyone has any ideas at all, I’d love to know! Here is R code for a reproducible example, if anyone wants to take it that far. Note that this model is very, very slow to run.
# design parameters
N_subj <- 100
N_trials <- 100
nthres <- 5

# true thresholds and variance
true_mu_c <- c(-1.2, -0.6, 0.5, 1.0, 1.5)
true_sigma_c <- c(0.35, 0.4, 0.6, 0.5, 0.28)

# true d' and variance
true_d <- 1.2
true_sigma_d <- 0.4

# true theta and variance
true_theta_logit <- 0.5
true_sigma_theta <- 0.4

sim_data <- list()
for (s in 1:N_subj) {
  subj_d <- rnorm(1, 0, true_sigma_d)
  subj_theta <- rnorm(1, 0, true_sigma_theta)
  subj_c <- rnorm(nthres, 0, true_sigma_c)
  c_subj <- true_mu_c + subj_c
  for (k in 2:nthres) {
    c_subj[k] <- max(c_subj[k], c_subj[k - 1] + 0.01)
  }
  for (t in 1:N_trials) {
    stimulus <- ifelse(t <= N_trials / 2, 0.5, -0.5)
    curr_d <- true_d + subj_d
    curr_theta <- plogis(true_theta_logit + subj_theta)
    if (runif(1) < curr_theta) {
      latent <- rnorm(1, mean = stimulus * curr_d, sd = 1)
    } else {
      latent <- rnorm(1, mean = 0, sd = 1)
    }
    Y <- sum(latent > c_subj) + 1
    sim_data[[(s - 1) * N_trials + t]] <- data.frame(
      subj = s,
      Y = Y,
      stimulus = stimulus
    )
  }
}
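As a quick standalone check that this generative scheme matches the mixture likelihood (self-contained, not part of the fit; the fixed parameter values here are arbitrary and the names are mine):

```r
# Simulate one fixed condition and compare empirical category frequencies
# to the analytic mixture probabilities from the equation above.
set.seed(123)
crit <- c(-1.2, -0.6, 0.5, 1.0, 1.5)
d <- 1.2; theta <- 0.6; X <- 0.5
n <- 2e5
attended <- runif(n) < theta  # latent class indicator
latent <- rnorm(n, mean = ifelse(attended, d * X, 0), sd = 1)
Y_sim <- findInterval(latent, crit) + 1  # same as sum(latent > crit) + 1 per trial
emp <- tabulate(Y_sim, nbins = 6) / n
ana <- diff(c(0, theta * pnorm(crit - d * X) + (1 - theta) * pnorm(crit), 1))
max(abs(emp - ana))  # should be on the order of Monte Carlo error
```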
stan_data <- list(
  N = N_subj * N_trials,
  N_subj = N_subj,
  Y = sapply(sim_data, function(x) x$Y),
  subj = sapply(sim_data, function(x) x$subj),
  stimulus = sapply(sim_data, function(x) x$stimulus),
  nthres = nthres
)
library(cmdstanr)

mix_basic <- cmdstan_model('sdt-mix-basic.stan')
fit_mix <- mix_basic$sample(
  data = stan_data,
  chains = 12,
  parallel_chains = 12,
  iter_warmup = 1000,
  iter_sampling = 1000
)