Divergence and Bad Start Values in Censored Mixture Model

I am trying to fit a complex multilevel mixture model that is described in greater detail here. The gist is that participants listen to tones and then try to recreate the original tone using a slider. My dependent measure is the distance on the slider between the true tone and the selected tone, converted to semitones (a non-linear transformation). I am modelling this difference as a mixture of a uniform distribution (they have no idea what the tone was) and a normal distribution with unknown sigma (they have a representation of the tone, but it may be imprecise). I can fit that model fine.

However, I am now trying to incorporate the fact that responses are liable to be cut off at either end, since the slider is of finite length. I conceptualized this as truncation in my earlier post, but I am fairly certain it is instead an example of censoring. The model therefore becomes a mixture of a uniform distribution whose lower and upper bounds vary across observations and a censored normal distribution in which responses close to either end are treated as censored.

I started off trying to fit this using brms, which has taken me pretty far. I have recently switched to plain Stan, working from the code generated by brms, which has permitted a few optimizations. The issue is that the sampler has trouble finding valid initial values, as evidenced by the following error:

Rejecting initial value:
  Gradient evaluated at the initial value is not finite.
  Stan can't start sampling from this initial value.

In the full model I hope to fit, the chains almost never find valid initial values. By stripping it down to a much simpler model (attached, along with the data), I have been able to get it to fit most of the time, but the number of divergent transitions is usually roughly equal to the number of post-warmup samples. This pathological behaviour intensifies if predictors or random effects are incorporated.

I feel I must be doing something fundamentally wrong here, but I am not quite sure what that is. Any advice would be appreciated. The code needed to read the data and fit the model is below:

# Libraries
library(rstan)

# Read in data (the censoring variable cens is read from the file)
dat = read.csv('dat2.csv')

# Compile the Stan model
samp_mod_compiled = stan_model('samp_mod.stan')

# Assemble the data list expected by the model
samp_mod_dat = list(N = nrow(dat), Y = dat$diff, cens = dat$cens, lb = dat$lb, ub = dat$ub, prior_only = 0)

# Draw posterior samples
samp_mod_samples = sampling(samp_mod_compiled, data = samp_mod_dat, chains = 4, cores = 4, iter = 2000, warmup = 1000)

The Stan code was generated using brms and edited by me to remove certain extraneous elements.

Operating System: MacOS 10.14
rstan Version: 2.18.2
brms Version: 2.7.0
dat2.csv (318.0 KB)
samp_mod.stan (1.7 KB)

One thing that looks odd is that you have separate theta1 and theta2, whereas I believe you should always have exp(theta2) = 1 - exp(theta1). Also, you initialize theta2 with zeros and then subtract from it? I would suggest computing a single theta directly and using log_mix to compute your mixture.
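
As a rough sketch of that suggestion (this is not the attached samp_mod.stan): the model below uses one mixing proportion theta and lets log_mix combine the two components. The names mu and sigma and both priors are illustrative assumptions, and censoring is deliberately left out to keep it short.

```r
library(rstan)

# Hypothetical simplified model: a single mixing proportion combined with log_mix.
# mu, sigma, and the priors are assumptions for illustration only.
# Censoring is NOT handled here.
mix_code <- "
data {
  int<lower=1> N;
  vector[N] Y;
  vector[N] lb;   // per-observation lower bound of the uniform component
  vector[N] ub;   // per-observation upper bound of the uniform component
}
parameters {
  real<lower=0, upper=1> theta;  // probability of the normal component
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 5);          // half-normal because sigma is bounded below at 0
  for (n in 1:N) {
    // log(theta * normal + (1 - theta) * uniform), computed stably
    target += log_mix(theta,
                      normal_lpdf(Y[n] | mu, sigma),
                      uniform_lpdf(Y[n] | lb[n], ub[n]));
  }
}
"

mix_mod <- stan_model(model_code = mix_code)
```

Because theta is declared on [0, 1], the constraint exp(theta2) = 1 - exp(theta1) holds by construction rather than having to be maintained by hand.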

If that doesn’t help, I’ve also compiled a list of strategies that can be used to investigate divergences further: https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/
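
Two cheap things that can be tried from the R side before reparameterizing are narrowing the range of the random initial values and raising adapt_delta. The sketch below assumes samp_mod_compiled and samp_mod_dat exist as in the original post; the particular values 0.5 and 0.95 are arbitrary starting points, not recommendations specific to this model.

```r
library(rstan)

# Assumes samp_mod_compiled and samp_mod_dat were created as in the original post.
fit <- sampling(
  samp_mod_compiled, data = samp_mod_dat,
  chains = 4, cores = 4, iter = 2000, warmup = 1000,
  # Random inits are drawn from uniform(-init_r, init_r) on the unconstrained
  # scale; the default is 2, so 0.5 keeps the starting points closer to zero.
  init_r = 0.5,
  # Higher target acceptance rate (default 0.8) forces a smaller step size.
  control = list(adapt_delta = 0.95)
)

# Summarise divergences, treedepth saturation, and E-BFMI for the fit.
check_hmc_diagnostics(fit)

# Total number of divergent transitions after warmup.
get_num_divergent(fit)
```

If nearly every post-warmup transition still diverges after this, the problem is more likely in the model's geometry or parameterization than in the sampler settings.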

Thanks for the tip!
I also enjoyed your blog!

The parameterization for theta comes from brms (which generated the code prior to my modifications). It is not how I tend to program mixture models, but I am trying to stay relatively close to the brms output in the hope that I can generate my models with that package and quickly modify them for my purposes. Nonetheless, I will see if changing that element of the parameterization solves the issue!