Thanks for the tip, Paul!
My last post was vague because I was still unsure how to approach the problem, but I have made some progress since. I had been reading somewhere that standard truncation notation was not implemented for mixture models, but that is either outdated or untrue. As you say, once I implemented the _lcdf, the standard notation seems to work. Strangely, I did not need to implement the _lccdf to get it to run, but I have done so now. That code reads as follows and is based on formulae I got from wikipedia:
uniform_tones <- custom_family(
"uniform_tones",
type = "real",
vars = c('lb[n]', 'ub[n]')
)
stan_funs <- "
real uniform_tones_lpdf(real y, real mu, real lb, real ub) {
return -log(ub - lb);
}
real uniform_tones_lcdf(real y, real mu, real lb, real ub) {
if(y < lb)
{
return(-7); // Returning log(.001) to avoid log(0)
}
if(y >= ub)
{
return(0); // Returning log(1)
}
return log((y - lb) / (ub - lb));
}
real uniform_tones_lccdf(real y, real mu, real lb, real ub) {
if(y > ub)
{
return(-7); // Returning log(.001) to avoid log(0)
}
if(y <= lb)
{
return(0); // Returning log(1)
}
return log(1 - ((y - lb) / (ub - lb)));
}
"
Appended below is a csv file containing sample data and an R file containing some code depicting my current dilemma. This code fits a mixture model with a normal and uniform component. I attempt to model both theta (proportion of normal trials) and sigma (standard deviation of the normal component) as a function of several predictors (x1 through x4) – including one interaction (x1 * x2) and a random intercept (subject_number). The code is here:
full_model = bf(diff | trunc(lb=lb, ub=ub) ~ 1,
sigma1 ~ x1 * x2 + x3 + x4 + (1 | P | subject_number),
theta1 ~ x1 * x2 + x3 + x4 + (1 | P | subject_number),
mu1 ~ 0,
mu2 ~ 0)
m1c <- brm( full_model, data = dat, chains = 4,
iter = 100, warmup = 50, cores=4,
family=mix, stanvars = stanvars,
prior = m1.prior)
As indicated in my earlier post, the variable being modelled (diff) is a difference between a point on a target line and a point selected by the participant on that trial. Because we are using a line, the range of values a participant could select on a given trial depends on where the target point occurs (i.e., if it is near the lefthand side of the line, participants can’t undershoot by as much since they can’t go much farther left). In terms of modelling assumptions, it is assumed that the mean of the normal distribution is 0 (meaning that – on average – they select the correct point, with some unknown variance). I set the mean of the uniform to 0 also, but it is not used anywhere so I don’t think that matters. I am trying to use truncation to deal with the upper and lower boundaries but plan on playing with censoring as an alternative later on.
So, the problem I am running into is twofold. If I remove truncation, the model fits just fine. However, once I add in truncation – it runs into immediate problems finding starting values, throwing errors like:
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can't start sampling from this initial value.
Sometimes, it does find starting values that work for a given chain – sometimes even for all chains – but it then returns a number of divergent transitions equal to the number of post warm-up samples even with adapt_delta = .99. The problem emerges as I add more predictors and becomes particularly intractable once the random intercept is added. I must be messing up the implementation in some way that I can’t quite put my finger on how.
Any advice would be most appreciated!
dat.csv (383.5 KB)
sample_code.R (3.0 KB)
Operating System: MacOS 10.14
brms Version: 2.7.0