Rejecting initial value using non-logit link for cumulative ordinal models

I’m exploring different link functions for cumulative ordinal models in brms. The model runs fine using the default “logit” link, but when I use a “probit”, “probit_approx”, or “cloglog” link I get the error:

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

The underlying Stan code is identical except for the link functions, and it seems like an initial value that evaluates to -Inf under a probit or cloglog link would also evaluate to -Inf under the logit link. Below is a toy example running the same cumulative ordinal model with 4 different link functions. Any ideas on what is causing the error or how to fix it? (Setting inits = "0" sometimes resolves the issue for the probit model, but not for the probit_approx or cloglog models.)

set-up data

library(dplyr)
library(brms)

generate.data <- function(seed = 1, n = 200, p = 0.5, alpha = 0, beta = c(1, -0.5), sigma = 1){
  # simulate a continuous outcome with one binary and one continuous covariate
  set.seed(seed)
  z1 <- sample(c(0, 1), size = n, replace = TRUE, prob = c(1 - p, p))
  z2 <- rnorm(n, 0, 1)
  y <- rnorm(n, alpha + beta[1]*z1 + beta[2]*z2, sigma)
  data <- data.frame(y = y, z1 = z1, z2 = z2)
  return(data)
}

# bin the continuous outcome into 10 ordered categories
dat10levs <- generate.data(seed = 45232, n = 100) %>% mutate(y = cut_number(y, 10))

model using logit link

fit_logit <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("logit"),
                 chains = 1, iter = 1000, seed = 5, refresh = 0)
Compiling the C++ model
Start sampling

Gradient evaluation took 8.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.85 seconds.
Adjust your expectations accordingly!

Elapsed Time: 0.355146 seconds (Warm-up)
0.306718 seconds (Sampling)
0.661864 seconds (Total)

model using probit link

fit_probit <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("probit"),
                  chains = 1, iter = 1000, seed = 5, refresh = 0)
Compiling the C++ model
Start sampling

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

Gradient evaluation took 0.000419 seconds
1000 transitions using 10 leapfrog steps per transition would take 4.19 seconds.
Adjust your expectations accordingly!

Elapsed Time: 1.7464 seconds (Warm-up)
1.67224 seconds (Sampling)
3.41865 seconds (Total)

model using approximate probit link

fit_probit_app <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("probit_approx"),
                      chains = 1, iter = 1000, seed = 5, refresh = 0)
Compiling the C++ model
Start sampling

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

… (error repeats ~20 times)

Gradient evaluation took 0.000234 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.34 seconds.
Adjust your expectations accordingly!

Elapsed Time: 0.928907 seconds (Warm-up)
0.872934 seconds (Sampling)
1.80184 seconds (Total)

model using complementary log-log link

fit_cloglog <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("cloglog"),
                   chains = 1, iter = 1000, seed = 5, refresh = 0)
Compiling the C++ model
Start sampling

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

… (error repeats until 100 attempts)

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

  • Operating System: Linux (Ubuntu 16.04)
  • brms Version: 2.3.0

Try specifying init_r to some value less than 2.
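For example, something along these lines should shrink the random-initialization range from (-2, 2) to (-0.2, 0.2). This is just a sketch, assuming extra arguments like init_r are passed through brm()'s ... on to rstan::sampling():

# narrower random inits make under/overflow of the link far less likely
fit_cloglog <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("cloglog"),
                   chains = 1, iter = 1000, seed = 5, refresh = 0,
                   init_r = 0.2)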

I think the default priors are flat in brms, so that could be an issue too. Manually specifying at least a weakly informative prior is usually a good idea.
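For instance, a sketch with an arbitrary weakly informative choice (normal(0, 5) is just for illustration, not a recommendation):

# weakly informative prior on the regression coefficients
fit_probit <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = cumulative("probit"),
                  prior = set_prior("normal(0, 5)", class = "b"),
                  chains = 1, iter = 1000, seed = 5, refresh = 0)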

I agree with what both @bgoodri and @jonah said. This may happen if the initial values are so large that the link function over- or underflows. The cloglog link is particularly vulnerable to this. It is not problematic if the sampler needs a few attempts to start sampling; the only problem is when it isn't able to start at all, which only happens for you with the cloglog link. It seems the cloglog link has problems with the cumulative() family when using many response categories. When I switched to another ordinal family, say sratio, everything went well for me.
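For instance, a sketch of the same toy model with a sequential (stopping-ratio) family instead of cumulative():

# same predictors, but an sratio model with the cloglog link
fit_sratio <- brm(as.numeric(y) ~ z1 + z2, data = dat10levs, family = sratio("cloglog"),
                  chains = 1, iter = 1000, seed = 5, refresh = 0)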

Thanks for the feedback. For the probit model, specifying init_r to a smaller value or using inits = "0" does help. The intercepts have a weakly informative default prior (student_t(3, 0, 10)), but I didn't notice any major differences when I also manually specified a weakly informative prior for the other parameters.

My ultimate goal is to examine the use of cumulative probability models for continuous outcomes, so the number of 'response categories' is the same as the number of observations (assuming no ties). The logit models work fine in this case, but I may have to switch to a different ordinal family for the other links, as @paul.buerkner suggested.
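For reference, a minimal sketch of that setup (the names dat_cont, y_ord, and fit_cont are just illustrative, and it assumes no ties in y so that each distinct value gets its own rank):

# every distinct value of the continuous outcome becomes its own ordered category
dat_cont <- generate.data(seed = 45232, n = 100) %>% mutate(y_ord = as.integer(rank(y)))
fit_cont <- brm(y_ord ~ z1 + z2, data = dat_cont, family = cumulative("logit"),
                chains = 1, iter = 1000, seed = 5, refresh = 0)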

What is the purpose of using an ordinal model for continuous data? This seems rather strange to me at first glance.

I agree the idea seems a little odd at first glance. The main reasons we are looking at ordinal models for continuous data are:

  1. They are invariant to monotonic transformations of the outcome, so they can handle skewed distributions where different transformations may give different results.
  2. The full conditional CDF can be modelled, so estimates of means and quantiles can be derived from a single model (see the sketch after this list).
  3. Since any ordered response can be used, they can handle mixed discrete/continuous distributions, such as an outcome with a lower limit of detection.
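As a rough illustration of point 2, with a recent brms version posterior_epred() on the earlier toy fit returns per-category probabilities as a draws x observations x categories array, from which a conditional mean of the category score can be assembled (the object names below are only illustrative):

# per-category probabilities P(Y = k | z1, z2): draws x observations x categories
probs <- posterior_epred(fit_logit)
# score each ordered category by its index (1..10 in the toy example)
scores <- seq_len(dim(probs)[3])
# conditional mean of the category score for each draw and observation
cond_mean <- apply(probs, c(1, 2), function(p) sum(p * scores))
head(colMeans(cond_mean))  # posterior mean of E[score | z1, z2] per observation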

More details using the classical/frequentist paradigm are in Liu et al. (2017) and Harrell’s Regression Modeling Strategies, 2nd ed. I’m exploring the same models in a Bayesian framework.

I see this a lot with categorical approaches to continuous data; it's all over the Gelman and Hill regression book. Often the data are ordinal but not being modeled as ordinal, as with incomes or education levels, which might have five or eight clearly ordered bins yet still not be modeled that way.