Rejecting initial value

akinslow2 · January 4, 2019, 7:21pm

I keep getting this in a loop:

SAMPLING FOR MODEL ‘8c349b2a7e185059d867bb3fe6b8746e’ NOW (CHAIN 1).
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lognormal_lpdf: Random variable[1] is -1.56634, but must be >= 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 49)

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: beta_lpdf: Second shape parameter[1] is 0, but must be > 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 58)

I ran:

priors ← c(set_prior(“lognormal(0.01494647, 0.00937701)”, class = “b”, coef = “Temp”))
ins_month ← brm(use2 ~ Temp + (1+Temp|Month.f),
data = dpa.ins.home, family = Beta(), warmup = 1000, iter = 3000, chains = 2,
inits = “random”, control = list(adapt_delta = .95, max_treedepth = 12),
prior = priors, cores = getOption(“mc.cores”, 4L), sample_prior = TRUE))

Can someone provide me with some advice to fix this…

Operating System: Windows 10
brms Version: 2.5

Jean_Billie · January 7, 2019, 2:20am

Please try changing the seed in the rstan::stan().

akinslow2 · January 7, 2019, 4:58pm

I don’t really understand init or seed in the rstan::stan() so I don’t know what to change either to. Could you explain these two arguments or provide a reference that you feel explains them well.

ScottAlder · January 7, 2019, 7:47pm

init refers to the initial parameter values that Stan starts HMC/NUTS off with. If these values don’t produce a finite log posterior, then Stan has to choose new initial values until it gets a good starting point with a finite log posterior. If no initializer function or list is specified for rstan::stan(), then Stan will sample uniformly on the unconstrained support from the interval (-2, 2) using pseudo RNG. Pseudo RNG is deterministic in the sense that if you provide the same seed, you will get the same sample. So changing the seed will yeild a different set of initial values that may or may not have a finite log posterior. See pg 42 of the rstan manual for more details.

Another approach would be specifying a list or function that returns initial parameter values that you know will yield a finite log posterior. If you do this you should use different initial values for each Markov Chain to get more robust diagnostics.

realkrantz · January 7, 2019, 10:23pm

Let’s suppose that a0, a1, a2, and a3 are my model parameters, and my stan model is:

fit ← stan(fit = model.stan, iter = 2000, init = …, …)

and the initial values I want to use in each of my 4 chains are:

inits_1 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}

inits_2 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}

inits_3 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}

inits_4 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}

could you kindly clarify how could the below be done in practice for this model and these inits?

Thanks in advance.

ScottAlder · January 7, 2019, 10:50pm

I would do something like this, but with some RNG noise added to each chain initialization (I usually initialize by sampling from the parameter priors in R).

initializer <- function() list("a0"=-4,"a1"=0,"a2"=0, "a3"=0)
n_chains <- 4

inits <- list()
for (chain in 1:n_chains) inits[[chain]] <- initializer()
draw <- stan(..., init=inits)

realkrantz · January 7, 2019, 10:56pm

That’s great. Thanks @ScottAlder. EDIT: just tested. It reduced notably the number of Error evaluating the log probability at the initial value messages compared to the other approach. Thank you very much for this very helpful insight.

Just to make sure that I am doing it right - it might be that I am not doing it right.

Is it:

or

?

ScottAlder · January 8, 2019, 4:45am

The first one would ensure that the list of initial values and the number of chains matched, so probably that one is best. Otherwise, running on the default number of chains (4) won’t break it

realkrantz · January 8, 2019, 7:22am

Great! Thanks @ScottAlder.

akinslow2 · February 1, 2019, 7:01pm

First thank you for these explanations it was very helpful. Could you provide how you would code the different initial values for each Markov Chain?

ScottAlder · February 1, 2019, 8:11pm

Here’s how I sample from truncated priors for lower bounded parameters to initialize HMC. The bounds are

a_1 \in (0, \infty)
a_2 \in (-a_1, \infty)
a_3 \in(-\infty, \infty)

which in the Stan parameters block looks like:

real <lower=0.0> a1;
real <lower=-a1> a2;
real a3;

In R:

sample_priors <- function() {
 # Initialize each parameter less than or equal to their lower bound
 sample <- 
    list("a1" = 0,    
         "a2" = -Inf, 
         "a3" = -Inf, 
    )
  
  while (!sample$a1  > 0)
    sample$a1  <- rnorm(1, 5.0, 1.0) # a1 ~ normal(5.00, 1.0) T[0,];
  
  while (!sample$a2 > -sample$a2)
    sample$a2 <- rnorm(1, 1.25, 1.0) # a2 ~ normal(1.25, 1.0) T[-a1,];

  sample$a3 <- rnorm(1, 40.0, 1.5)   # a3 ~ normal(40.0, 1.5);
 
  return (sample)  
}

N_chains <- 4
init_values <- vector("list", N_chains)
for (n in 1:N_chains) init_values[[n]] <- sample_priors()

model_fit <- 
    sampling(
      stan_model,
      init = init_values,
      chains = N_chains
    )

Notice the while statements which are used to resample if rnorm() returns a value out of the truncation bounds.

ScottAlder · February 1, 2019, 9:03pm

Also, I should try to explain why it is that you should start each Markov chain at different initial values. Assume you have a multimodal posterior; its better to find out sooner rather than later that it is multimodal. There’s a chance that your Markov chains will settle down in separate modes, which will cause the R_hat statistic to be much greater than 1, which tells you that the posterior is multimodal. This will also result in a traceplot like this (source):

Starting all of your Markov chains at the same initial values makes it more likely that, if the posterior is multimodal, the chains will all settle down into the same mode and you won’t find out until you run your model for the n’th time and accidentally find a new mode. Dispersed initial values increase the chance of finding separate modes, if there are any.

akinslow2 · February 1, 2019, 11:32pm

That actually is very interesting and helpful because I was concerned that my response variable distribution was multimodal and was looking into how to test that.

Topic		Replies	Views
Beta regression - help with error evaluating the log probability at the initial value Modeling	6	1667	April 24, 2025
Brms: cumulative model\ rejecting initial values issue Modeling	10	718	February 10, 2023
Brms rejecting values brms rstan , irt	3	940	July 24, 2020
Beta binomial regression rejecting initial values brms fitting-issues , brms	5	126	April 16, 2025
Error evaluating the log probability at the initial value brms	2	1439	November 15, 2019

Rejecting initial value

Related topics