Rejecting initial value


#1

I keep getting this in a loop:

SAMPLING FOR MODEL ‘8c349b2a7e185059d867bb3fe6b8746e’ NOW (CHAIN 1).
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lognormal_lpdf: Random variable[1] is -1.56634, but must be >= 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 49)

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: beta_lpdf: Second shape parameter[1] is 0, but must be > 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 58)

I ran:

priors <- c(set_prior(“lognormal(0.01494647, 0.00937701)”, class = “b”, coef = “Temp”))
ins_month <- brm(use2 ~ Temp + (1+Temp|Month.f),
data = dpa.ins.home, family = Beta(), warmup = 1000, iter = 3000, chains = 2,
inits = “random”, control = list(adapt_delta = .95, max_treedepth = 12),
prior = priors, cores = getOption(“mc.cores”, 4L), sample_prior = TRUE))

Can someone provide me with some advice to fix this…

  • Operating System: Windows 10
  • brms Version: 2.5

#2

Please try changing the seed in the rstan::stan().


#3

I don’t really understand init or seed in the rstan::stan() so I don’t know what to change either to. Could you explain these two arguments or provide a reference that you feel explains them well.


#4

init refers to the initial parameter values that Stan starts HMC/NUTS off with. If these values don’t produce a finite log posterior, then Stan has to choose new initial values until it gets a good starting point with a finite log posterior. If no initializer function or list is specified for rstan::stan(), then Stan will sample uniformly on the unconstrained support from the interval (-2, 2) using pseudo RNG. Pseudo RNG is deterministic in the sense that if you provide the same seed, you will get the same sample. So changing the seed will yeild a different set of initial values that may or may not have a finite log posterior. See pg 42 of the rstan manual for more details.

Another approach would be specifying a list or function that returns initial parameter values that you know will yield a finite log posterior. If you do this you should use different initial values for each Markov Chain to get more robust diagnostics.


#5

Let’s suppose that a0, a1, a2, and a3 are my model parameters, and my stan model is:

fit <- stan(fit = model.stan, iter = 2000, init = …, …)

and the initial values I want to use in each of my 4 chains are:

inits_1 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}
inits_2 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}
inits_3 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}
inits_4 <- function(){list(a0=-4,a1=0,a2=0, a3=0)}

could you kindly clarify how could the below be done in practice for this model and these inits?

Thanks in advance.


#6

I would do something like this, but with some RNG noise added to each chain initialization (I usually initialize by sampling from the parameter priors in R).

initializer <- function() list("a0"=-4,"a1"=0,"a2"=0, "a3"=0)
n_chains <- 4

inits <- list()
for (chain in 1:n_chains) inits[[chain]] <- initializer()
draw <- stan(..., init=inits)


#7

That’s great. Thanks @ScottAlder. EDIT: just tested. It reduced notably the number of Error evaluating the log probability at the initial value messages compared to the other approach. Thank you very much for this very helpful insight.

Just to make sure that I am doing it right - it might be that I am not doing it right.

Is it:

or

or

?


#8

The first one would ensure that the list of initial values and the number of chains matched, so probably that one is best. Otherwise, running on the default number of chains (4) won’t break it


#9

Great! Thanks @ScottAlder.


#10

First thank you for these explanations it was very helpful. Could you provide how you would code the different initial values for each Markov Chain?


#11

Here’s how I sample from truncated priors for lower bounded parameters to initialize HMC. The bounds are

a_1 \in (0, \infty)
a_2 \in (-a_1, \infty)
a_3 \in(-\infty, \infty)

which in the Stan parameters block looks like:

real <lower=0.0> a1;
real <lower=-a1> a2;
real a3;

In R:

sample_priors <- function() {
 # Initialize each parameter less than or equal to their lower bound
 sample <- 
    list("a1" = 0,    
         "a2" = -Inf, 
         "a3" = -Inf, 
    )
  
  while (!sample$a1  > 0)
    sample$a1  <- rnorm(1, 5.0, 1.0) # a1 ~ normal(5.00, 1.0) T[0,];
  
  while (!sample$a2 > -sample$a2)
    sample$a2 <- rnorm(1, 1.25, 1.0) # a2 ~ normal(1.25, 1.0) T[-a1,];

  sample$a3 <- rnorm(1, 40.0, 1.5)   # a3 ~ normal(40.0, 1.5);
 
  return (sample)  
}

N_chains <- 4
init_values <- vector("list", N_chains)
for (n in 1:N_chains) init_values[[n]] <- sample_priors()

model_fit <- 
    sampling(
      stan_model,
      init = init_values,
      chains = N_chains
    ) 

Notice the while statements which are used to resample if rnorm() returns a value out of the truncation bounds.


#12

Also, I should try to explain why it is that you should start each Markov chain at different initial values. Assume you have a multimodal posterior; its better to find out sooner rather than later that it is multimodal. There’s a chance that your Markov chains will settle down in separate modes, which will cause the R_hat statistic to be much greater than 1, which tells you that the posterior is multimodal. This will also result in a traceplot like this (source):

Starting all of your Markov chains at the same initial values makes it more likely that, if the posterior is multimodal, the chains will all settle down into the same mode and you won’t find out until you run your model for the n’th time and accidentally find a new mode. Dispersed initial values increase the chance of finding separate modes, if there are any.


#13

That actually is very interesting and helpful because I was concerned that my response variable distribution was multimodal and was looking into how to test that.