Cannot sample from simple non-linear model due to a non-finite location parameter

Hello, I have a simple univariable non-linear model with an exponent applied to the observed data. The model fails to fit with the following error:

Chain 16 Rejecting initial value:
Chain 16   Error evaluating the log probability at the initial value.
Chain 16 Exception: normal_lpdf: Location parameter[1] is nan, but must be finite! (in '/var/folders/v5/hz8q76_n1g1b7v_bmyfcp1qw0000gn/T/RtmpB3NTJV/model-f277aef2b02.stan', line 42, column 4 to column 42)

The model is written in brms:

brm(bf(log(NTDratio) ~ a + b1 * Age ^ b2, b1 + b2 + a ~ 1, nl = TRUE), family = gaussian(), Data %>% mutate(Age = Age - mean(Age)),  backend = 'cmdstan', chains = 16, cores = 8, silent = 2, refresh = 0, iter = 8000,
		normalize = F, control = list(adapt_delta = .95, max_treedepth = 15),
prior = c(
    prior(student_t(3, 0, 1), class = 'b', nlpar = 'a'),
    prior(student_t(3, 0, 1), class = 'b', nlpar = 'b1'),
    prior(normal(0, 3), class = 'b', nlpar = 'b2', lb = 1)
))

and this is the related stan code:

data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K_b1;  // number of population-level effects
  matrix[N, K_b1] X_b1;  // population-level design matrix
  int<lower=1> K_b2;  // number of population-level effects
  matrix[N, K_b2] X_b2;  // population-level design matrix
  int<lower=1> K_a;  // number of population-level effects
  matrix[N, K_a] X_a;  // population-level design matrix
  // covariate vectors for non-linear functions
  vector[N] C_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K_b1] b_b1;  // population-level effects
  vector<lower=1>[K_b2] b_b2;  // population-level effects
  vector[K_a] b_a;  // population-level effects
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_b1 = X_b1 * b_b1;
    // initialize linear predictor term
    vector[N] nlp_b2 = X_b2 * b_b2;
    // initialize linear predictor term
    vector[N] nlp_a = X_a * b_a;
    // initialize non-linear predictor term
    vector[N] mu;
    for (n in 1:N) {
      // compute non-linear predictor values
      mu[n] = nlp_a[n] + nlp_b1[n] * C_1[n] ^ nlp_b2[n];
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += student_t_lpdf(b_b1 | 3, 0, 1);
  target += normal_lpdf(b_b2 | 0, 3)
    - 1 * normal_lccdf(1 | 0, 3);
  target += student_t_lpdf(b_a | 3, 0, 1);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}

By my understanding, all parameters should be supported in the real line so I don’t understand which combination of them is creating problems. Of note, the dataset is very small (20 obs).

Are there any ‘inf’ or ‘NaN’ values in the data set?

You could also try setting some initial values that you know will yield a valid value for location.

I tried to set up the inits via: inits = function() list(a = 0, b1 = 0, b2 = 1) but still has problems.
Here’s the data as seen by stan (after the transformations in the brm call):

$Y
 [1] -1.0185696 -0.2795849 -1.2452158 -0.6466272 -0.7884574 -0.4780358
 [7] -3.7196511 -1.6094379 -0.4635727 -4.3248682 -1.5404450 -0.3844117
[13] -0.1880522 -0.1758907 -1.0560527 -3.1964030 -0.9315582 -0.8472979
[19] -0.9864950 -1.0006319

$C_1
 [1]  -5.45   0.55   5.55 -22.45   7.55 -12.45  13.55   7.55   7.55  12.55
[11] -18.45   9.55   3.55  -7.45  -2.45   6.55  10.55  -9.45 -10.45   3.55

I don’t see any problematic combination of parameters…

I found that when parameters are vectors I have had to set them as arrays

init_list <- list(
        list(b_b1 = array(data = 1, dim = K_b1),
             b_b2 = array(data =  1, dim = K_b2),
             b_a = array(data = 1, dim = K_a)))

Nope, it gives me the error: “Error in array(data = 1, dim = K_b1) : object ‘K_b1’ not found”

And if I change it to:

 list(
    list(b_b1 = array(data = 1, dim = 'b1'),
         b_b2 = array(data =  1, dim = 'b2'),
         b_a = array(data = 1, dim = 'a')))

then I get:

Error in array(data = 1, dim = "b1") : 
  negative length vectors are not allowed

You’ll need to pass the length of K_b* as the dimensions

Another error. Pasting the whole call for completeness:

model <- brm_call(bf(log(NTDratio) ~ a + b1 *Age ^ b2, b1 + b2 + a ~ 1, nl = TRUE), family = gaussian(), Data %>% mutate(Age = Age - mean(Age)), prior = c(
    prior(student_t(3, 0, 1), class = 'b', nlpar = 'a'),
    prior(student_t(3, 0, 1), class = 'b', nlpar = 'b1'),
    prior(normal(0, 3), class = 'b', nlpar = 'b2')
), inits = list(
    list(b_b1 = array(data = 0, dim = 1),
         b_b2 = array(data =  1, dim = 1),
         b_a = array(data = 0, dim = 1))))

 Error: 'init' has the wrong length. See documentation of 'init' argument.

If they are all vectors of length 1, you can omit the dim argument

same error

If you can share some data to reproduce the problem, I can try to debug locally

I just noticed you’re running 16 chains. Any specific reason why? You’ll have to set initial values for each chain.

Will this work?

set_inits  <-  function(seed = 1) {
  set.seed(seed)
  list(
    b_b1 = array(rnorm(n = 1, mean = 1, sd = 0.05)),
    b_b2= array(rnorm(n = 1, mean = 1, sd = 0.05)),
    b_a = array(rnorm(n = 1, mean =1, sd = 0.05))
}

init_list <- list(
  set_inits(seed = 1),
  set_inits(seed = 2),
  set_inits(seed = 3),
 set_inits(seed = 4),
 set_inits(seed = 5,
 set_inits(seed = 6),
 set_inits(seed = 7),
 set_inits(seed = 8),
 set_inits(seed = 9),
 set_inits(seed = 10),
 set_inits(seed = 11),
 set_inits(seed = 12),
 set_inits(seed = 13),
 set_inits(seed = 14),
 set_inits(seed = 15),
 set_inits(seed = 16)
)

This approach is from @Solomon in his post here.

1 Like

Here it is. the Y is already logged and the X is centered:

list(Y = c(-1.01856958099457, -0.279584862219161, -1.24521576285998,
-0.646627164925052, -0.78845736036427, -0.478035800943, -3.71965111278069,
-1.6094379124341, -0.463572738915445, -4.32486822083393, -1.54044504094715,
-0.384411698910332, -0.18805223150294, -0.175890666463664, -1.05605267424931,
-3.19640296901614, -0.931558204004943, -0.847297860387204, -0.986494990547404,
-1.00063188030791), X = structure(c(-5.45, 0.549999999999997,
5.55, -22.45, 7.55, -12.45, 13.55, 7.55, 7.55, 12.55, -18.45,
9.55, 3.55, -7.45, -2.45, 6.55, 10.55, -9.45, -10.45, 3.55), .Dim = c(20L,
1L), “scaled:center” = 66.45))