Vector bounds and sampling error


I’m trying to estimate some parameters from the data, but I came across a bug that I don’t know how to deal with.
This piece of code alone may look stupid (especially, using this multiplier), but it’s an isolated part of a more complicated model, that has the same issue. For some reason, estimation of the transformed variable R fails because of the lower bound I set to it. R values shouldn’t get as low as 0, but in case they happen to be negative, I want to discard them by setting this lower bound.

R_data = c(0.22, 0.23, 0.26, 0.27, 0.3, 0.34, 0.13, 0.14, 0.21, 0.23, 
           0.29, 0.31, 0.18, 0.19, 0.27, 0.27, 0.33, 0.36, 0.31, 0.35, 0.38, 
           0.39, 0.41, 0.48)

N = length(R_data)

modelstring = "
data {
  int N;
  vector[N] R_data;
parameters {
  vector[N] R_multiplier;
transformed parameters {
  vector<lower=0>[N] R;
  R = R_multiplier .* R_data;
model {
  R_multiplier ~ normal(1, 0.25);

m = stan(model_code = modelstring2, data = list(N=N, R_data=R_data), seed = 1234)

And this is the error I’m getting.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: R[2] is -0.41672, but must be greater than or equal to 0  (in 'model8e4e69b72ec7_3d89a0a08343181c3029026d11b0b9ba' at line 10)

Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

which is unexpected, since if I remove the lower bound (change vector<lower=0>[N] R; to vector[N] R; in the transformed parameters section), I get the values for R that almost never gets lower than 0 (well, maybe some initial values can get down to -2, but not in 100 attempts in a row).

Could you please explain me why is it happening?

Thank you in advance,

I guess the problem is that at initialisation all the R_multiplier elements are initalised between -2 and 2. Since there are 25 elements, you have a probability of 1- .5^25 that there is at least 1 negative one which throws the error (because all elements in R_data are positive). If you know that R_data is always positive you could enforce that in the data block and then to the same in the parameter block for R_multiplier, this will ensure that R is positive.

1 Like

I think the main issue is that you have the lower-bound constraint on the wrong parameter. By having the bound on R, the model is free to sample negative values of R_multiplier which will then subsequently be rejected in the assignment to R and cause the model to stumble.

If you instead put the lower bound on R_multiplier, the model will only sample non-negative values and values of R will not fall below 0. This will give the same result (only positive values of R), but in a more efficient way (as it does not risk sampling a negative value, getting rejected, and sampling again).

1 Like

Thank you,
Well, I can’t put a lower bound on R_multiplier since it’s gonna limit some of possible R values.
But I’ve got the point, I severely underestimated the probability of getting at least one negative in a vector of inits. I should rather specify a vector of customized init values.

Well, I can’t put a lower bound on R_multiplier since it’s gonna limit some of possible R values

Not really, since any negative R_multiplier values are just going to be rejected and then re-sampled until a positive value is returned. Unless you’re planning on passing negative values in R_data?


You’re right, thank you!