Hi all, I’m receiving the following error in my Stan model:
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: gamma_lpdf: Random variable is nan, but must not be nan! (in 'model6fb672628_stan_6fb725c2880' at line 61)
I know this is a common error and I’ve read a number of other responses, but I’m new to Stan and I can’t figure out in my case what’s going wrong with the gamma distribution. Here’s the model:
Thanks for checking for other responses before asking your question, but yeah it’s often hard to figure out if (and how) a response related to someone else’s model can be applied to fix the problem in yours. (Edit: And especially in this case since initialization problems can happen for multiple different reasons.)
I think the problem here (or at least part of it) is that declaring variables in the model block like this
works differently than if you declare them in the parameters block. In the model block variable declarations like this just create new variables initialized to nan that are waiting to be given values. In the parameters block, on the other hand, you’re telling Stan to estimate those variables so you don’t have to supply values yourself. So what’s happening in the model block is that when you then go use the variables you declared
you haven’t assigned them values yet and so you get the error "Random variable is nan, but must not be nan!". Looking a bit further down in your model block
it looks like you do eventually give some of these variables values. Is the idea that you wanted Stan to estimate the first element of these vectors and then assign values to the other elements?
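To make the distinction concrete, here’s a minimal illustration with made-up names (not your model):

```stan
model {
  vector[5] v;     // declared in the model block: elements are NaN until you assign them
  v ~ gamma(2, 1); // error: "Random variable is nan, but must not be nan!"
}
```

versus

```stan
parameters {
  vector<lower=0>[5] v; // Stan initializes and estimates these; no assignment needed
}
```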
Thanks so much! I’d like to just initialize the first value of those four parameters as a random draw from those distributions I used. It’s true that for lambda and the two gamma parameters I’ll manually assign values for all the elements after 1, but that isn’t the case for N, which I want Stan to estimate all of.
Is there a Stan equivalent of N[1] = rgamma(…) to replace the N[1] ~ gamma(…) code? I can’t feed these in as initial values during model fitting because (I think) that would require declaring N, lambda, etc. earlier than the model block (i.e. in the parameters block), which produces a different error.
Here are a couple of possibilities that have different implications:
declare the first element in the transformed data block and use gamma_rng (the Stan equivalent of R’s rgamma). If you use it in transformed data then the draw happens once per Markov chain, before sampling, and is constant over iterations within the chain.
declare the first element in the parameters block and give it that gamma prior in the model block. This is different than the first option because you’re not actually assigning it to a fixed single draw from that (prior) distribution, you’re estimating a posterior distribution for it. Even if you don’t care about its posterior it still serves to propagate uncertainty into the rest of the model.
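A sketch of the first option, reusing the z0 and sd0 quantities from your data (assuming the first element should come from that same gamma):

```stan
transformed data {
  // drawn once per chain, before sampling; fixed across iterations
  real N1 = gamma_rng(z0^2 / sd0^2, z0 / sd0^2);
}
```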
To estimate all of it you can just declare N as a vector in the parameters block (instead of at the top of the model block) and then you can keep the rest of what you have for N in the model block.
A few other comments about the model:
You have several uniform priors with different lower and upper bounds but in the parameters block
the parameters don’t have the same lower and upper bounds. That will also cause problems for initialization. If you give parameters lower and upper bounds then the default prior is uniform with those bounds, so you don’t even need to use ~ uniform(...). But if you do then you still need the parameter constraints so Stan can initialize from a valid value.
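Concretely, with illustrative names and bounds:

```stan
parameters {
  real<lower=0, upper=10> theta; // the bounds imply an implicit uniform(0, 10) prior
}
model {
  theta ~ uniform(0, 10); // redundant given the bounds above, but harmless;
                          // without the bounds, initialization can fail
}
```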
Unless I missed something it looks like lambda[1] is never needed for anything and the rest of lambda is only needed to compute gamma_shape/rate. So you could really just compute lambda as a local variable inside the loop and overwrite it each time instead of filling in a whole vector:
for (t in 2:len_t) {
  real lambda_t = N[t-1] * exp(r - r * N[t-1] / K);
  // ... use lambda_t; it gets overwritten next time through the loop
}
Hopefully that’s helpful, but I was only able to give it a quick look right now so there may also be other issues you run into with the model. Definitely feel free to keep posting with more questions if you get stuck.
Thanks very much. I had previously tried declaring these elements (N, lambda, gamma_shape, gamma_rate) in the parameters block but got a different error that I had resolved by moving them to the model block:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Cannot assign to variable outside of declaration block; left-hand-side variable origin=parameter
Illegal statement beginning with non-void expression parsed as
lambda[t]
Not a legal assignment, sampling, or function statement. Note that
* Assignment statements only allow variables (with optional indexes) on the left;
* Sampling statements allow arbitrary value-denoting expressions on the left.
* Functions used as statements must be declared to have void returns
error in 'model6fb7bb353fa_stan_6fb4b80e6d7' at line 68, column 0
-------------------------------------------------
66:
67: for(t in 2:len_t){
68: lambda[t] = N[t-1] * exp(r - r * N[t-1] / K);
^
69: gamma_shape[t] = (lambda[t]^2) / (sigma_process^2);
I’m sort of puzzled why time-varying latent variables are so hard to code in Stan (this model already runs without error in JAGS in 25 lines of code!)
Regarding lambda, it’s true that I don’t need it for anything right now, but I probably will want to track its value in each time step and am also building some more complexity into the model later where its role may change.
That error is because you cannot assign (use =) values to things declared in the parameters block. JAGS/BUGS would figure out for you that lambda[1] was a stochastic node and lambda[2 : len_t] were deterministic, but in Stan the onus is on you. A first pass at re-expressing this model is:
data {
  int<lower=1> len_t;
  vector[len_t] y;
  real<lower=0> z0;
  real<lower=0> sd0;
}
parameters {
  real r_hyper_mu;
  real<lower=0> r_hyper_sigma;
  real<lower=0> K_hyper_lower;
  real<lower=K_hyper_lower> K_hyper_upper;
  real<lower=0> sigma_obs_hyper_mean;
  real<lower=0> sigma_obs_hyper_sd;
  real<lower=0> sigma_process_hyper_lower;
  real<lower=sigma_process_hyper_lower> sigma_process_hyper_upper;
  real r;
  real<lower=K_hyper_lower, upper=K_hyper_upper> K;
  real<lower=0> sigma_obs;
  real<lower=sigma_process_hyper_lower, upper=sigma_process_hyper_upper> sigma_process;
  vector<lower=0>[len_t] N;
  real<lower=0> lambda_1;
  real<lower=1e3, upper=1e7> gamma_shape_1;
  real<lower=1, upper=1e3> gamma_rate_1;
}
transformed parameters {
  vector<lower=0>[len_t] lambda;
  vector[len_t] gamma_shape;
  vector[len_t] gamma_rate;
  lambda[1] = lambda_1;
  gamma_shape[1] = gamma_shape_1;
  gamma_rate[1] = gamma_rate_1;
  for (t in 2:len_t) {
    lambda[t] = N[t-1] * exp(r - r * N[t-1] / K);
    gamma_shape[t] = (lambda[t]^2) / (sigma_process^2);
    gamma_rate[t] = lambda[t] / (sigma_process^2);
  }
}
model {
  // uniforms not strictly necessary if the bounds are appropriately
  // declared in the parameters block
  K ~ uniform(K_hyper_lower, K_hyper_upper);
  sigma_process ~ uniform(sigma_process_hyper_lower, sigma_process_hyper_upper);
  gamma_shape_1 ~ uniform(1e3, 1e7);
  gamma_rate_1 ~ uniform(1, 1000);
  K_hyper_upper ~ normal(0, 1000);
  sigma_process_hyper_upper ~ normal(0, 1000);
  r ~ normal(r_hyper_mu, r_hyper_sigma);
  sigma_obs ~ normal(sigma_obs_hyper_mean, sigma_obs_hyper_sd);
  N[1] ~ gamma(z0^2 / sd0^2, z0 / sd0^2);
  lambda_1 ~ gamma(z0^2 / sd0^2, z0 / sd0^2);
  // need to index the RHS so that it and the LHS have the same length
  N[2:len_t] ~ gamma(gamma_shape[2:len_t], gamma_rate[2:len_t]);
  y ~ normal(N, sigma_obs);
}
which I can get to sample with synthetic data, though I think you’ll need more informative priors on many of the hyperparameters.
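As an aside, the shape and rate expressions in those gamma statements come from moment matching: a gamma distribution with shape $a$ and rate $b$ has mean $a/b$ and variance $a/b^2$, so taking

$$a = \frac{z_0^2}{sd_0^2}, \qquad b = \frac{z_0}{sd_0^2}$$

gives mean $a/b = z_0$ and variance $a/b^2 = sd_0^2$, i.e. a gamma with mean z0 and standard deviation sd0.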
Also, the _upper variables must have the _lower variables as their lower bounds; the positive_ordered type is useful here.
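For example (a sketch; the rest of the model would then refer to K_hyper[1] and K_hyper[2] instead of K_hyper_lower and K_hyper_upper):

```stan
parameters {
  // positive_ordered enforces 0 < K_hyper[1] < K_hyper[2],
  // so the ordering constraint is built into the type
  positive_ordered[2] K_hyper;
}
```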
Thank you! That’s really helpful. Out of curiosity, why did you give priors to the upper parameters that bound the uniform distributions (like K_hyper_upper) but not to the lower bounds (like K_hyper_lower)?
Because the lower bounds are constrained between zero and their upper bounds, which makes it difficult for them to explode toward $+\infty$. Each upper bound, however, is bounded below by its lower bound but is unbounded above, and without a prior it often shoots off to implausible values.