Getting overflow (nans) in genearted quantities

maydin · January 23, 2021, 3:11am

Using cmdstan 2.24.1. The model runs fine (no errors) and the parameter distributions come back as expected. However, for everything declared in the generated quantities block, I get nans, which I’m interpreting to mean overflow.

data {
    int<lower=1> n; //number of elements in glc data
    int<lower=1> nhol; //number of elements in hol data
    int<lower=1> nwet; //number of elements in wet data
    int<lower=1> k; //start index 
    vector[n] x_glc; // x for glc data
    vector[n] xerr_glc; // err
    vector[nhol] x_hol; // x for hol data
    vector[nhol] xerr_hol; // err
    vector[nwet] x_wet; // x for wet data
    vector[nwet] xerr_wet; // err
    vector[n] y_glc;    // y for glc data 
    vector[n] yerr_glc; // err
    vector[nhol] y_hol;    // y for hol data 
    vector[nhol] yerr_hol; // err
    vector[nwet] y_wet;    // y for wet data 
    vector[nwet] yerr_wet; // err
}
parameters {
    real<lower=1e-5> alpha;              // slope
    real<lower=1e-5> beta;           // intcpt
    vector<lower=1e-15>[n] xtrue;    //true ion
}
model {
    xtrue ~ normal(x_glc, xerr_glc);
    y_glc[k:n] ~ normal(beta + alpha*xtrue[k:n], yerr_glc[k:n]); // possible to use subset of full data
}
generated quantities {
    vector[n] xtrue_glc;
    vector[nhol] xtrue_hol;
    vector[nwet] xtrue_wet;
    vector[n] ytrue_glc;
    vector[nhol] ytrue_hol;
    vector[nwet] ytrue_wet;
    vector[n] pcorr_glc; 
    vector[nhol] pcorr_hol; 
    vector[nwet] pcorr_wet; 
    vector[n] ypred_glc; 
    vector[nhol] ypred_hol; 
    vector[nwet] ypred_wet; 
    
    for(i in 1:n) {
        xtrue_glc[i] = normal_rng(x_glc[i], xerr_glc[i]);
        ytrue_glc[i] = normal_rng(y_glc[i], yerr_glc[i]);
        pcorr_glc[i] = a * xtrue_glc[i];
        ypred_glc[i] = ytrue_glc[i] - pcorr_glc[i];
    }  
    for(i in 1:nhol) {
        xtrue_hol[i] = normal_rng(x_hol[i], xerr_hol[i]);
        ytrue_hol[i] = normal_rng(y_hol[i], yerr_hol[i]);  
        pcorr_hol[i] = a * xtrue_hol[i];
        ypred_hol[i] = ytrue_hol[i] - pcorr_hol[i];
    }
    for(i in 1:nwet) {
        xtrue_wet[i] = normal_rng(x_wet[i], xerr_wet[i]);
        ytrue_wet[i] = normal_rng(y_wet[i], yerr_wet[i]);
        pcorr_wet[i] = a * xtrue_wet[i];
        ypred_wet[i] = ytrue_wet[i] - pcorr_wet[i];
    }
}

maydin · January 25, 2021, 6:17am

Update. There is a mistake in the code above. I have “a” in generated quantities instead of “alpha” as in the model block, but this error occurred while I was copying code here, so it was not the cause of failure. I tried a very similar, stripped down version of this code (see below) and it works fine, meaning I get back the generated quantities. As soon as I add the additional for loops with the quantities described in them, all generated quantities return NaNs despite no errors generated and the model block working fine. This is merely a nuisance in the end as I can take care of the stuff in the longer version of the generated quantities block (seen above) outside of Stan. However, I’m really curious to know what is going on.

data {
    int<lower=1> n; 
    int<lower=1> k; 
	vector[n] x_glc; 
    vector[n] xerr_glc;
    vector[n] y_glc; 
    vector[n] yerr_glc; 
}
parameters {
    real<lower=1e-5> alpha;               // slope
    real<lower=1e-5> beta;                 //intercept
    vector<lower=1e-15>[n] xtrue;      //true ion
}
model {
    xtrue ~ normal(x_glc, xerr_glc);
    y_glc[k:n] ~ normal(beta + alpha * xtrue[k:n], yerr_glc[k:n]);  
}
generated quantities {
    vector[n] pcorr_glc; 
    vector[n] pcorr_glc2; 
    vector[n] ypred_glc; 
    vector[n] ypred_glc2; 
    vector[n] xtrue_glc;
    vector[n] ytrue_glc;
    for(i in 1:n) {
        xtrue_glc[i] = normal_rng(x_glc[i], xerr_glc[i]);
        ytrue_glc[i] = normal_rng(y_glc[i], yerr_glc[i]);
        pcorr_glc[i] = alpha * xtrue_glc[i];
        pcorr_glc2[i] = alpha * xtrue[i];
        ypred_glc[i] = ytrue_glc[i] - pcorr_glc[i];
        ypred_glc2[i] = ytrue_glc[i] - pcorr_glc2[i];
    }
}

mike-lawrence · January 27, 2021, 11:28am

In the first model, are all the generated quantities NaN? (Is it possible that the ones with the _glc suffix are non NaN?)

Nothing jumps out to me as causing this; you’re sure that all the values in the *err_* variables you’re passing as data are positive?

mitzimorris · January 27, 2021, 7:35pm

try debug by print - 7.11 Print statements | Stan Reference Manual
what are the values for x_glc[i] and xerr_glc[i]. ?

maydin · January 28, 2021, 2:21am

Thanks for the response. I have since figured out why the entire generated quantities block was returning NaNs. The issue was a couple of very small negative numbers that were passed in as part of x_hol. It did not occur to me that negative numbers (in a practical sense they were just zeros) would be a problem. Once I dealt with those two data points (by equating them to very small positive numbers), everything worked. What I do not understand is why this caused all generated quantities to bomb instead of just the predictions in the “for i in 1:nhol” loop.

mitzimorris · January 28, 2021, 3:54pm

if generated quantities encounters an error, processing stops for that block.
the variables declared in the block are output and those variables have been initialized to NaNs.
no error is raised, and the sampler continues to the next iteration.

this is because the computation done in generated quantities doesn’t change the parameter estimates for the draw; the sampler produced a valid draw of the parameter values from the posterior, therefore you have a valid sample. if some computation in the generated quantities block is invalid due to bad data, well, c’est la vie.

maydin · January 28, 2021, 6:50pm

Got it. I wasn’t aware that the computation stopped for the entire block in case of bad data.

In any case, the data in question was not necessarily bad, meaning the 2 data points (out of a few hundred) were essentially zeros within uncertainties - slightly negative real numbers as I mentioned before. This is something that happens quite often with measurements actually. What is the most elegant way to deal with these situations? How to sample in the generated quantities block if some of the data that define your zero intercept (in a linear regression model) are negative?

maydin · January 28, 2021, 6:51pm

Even worse, one can have all negative numbers.

Topic		Replies	Views
Undefined values in generated quantities block General	14	1012	January 28, 2021
Generated Quantities Returning NaN Modeling fitting-issues	2	984	September 29, 2021
Problem with generated quantities in cmdstanpy General fitting-issues , cmdstanpy , cognitive-science	6	608	October 25, 2024
How to find the details of why sampling is not working in some special cases Modeling	2	498	May 14, 2021
Generated Quantities returns error "Mismatch between model and fitted_parameters csv file" CmdStan cmdstanr	13	1819	September 6, 2020

Getting overflow (nans) in genearted quantities

Related topics