Exception: categorical_logit_lpmf: log odds parameter[2] is nan, but must be finite!

My model returning the message:
‘Exception: categorical_logit_lpmf: log odds parameter[2] is nan, but must be finite! (in ‘test0825 copy 3.stan’, line 93, column 16 to column 66)’
here is the block that involves this error message.

            if (conte_it==0){
                vector[4] util = rep_vector(0,4);
                real rho = inv_logit(mit);
                real ur;
                ur =mean(Initiated[(ob-min(itd,14)+1):ob]);
                vector[3] nuse;
                for(j in 1:3){
                    nuse[j] = ur*(1-rho^condur[j])/(1-rho);
                    util[j+1] = -exp(-r*nuse[j]*mit+(r*nuse[j]^2/2*vit)) + intcept[j] + (Constatus[ob]==j+1)*restpar[5];
                }

                target += categorical_logit_lpmf(Choice[ob]|util);
            }

The model was able to sample, but the some of the parameter estimates do not converge. Here is the traceplot. How to deal with the ‘NA’ issue? Is the nonconvergence due to this problem?

Can you share the entire code?

Yes, sure. Here is the code.

data{
    int ncust;
    int nob;
    int nupdt;

    array[nob] int Initiate;
    array[nob] int Complete;
    array[nob] int Choice;
    array[nob] int Itd;
    array[nob] int Conidx;
    array[nob] int Dur;
    array[nob] int Contd;
    array[nob] int Conte;
    array[nob] int Constatus;
    array[nob] int Updt;
    array[nob] int updt_idx;
    array[nob] int updt_idx_ini;

    array[nupdt] real <lower=1e-500> Vit;
    array[nupdt] int Itd_updt;
    array[nupdt] int K;
}


parameters {
   real M;
   real<lower=0.01> V;
   real<lower=0.001> r;
   array[3] real restpar;
   array[nupdt] real Ms;
   array[3] real intcept;
}

transformed parameters {
   array[nupdt] real Ms_bar;
   array[nupdt] real<lower= 0> Vs_bar;
   array[nupdt] real <lower= 0> Vs;
   array[nupdt] real <lower= 0> sqrt_Vbar;
   
   for (up in 1:nupdt){
    real m0 = (Itd_updt[up]==1)? M:Ms[up-1];
    real v0 = (Itd_updt[up]==1)? 10:Vs[up-1];
    
    Vs[up] = 1/(1/v0 + K[up]/V);
    Vs_bar[up] = K[up]*Vs[up]^2/V;
    sqrt_Vbar[up] = sqrt(Vs_bar[up]);
    Ms_bar[up] = Vs[up]/v0*m0 + K[up]*Vs[up]/V*M;
   }
}

model{
    M ~ normal(2,1);
    V ~ lognormal(log(12),0.5);
    r ~ lognormal(log(2),0.5);
    //real r = 2;
    Ms ~ normal(Ms_bar,sqrt_Vbar);
    //row_vector[3] intcept=[-1,-1,-2];
    intcept ~ normal(-1,1);
    restpar ~ normal(1,1);

    row_vector[3] condur = [1,3,12];

    for (ob in 1:nob){
        int itd = Itd[ob];
        int conte_it = Conte[ob];
        int contd_it = Contd[ob];
        int conidx_it = Conidx[ob];
        int updt_idx_it = updt_idx[ob];
        int dur_it = Dur[ob];


        if (itd>1){
            target += bernoulli_logit_lpmf(Initiate[ob]|Ms[updt_idx_ini[ob]]*(conidx_it==0)+(contd_it*1.0/dur_it<=0.3)*restpar[1]+(conte_it*1.0/dur_it<=0.3)*restpar[2]);

            if (Initiate[ob]==1){
                target += bernoulli_logit_lpmf(Complete[ob]|M);
            }

            if(conte_it==0){
                vector[4] util = rep_vector(0,4);
                real mit = Ms[updt_idx_it];
                real rho = inv_logit(mit);
                real ur;
                ur = mean(Complete[(ob-min(itd,14)+1):ob]);
                vector[3] nuse;
                for(j in 1:3){
                    nuse[j] = ur*(1-rho^condur[j])/(1-rho);
                    util[j+1] = nuse[j]*mit - r*nuse[j]^2*Vs[updt_idx_it] + intcept[j] + (Constatus[ob]==j+1)*restpar[3];
                }
                real maxutil = max(util);
                target += categorical_logit_lpmf(Choice[ob]|util-maxutil);
            }

        }

    }

}

actually, I fixed some bug in the code. and it seems to converge now although the error message still pops up.
Exception: categorical_logit_lpmf: log odds parameter[2] is nan, but must be finite! (in ‘test0902.stan’, line 105, column 16 to column 74)
Exception: normal_lpdf: Scale parameter[1] is 0, but must be positive! (in ‘test0902.stan’, line 70, column 4 to column 34)
Exception: normal_lpdf: Scale parameter[1] is 0, but must be positive! (in ‘test0902.stan’, line 70, column 4 to column 34)

Here is the traceplot. For the parameter ‘V’ (on the bottom), however, it still seems to have some autocorrelation.


the summary of the estimates is as follows:

I am new to both Bayesian Estimation and Stan. Do you think the results have good mixing? I am running with 500 warmups, and 900 iterations just for trial run. the estimate of M is close enough but the true V is 12 which still has some distance from the estimates 13. I am increasing the number of warmups and iterations and running the model again. Hopefully, it could get closer to the true value of V.

Your MSCE for V is pretty high, but given the StdDev, I would not expect a V of 12 to be recovered with more iterations. But that could be because of the data set itself, not the model. There is some clear autocorrelation in V that isn’t present in M. If you want to really check for convergence sensitively, you should use rank plots. A code like this in MATLAB:

for c = 1:length(C)
 chains((c-1)*size(C{c}(:,p),1)+1:c*size(C{c}(:,p),1)) = C{c}(:,p);
end

chains = tiedrank(chains);
chains = reshape(chains,length(C),[]);

set(figure(1),'OuterPosition',[400 100 1000 1000])

for c = 1:length(C)
 subplot(2,2,c)
 histogram(chains(c,:),'BinEdges',linspace(0,4*length(chains(c,:)),11))
 hold on
 yline(length(chains(c,:))/10,'LineStyle','--','LineWidth',1.5)
end

You can read more about rank plots here:
https://bayesiancomputationbook.com/markdown/chp_02.html#exploratory-analysis-of-bayesian-models

Rank plots are a sensitive test of both convergence and mixing.

Additionally, you could generate ten data sets from the same generative parameters and see if the recovered parameter estimates fall around the true parameter set.

I would do a print statement on util, maxutil, and util-maxutil to see where this nan is coming from.

thank you very much! let me look into it.

Was this for the default 1000 warmup/1000 sampling iterations in 4 chains (i.e., 4000 posterior draws). If so, it’s fine—it’s an integrated autocorrelation time of roughly 7, which is generally considered good mixing.

The way to evaluate if it’s mixing through the right region of the posterior is to simulate data then fit it and check whether the posterior has the right coverage (e.g., 50% intervals contain about 50% of the true values).

This is a NaN issue, which is not-a-number, a fixed floating point value for errors that is part of the IEEE standard. NA is what R uses for undefined values. NaN values come up when arithmetic is incoherent, like 0 / 0, or inf - inf. I’m guessing that what’s happening is that you are rounding rho to 1, so that nuse[j] becomes NaN.

You can instrument your code with print statements to see where things are going wrong. For instance, print out the values of rho and ur and nuse[j] and util[j + 1] and you can see where util gets a NaN introduced.

Are you sure you want -exp(...) before applying categorical_logit? The categorical_logit distribution applies softmax (exp(x) / sum(exp(x))) to the argument, so it’s getting doubly exponentiated. This is a quick route to overflow or underflow.

The thing to do to mitigate this problems is to work on the log scale for as long as you can. This may, unfortunately, require unfolding categorical_logit and coding it up manually, because that can be part of the problem.

For example, we know that your target += statement will amount to this.

target += log(softmax(util))[Choice[ob]];

Now you can start unfolding the algebra back through the definition of util.

But given that things seem to be mixing OK, I wouldn’t bother with this. I would try to do the validation with simulated data, though.

this is very unlikely given that the existing fit has an MCMC standard error (MCSE) of only 0.02. You wouldn’t expect to find results further away than 4 times the MCSE.

You have to be careful about what you mean by the “true” value of V. We’re trying to find the posterior mean of V, not recover the simulated value. Here’s an example. If I simulate a bunch of data i.i.d. via V ~ normal(0, 1), it might have a mean of 0.3 and a standard deviation of 0.9, so that you don’t expect to recover mu = 0 and sigma = 1 by fitting V ~ normal(mu, sigma). In the limit of infinite data, you will you get convergence to the true values with a well-behaved model.

Hi Bob,
thank you very much for your such detailed explainations. To avoid NaN that might be produced due to rho → 1, I set rho = (rho,1.0-1e-6), and I unfolded the loglikelihood as you suggested using log_sum_exp. Now the error terms disapeared.

The full code is as follows.


data{
    int ncust;
    int nob;
    int nupdt;

    array[nob] int Initiate;
    array[nob] int Complete;
    array[nob] int Choice;
    array[nob] int Itd;
    array[nob] int Conidx;
    array[nob] int Dur;
    array[nob] int Contd;
    array[nob] int Conte;
    array[nob] int Constatus;
    array[nob] int Updt;
    array[nob] int updt_idx;
    array[nob] int updt_idx_ini;

    array[nupdt] real <lower=1e-500> Vit;
    array[nupdt] int Itd_updt;
    array[nupdt] int K;
}

parameters {
   real M;
   real<lower=0.01> V;
   array[3] real restpar;
   array[nupdt] real Ms;
   array[3] real intcept;
}

transformed parameters {
   array[nupdt] real Ms_bar;
   array[nupdt] real<lower= 1e-10> Vs_bar;
   array[nupdt] real <lower= 1e-10> Vs;
   array[nupdt] real <lower= 1e-10> sqrt_Vbar;
   
   for (up in 1:nupdt){
    real m0 = (Itd_updt[up]==1)? M:Ms[up-1];
    real v0 = (Itd_updt[up]==1)? 10:Vs[up-1];
    
    Vs[up] = 1/(1/v0 + K[up]/V);
    Vs_bar[up] = K[up]*Vs[up]^2/V;
    sqrt_Vbar[up] = sqrt(Vs_bar[up]);
    Ms_bar[up] = Vs[up]/v0*m0 + K[up]*Vs[up]/V*M;
   }
}

model{
    M ~ normal(1,1);
    V ~ lognormal(log(4),0.5);
    //r ~ lognormal(log(1),0.5);
    real r = 1.5;
    Ms ~ normal(Ms_bar,sqrt_Vbar);
    //row_vector[3] intcept=[-1,-1,-2];
    intcept ~ normal(2,1);
    restpar ~ normal(1,1);

    row_vector[3] condur = [1,3,12];

    for (ob in 1:nob){
        int itd = Itd[ob];
        int conte_it = Conte[ob];
        int contd_it = Contd[ob];
        int conidx_it = Conidx[ob];
        int updt_idx_it = updt_idx[ob];
        int dur_it = Dur[ob];


        if (itd>1){
            target += bernoulli_logit_lpmf(Initiate[ob]|Ms[updt_idx_ini[ob]]*(conidx_it==0)+(contd_it*1.0/dur_it<=0.3)*restpar[1]+(conte_it*1.0/dur_it<=0.3)*restpar[2]);

            if (Initiate[ob]==1){
                target += bernoulli_logit_lpmf(Complete[ob]|M);
            }

            if(conte_it==0){
                vector[4] util = rep_vector(0,4);
                real mit = Ms[updt_idx_it];
                real rho = inv_logit(mit);
                rho = min(rho,1.0-1e-6);
                real ur;
                ur = mean(Complete[(ob-min(itd,7)+1):ob]);
                vector[3] nuse;
                for(j in 1:3){
                    nuse[j] = ur*(1-rho^condur[j])/(1-rho);
                    //util[j+1] = nuse[j]*mit - r*nuse[j]^2*Vs[updt_idx_it] + intcept[j] + (Constatus[ob]==j+1)*restpar[3];
                    util[j+1] = -exp(-r*nuse[j]*mit +(r*nuse[j])^2/2*Vs[updt_idx_it]) + intcept[j] + (Constatus[ob]==j+1)*restpar[3];
                }
                //real maxutil = max(util);
                //target += categorical_logit_lpmf(Choice[ob]|util-maxutil);
                real maxutil = max(util);
                vector[4] log_softmax_util;
                for (k in 1:4){
                    log_softmax_util[k] = util[k] - maxutil-log_sum_exp(util-maxutil);
                }

                target += log_softmax_util[Choice[ob]];
            }

        }

    }

}

However, the model still produces the following error terms:
error 1.
Exception: normal_lpdf: Scale parameter[1] is 0, but must be positive! (in ‘/tmp/tmpr__a_zjj/tmpvebvcty9.stan’, line 74, column 4 to column 34)
error 2.
18:46:35 - cmdstanpy - WARNING - Some chains may have failed to converge.
Chain 3 had 2 iterations at max treedepth (0.2%)
Chain 4 had 15 iterations at max treedepth (1.9%)
Use the “diagnose()” method on the CmdStanMCMC object to see further information.

the first error refers to the line
Ms ~ normal(Ms_bar,sqrt_Vbar);
I have already constrained sqrt_Vbar with the lower bound 1e-10, why the scale parameter would still be 0?

for the error 2, would this message be a concern?

The traceplot is as follows. Do you think the traceplot for V looks acceptable? I am still trying to get a sense of what looks acceptable and what does not. Thank you very much in advance for your help.