Debugging 'Log probability evaluates to log(0)' - yet print() shows lprobs are fine

I have written my own _lpdf function, defined it in the functions block, and it is at least syntactically correct. The function is rather complex, as I had to deal with ragged data, though I believe that to be irrelevant for my question.

Initializing with random values or 0, I get:
Log probability evaluates to log(0)

So I added
print(lprob)
to my code, where lprob is a real scalar which is returned by the function.

I copied the console output into a text editor and searched for 'nan' and 'na', and also inspected the output by scrolling through the numbers. I cannot find a single nan; all the numbers look fine to me.

What else can I do to debug?

Full code below; sorry for all the idiosyncrasies. The unit/household/person/i-specific log-likelihood contains PDF and CDF components plus a Jacobian from change-of-variables (COV) arithmetic, with standard multivariate normal heterogeneity on top of it.

functions{
  real vdm_lpdf(vector xx, vector xxp, vector pp, vector aai,
                real mxp, vector minz,
                vector betai, real gamma, real E, real sigma,
                int ntski, int nalti, int nattr, int dimxi){

    //defs
    vector[dimxi] ab;    // where a*beta is saved
    vector[dimxi] g;     // store g vector
    vector[dimxi] j1;    // Jacobian part 1
    real j2;             // Jacobian part 2
    vector[dimxi] j21;
    vector[ntski] j22;   // per-task sums (one entry per choice task)
    real lprob;          // variable to hold log prob
    int pos;             // counters
    int pos2;
    int pos3;

    //compute a*beta
    pos = 1;
    pos2 = 1;
    for(jj in 1:dimxi){
      ab[pos2] = sum(betai .* segment(aai, pos, nattr));
      pos = pos + nattr;
      pos2 = pos2 + 1;
    }

    //g vector
    g = -ab + log(gamma * xx + 1) + log(pp) - log(minz + E - mxp) + log(pp);

    //Jacobian
    j1 = (log(gamma) - log(gamma * xx + 1)) .* xxp;
    j21 = (((gamma * xx + 1) .* pp) ./ (gamma * (minz + E - mxp))) .* xxp;

    pos3 = 1;
    for(jj in 1:ntski){
      j22[jj] = sum(segment(j21, pos3, nalti));
      pos3 = pos3 + nalti;
    }
    j2 = sum(log(j22 + 1));

    //ll
    lprob = -sum(exp(-g / sigma)) + sum((g / sigma - log(sigma)) .* xxp) + sum(j1) + j2;
    //print(lprob);

    return lprob;
  }
}


data{
  int<lower=1> N;         // no of households
  int<lower=1> K;         // no of hh level parameters
  int<lower=1> nattr;     // no of attributes
  int<lower=1> dimxst;    // total number of elements for x, p
  int<lower=1> dimast;    // total number of elements for a
  int<lower=1> dimxs[N];  // hh's number of elements for x
  int<lower=1> dimas[N];  // hh's number of elements for a
  int<lower=1> ntsks[N];  // number of choice tasks for household
  int<lower=1> nalts[N];  // number of alternatives available to household
  vector<lower=0>[N] mxps;         // max spent for hh
  vector<lower=0>[dimxst] XX;      // all choices X in one vector
  vector<lower=0>[dimxst] XXP;     // companion to XX (enters the likelihood as weights/indicators)
  vector<lower=0>[dimxst] PP;      // all prices in one vector
  vector<lower=0>[dimxst] MINZ;    // minimum outside good z for each task/alternative for each hh
  vector[dimast] AAI;     // all attributes in one vector
}


parameters{
  matrix[nattr,N] beta;
  vector[N] lgammas;
  vector[N] lsigmas;
  vector<lower=1>[N] Eus;
  
  corr_matrix[K] Omega;
  vector<lower=0>[K] tau;
  vector[K] MU;
}

transformed parameters{
  vector[N] SIGMAS;
  vector[N] GAMMAS;
  cov_matrix[K] SIGMA;
  vector[N] Es;
  matrix[K,N] theta;
  vector[N] LE;

  LE = Eus .* log(mxps);
  Es = exp(LE) + 0.00001;

  theta = append_row(append_row(append_row(beta, lgammas'), LE'), lsigmas');
  SIGMA = quad_form_diag(Omega, tau);

  SIGMAS = exp(lsigmas);
  GAMMAS = exp(lgammas);
}



model{
  //counters
  int posm;
  int posm2;
  posm = 1;
  posm2 = 1;

  //priors
  tau   ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  MU    ~ normal(0, 10);

  for (j in 1:N){
    theta[, j] ~ multi_normal(MU, SIGMA);
  }

  //hh level lls
  for(i in 1:N){
    segment(XX, posm, dimxs[i]) ~ vdm(segment(XXP, posm, dimxs[i]),
                                      segment(PP, posm, dimxs[i]),
                                      segment(AAI, posm2, dimas[i]),
                                      mxps[i],
                                      segment(MINZ, posm, dimxs[i]),
                                      beta[, i],
                                      GAMMAS[i],
                                      Es[i],
                                      SIGMAS[i],
                                      ntsks[i],
                                      nalts[i],
                                      nattr,
                                      dimxs[i]);
    posm = posm + dimxs[i];
    posm2 = posm2 + dimas[i];
    print(target());
  }
}

Did you make sure to increment the log density with a target += your_distribution_lpdf(); statement or a y ~ your_distribution(); statement?

(edited to change log probability to log density)

You might also want to add print(target()); at the very end – that'll show what the target is (including the Jacobians).

Are you not getting any other warning messages here? I think that may be a bug in one of our recent releases that drops the message when initialization fails.

I have no idea what lprob is here, as you haven't included your program, but what you want to do is include this at the end of the model block:

print(target());

That should print the accumulated log density at the end of the run. For example,

parameters {
  real<lower = 0, upper = 1> theta;
}
model {
  print("theta = ", theta);
  print("log Jacobian = ", target());
  theta ~ beta(10, 3);
  print("log density plus Jacobian = ", target());
}

will print the Jacobian (from the constraining transform that makes theta lie in (0, 1)) and then the sum of the log density and the Jacobian.

I should have mentioned that it is a hierarchical model. I am using the ~ distribution statement. Since I have to deal with ragged data, it is:

for(i in 1:N){
  segment(bigvector, pos, inobs[i]) ~ mylikelihood(...);
  pos = pos + inobs[i];
}

I use the usual normal/Cauchy/LKJ prior setup.

To clarify, I put print within the myfunction_lpdf function in the functions block. I hadn't thought about using target(), since I was using ~. However, it might be an interesting check to do. I have to read up on using target(), and I have already found a few threads on that.

Bob, thanks.

print(target()) indeed yields 'nan's. It would be strange if it didn't. However, why does it do that? I print from within my lpdf.

I do get a warning, but I don't see how it would be related:

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    theta[ , j] ~ multi_normal(...)

I added my full model/parameter/function code in the original question. It’s pretty complex, but probably still makes it easier to follow what I am doing.

You want to start dropping print statements between all of your statements, of the form:

if (is_nan(target()))
  print("target() became NaN at position X");

Your model will be way more efficient if you vectorize the multivariate normal (by making theta an array of vectors). And even more efficient if you use a Cholesky-factor parameterization of the LKJ and multi-normal (example in the manual regression chapter).
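For instance, a minimal sketch of that combination, assuming theta is re-declared as an array of vectors (names follow the model above; this is the general pattern from the manual, not your exact model):

parameters {
  vector[K] theta[N];               // array of vectors instead of matrix[K,N]
  cholesky_factor_corr[K] L_Omega;  // Cholesky factor of the correlation matrix
  vector<lower=0>[K] tau;
  vector[K] MU;
}
model {
  tau     ~ cauchy(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(2);
  MU      ~ normal(0, 10);
  // one vectorized statement replaces the loop over j
  theta ~ multi_normal_cholesky(MU, diag_pre_multiply(tau, L_Omega));
}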

Yeah, like Bob said, more prints. Maybe put a

print("target() became NaN in loop ", i);

into the for loop.
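In the model block above, that might look something like this (a sketch; guarding with is_nan keeps the output readable):

for(i in 1:N){
  // ... vdm sampling statement and counter updates as above ...
  if (is_nan(target()))
    print("target() became NaN in loop ", i);
}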

vdm_lpdf appears to return NaN for every subject 1:N. However, the print within vdm_lpdf (right before the return line) suggests otherwise.

I realized that I had separate loops for theta ~ and segment(XX,…) ~, so I merged the loops, and now there are some units without NaN. Interesting.
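For reference, the merged loop would look roughly like this (a sketch assembled from the full model above):

for(i in 1:N){
  theta[, i] ~ multi_normal(MU, SIGMA);
  segment(XX, posm, dimxs[i]) ~ vdm(segment(XXP, posm, dimxs[i]),
                                    segment(PP, posm, dimxs[i]),
                                    segment(AAI, posm2, dimas[i]),
                                    mxps[i],
                                    segment(MINZ, posm, dimxs[i]),
                                    beta[, i],
                                    GAMMAS[i],
                                    Es[i],
                                    SIGMAS[i],
                                    ntsks[i],
                                    nalts[i],
                                    nattr,
                                    dimxs[i]);
  posm = posm + dimxs[i];
  posm2 = posm2 + dimas[i];
}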

So, it turns out it was my first-stage prior definition causing the trouble.


Hmm, I wouldn’t expect that to change anything. Edit: But it’s interesting that it does.

I wonder if trying to get a reproducible version of this with the minimum number of parameters would be useful? Like, does this happen if beta is the only parameter, etc.?

Well, yes, I love creating minimal examples, but I wasn't really sure where the culprit was. Now it doesn't seem to be my _lpdf function, but rather my prior formulation. I should probably create a separate thread for this.

I have a multivariate prior over some 'betas' (akin to regression coefficients), log(g), log(s), and log(E), where each unit has a different lower bound for E. Let theta be the full npara x nunits matrix of parameters consisting of the betas, log(g), log(s), and log(E). Then I can define theta ~ MVN and use transformed parameters to split theta up, as in the sketch below. However, if I define beta, log(g), etc. separately and then use transformed parameters to create theta by appending the matrix and columns, it doesn't seem to work.
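A sketch of the direction that works (declare theta as the parameter and split it, instead of building it by appending); the row order follows the append_row construction above, and note this version drops the per-unit lower bound on E, which is exactly the reparameterization challenge below:

parameters {
  matrix[K, N] theta;   // rows 1:nattr are betas, then log(gamma), log(E), log(sigma)
}
transformed parameters {
  matrix[nattr, N] beta;
  vector[N] lgammas;
  vector[N] LE;
  vector[N] lsigmas;
  beta    = block(theta, 1, 1, nattr, N);
  lgammas = theta[nattr + 1]';
  LE      = theta[nattr + 2]';
  lsigmas = theta[nattr + 3]';
}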

Now the challenge is to reparametrize E. I can reparametrize it as exp(e_para) + min_e, as sketched below. However, I'd like min_e to be part of the distribution, since the covariance structure is more informative that way.
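A minimal sketch of that reparameterization, with min_e a hypothetical data vector holding the per-unit lower bounds:

data {
  vector[N] min_e;     // hypothetical: per-unit lower bounds for E
}
parameters {
  vector[N] e_para;    // unconstrained
}
transformed parameters {
  vector[N] Es;
  Es = exp(e_para) + min_e;   // guarantees Es[i] > min_e[i] for every unit
}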

Cool! It's fine to keep it in this thread (so everyone can follow the evolution of the problem). By default, transformed parameters and any other temporary variables you define in transformed parameters or the model block are initialized to NaN.

So this could be something not being initialized, sneaking through, and infecting everything else. It might be tedious, but I'd suggest testing each element of everything you define in the transformed parameters block to search for NaNs or infs.
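A sketch of that kind of element-wise check, appended at the end of the transformed parameters block (names taken from the model above):

transformed parameters {
  // ... declarations and assignments as above ...
  for (n in 1:N) {
    if (is_nan(Es[n]) || is_inf(Es[n]))
      print("Es[", n, "] is NaN or inf");
    if (is_nan(SIGMAS[n]) || is_inf(SIGMAS[n]))
      print("SIGMAS[", n, "] is NaN or inf");
  }
}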