Using Pathfinder or other method to set initial values for sampling

Do you have a smallish model that shows this behavior? I can look on Monday but should easier to debug with a test that shows the bad behavior

Using @mhollanders’s (big) model is the first one with which I see the problem. I don’t rememeber seeing non-finite gradients except far in tails of posterior, but in this case the non-finite gradients are happening towards the higher density. @mhollanders do you have a smaller model + data that breaks Pathfinder?

Hey Aki, I just had a go at simulation just a basic regression with several nested random effects but it works well, so I’m afraid I don’t have an example. All I know is that in my model, it happens with all configurations (SEM vs. MVN) once the site-level random effects are added.

It also seems that Pathfinder failing depends on the data, so that sometimes when I restart R, and rerun data generation, I do not get non-finite gradients.

This example did anyway help to fix two things in the initialization function to make it more robust (one of these is fic for NaN’s and Infs in lp__). I would guess the fix would be included in the cmdstanr point release on next Monday

1 Like

This doesn’t work for me and instead returns an error:

Error in, size, replace, prob) : 
  NA in probability vector

Yeah, this is probably due to the same issue @avehtari is looking into.

@mhollanders and @JLC , can you try with the github version


If you do this in a running session, you need to unload and load cmdstanr and call cmdstan_model() again (or just restart R and run your script again). This fix should remove couple errors, but if all your draws have non-finite gradient then the initialization still doesn’t work,

@JLC can you tell more about your model and data, and even better if you can share them so that I can test myself.

Hey Aki, just confirming that I’m still getting lp__ = Inf, but I can use the Pathfinder object as initial values for HMC.

1 Like

Thanks @avehtari

The github version gives me a different problem, which might be due to me being new to Pathfinder. When running Pathfinder I do see a lot of:

Error evaluating model log probability: Non-finite gradient. 

When trying to use the pathfinder result for init, I’m get:

Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=peak_z; dims declared=(1,175); dims found=(175)

My data list declares an array [175] for peak_z.

If I don’t use the pathfinder object for the init, everything samples fine. I’m not sure where the 1,175 declaration is originating.

My model is a piecewise regression of viral trajectory:

functions {              
        // mufun returns mean Ct:
        real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
          // Proliferation 
         if(t <= peaktime)
              return(peak + pslope * t);
            // Clearance
              return(peak + (pslope * peaktime + cslope * (t - peaktime)));
        // durfun returns stage (prolieration and clearance) durations:
        real durfun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
          // Proliferation 
         if(t <= peaktime)
              return(abs((lod - peak)/pslope));
            // Clearance
              return(abs((lod -  peak - pslope * peaktime + cslope * peaktime)/cslope));
         // stagefun returns stage (prolieration and clearance) indicators:
         real stagefun(real t, real peaktime){
         if(t <= peaktime)
              return(1); // Proliferation 
              return(2); // Clearance

data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // observed Ct
  real<lower=0> lod;    // limit of detection
  array[N] real lb; // upper truncation bounds
  array[N] real ub; // upper truncation bounds
  int<lower=1> K_pslope; // number of proliferation slope predictors
  matrix[N, K_pslope] X_pslope; // proliferation slope design matrix
  int<lower=1> K_cslope; // number of clearance slope predictors
  matrix[N, K_cslope] X_cslope; // clearance slope design matrix
  int<lower=1> K_peak; // number of peak predictors
  matrix[N, K_peak] X_peak; // peak design matrix
  int<lower=1> K_peaktime; // number of peak time predictors
  matrix[N, K_peaktime] X_peaktime; // peak time design matrix
  array[N] int t; // time 
  int<lower=1> nSubj; // number of subjects
  array[N] int<lower=1> Subj; // subject indicator

parameters {
  vector[K_pslope] b_pslope; // proliferation slope
  vector[K_cslope] b_cslope; // clearance slope
  vector[K_peak] b_peak; // peak Ct
  vector[K_peaktime] b_peaktime; // time of peak Ct
  real<lower=0> sigma; // variance
  vector<lower=0>[1] peak_sd; // subject-level peak Ct varfiance
  array[1] vector[nSubj] peak_z; // individual peak Ct effects
  vector<lower=0>[1] peaktime_sd; // subject-level peak Ct time variance
  array[1] vector[nSubj] peaktime_z; // individual peak Ct time effects

transformed parameters {
    vector[nSubj] subj_peak; // subject-specific peak Ct offsets
    vector[nSubj] subj_peaktime; // subject-specific peak Ct time offsets
    subj_peak = peak_sd[1] * peak_z[1];
    subj_peaktime = peaktime_sd[1] * peaktime_z[1];
    vector[N] pslope = X_pslope * b_pslope;
    vector[N] cslope = X_cslope * b_cslope;
    vector[N] peak = X_peak * b_peak + subj_peak[Subj];
    vector[N] peaktime = X_peaktime * b_peaktime + subj_peaktime[Subj];
    vector[N] mu;
     // Calculate location (mu) 
     for (i in 1 : N) {

     mu[i] = mufun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);


model {
  real lprior = 0; 
  // likelihood
   target += normal_lupdf(Y | mu, sigma)
                - normal_lcdf(ub | mu, sigma);

  // priors 
  lprior += normal_lupdf(b_pslope | 0, 1);
  lprior += normal_lupdf(b_cslope | 0, 1);
  lprior += normal_lupdf(b_peak[1] | 18, 10);
  lprior += normal_lupdf(b_peak[2] | 18, 10);
  lprior += normal_lupdf(b_peak[3] | 18, 10);
  lprior += normal_lupdf(b_peak[4:] | 0, 5);
  lprior += normal_lupdf(b_peaktime | 0, 0.25);
  lprior += normal_lupdf(sigma | 0, 4);
  lprior += normal_lupdf(peak_sd | 0, 10);
  lprior += normal_lupdf(peaktime_sd | 0, 3);
  target += lprior;
  target += std_normal_lupdf(peak_z[1]);
  target += std_normal_lupdf(peaktime_z[1]);

generated quantities {
    vector[N] log_lik;
    vector[N] ct_pred;
    vector[N] Duration;
    vector[N] Stage;
     for (i in 1 : N) {
                //Log likelihood
                log_lik[i] = normal_lpdf(Y[i] | mu[i], sigma)
                - normal_lcdf(ub[i] | mu[i], sigma);
                // Predicted Ct's
              //  ct_pred[i] = normal_lub_rng(mu[i], sigma, 0.0, 37.0);
                // Calculate durations by row
                Duration[i] = durfun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);
                // Create stage indicator per row       
                Stage[i] = stagefun(t[i],peaktime[i]);

ping @andrjohns for another dimension mismatch with

array[1] vector[nSubj] peak_z;

@JLC thanks for the additional information and the code, this is helpful

Test with the newest development version. I had the same error for a model generated with brms, but it works now. This is the fix: Fixes 975 by only removing leftmost array dimension if equal to 1 by SteveBronder · Pull Request #993 · stan-dev/cmdstanr · GitHub

1 Like