Using Pathfinder or other method to set initial values for sampling

stevebronder · May 24, 2024, 2:14am

Do you have a smallish model that shows this behavior? I can look on Monday but should easier to debug with a test that shows the bad behavior

avehtari · May 24, 2024, 6:35am

Using @mhollanders’s (big) model is the first one with which I see the problem. I don’t rememeber seeing non-finite gradients except far in tails of posterior, but in this case the non-finite gradients are happening towards the higher density. @mhollanders do you have a smaller model + data that breaks Pathfinder?

mhollanders · May 24, 2024, 7:11am

Hey Aki, I just had a go at simulation just a basic regression with several nested random effects but it works well, so I’m afraid I don’t have an example. All I know is that in my model, it happens with all configurations (SEM vs. MVN) once the site-level random effects are added.

avehtari · May 24, 2024, 8:43am

It also seems that Pathfinder failing depends on the data, so that sometimes when I restart R, and rerun data generation, I do not get non-finite gradients.

This example did anyway help to fix two things in the initialization function to make it more robust (one of these is fic for NaN’s and Infs in lp__). I would guess the fix would be included in the cmdstanr point release on next Monday

JLC · May 25, 2024, 7:23pm

This doesn’t work for me and instead returns an error:

Error in sample.int(length(x), size, replace, prob) : 
  NA in probability vector

mhollanders · May 25, 2024, 9:50pm

Yeah, this is probably due to the same issue @avehtari is looking into.

avehtari · May 26, 2024, 2:51pm

@mhollanders and @JLC , can you try with the github version

remotes::install_github("stan-dev/cmdstanr")

If you do this in a running session, you need to unload and load cmdstanr and call cmdstan_model() again (or just restart R and run your script again). This fix should remove couple errors, but if all your draws have non-finite gradient then the initialization still doesn’t work,

@JLC can you tell more about your model and data, and even better if you can share them so that I can test myself.

mhollanders · May 27, 2024, 12:40am

Hey Aki, just confirming that I’m still getting lp__ = Inf, but I can use the Pathfinder object as initial values for HMC.

JLC · May 27, 2024, 1:42pm

Thanks @avehtari

The github version gives me a different problem, which might be due to me being new to Pathfinder. When running Pathfinder I do see a lot of:

Error evaluating model log probability: Non-finite gradient.

When trying to use the pathfinder result for init, I’m get:

Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=peak_z; dims declared=(1,175); dims found=(175)

My data list declares an array [175] for peak_z.

If I don’t use the pathfinder object for the init, everything samples fine. I’m not sure where the 1,175 declaration is originating.

My model is a piecewise regression of viral trajectory:

functions {              
        
        // mufun returns mean Ct:
        real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
                
          // Proliferation 
         if(t <= peaktime)
         
              return(peak + pslope * t);
              
            // Clearance
            else 
            
              return(peak + (pslope * peaktime + cslope * (t - peaktime)));
            
          }
          
        // durfun returns stage (prolieration and clearance) durations:
        real durfun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
                
          // Proliferation 
         if(t <= peaktime)
         
              return(abs((lod - peak)/pslope));
              
            // Clearance
            else 
            
              return(abs((lod -  peak - pslope * peaktime + cslope * peaktime)/cslope));
            
          }
          
         // stagefun returns stage (prolieration and clearance) indicators:
         real stagefun(real t, real peaktime){
          
         if(t <= peaktime)
         
              return(1); // Proliferation 
            
            else 
            
              return(2); // Clearance
            
          }
        
}

data {
        
  int<lower=1> N; // total number of observations
  vector[N] Y; // observed Ct
  real<lower=0> lod;    // limit of detection
  array[N] real lb; // upper truncation bounds
  array[N] real ub; // upper truncation bounds
  int<lower=1> K_pslope; // number of proliferation slope predictors
  matrix[N, K_pslope] X_pslope; // proliferation slope design matrix
  int<lower=1> K_cslope; // number of clearance slope predictors
  matrix[N, K_cslope] X_cslope; // clearance slope design matrix
  int<lower=1> K_peak; // number of peak predictors
  matrix[N, K_peak] X_peak; // peak design matrix
  int<lower=1> K_peaktime; // number of peak time predictors
  matrix[N, K_peaktime] X_peaktime; // peak time design matrix
  array[N] int t; // time 
  int<lower=1> nSubj; // number of subjects
  array[N] int<lower=1> Subj; // subject indicator
  
  }

parameters {
        
  vector[K_pslope] b_pslope; // proliferation slope
  vector[K_cslope] b_cslope; // clearance slope
  vector[K_peak] b_peak; // peak Ct
  vector[K_peaktime] b_peaktime; // time of peak Ct
  real<lower=0> sigma; // variance
  vector<lower=0>[1] peak_sd; // subject-level peak Ct varfiance
  array[1] vector[nSubj] peak_z; // individual peak Ct effects
  vector<lower=0>[1] peaktime_sd; // subject-level peak Ct time variance
  array[1] vector[nSubj] peaktime_z; // individual peak Ct time effects
  
}

transformed parameters {
        
    vector[nSubj] subj_peak; // subject-specific peak Ct offsets
    vector[nSubj] subj_peaktime; // subject-specific peak Ct time offsets
    subj_peak = peak_sd[1] * peak_z[1];
    subj_peaktime = peaktime_sd[1] * peaktime_z[1];
        
    vector[N] pslope = X_pslope * b_pslope;
    vector[N] cslope = X_cslope * b_cslope;
    vector[N] peak = X_peak * b_peak + subj_peak[Subj];
    vector[N] peaktime = X_peaktime * b_peaktime + subj_peaktime[Subj];
    vector[N] mu;
    
     // Calculate location (mu) 
    
     for (i in 1 : N) {

     mu[i] = mufun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);

    }
  
}

model {
        
  real lprior = 0; 
  
  // likelihood
   target += normal_lupdf(Y | mu, sigma)
                - normal_lcdf(ub | mu, sigma);

  // priors 
  lprior += normal_lupdf(b_pslope | 0, 1);
  lprior += normal_lupdf(b_cslope | 0, 1);
  lprior += normal_lupdf(b_peak[1] | 18, 10);
  lprior += normal_lupdf(b_peak[2] | 18, 10);
  lprior += normal_lupdf(b_peak[3] | 18, 10);
  lprior += normal_lupdf(b_peak[4:] | 0, 5);
  lprior += normal_lupdf(b_peaktime | 0, 0.25);
  lprior += normal_lupdf(sigma | 0, 4);
  lprior += normal_lupdf(peak_sd | 0, 10);
  lprior += normal_lupdf(peaktime_sd | 0, 3);
  target += lprior;
  target += std_normal_lupdf(peak_z[1]);
  target += std_normal_lupdf(peaktime_z[1]);
  
}

generated quantities {
        
    vector[N] log_lik;
    vector[N] ct_pred;
    vector[N] Duration;
    vector[N] Stage;
    
     for (i in 1 : N) {
             
                //Log likelihood
                log_lik[i] = normal_lpdf(Y[i] | mu[i], sigma)
                - normal_lcdf(ub[i] | mu[i], sigma);
                
                // Predicted Ct's
              //  ct_pred[i] = normal_lub_rng(mu[i], sigma, 0.0, 37.0);
                
                // Calculate durations by row
                Duration[i] = durfun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);
                        
                // Create stage indicator per row       
                Stage[i] = stagefun(t[i],peaktime[i]);
                
                }
                
}

avehtari · May 27, 2024, 4:59pm

ping @andrjohns for another dimension mismatch with

array[1] vector[nSubj] peak_z;

avehtari · May 27, 2024, 4:59pm

@JLC thanks for the additional information and the code, this is helpful

StaffanBetner · June 10, 2024, 12:56pm

Test with the newest development version. I had the same error for a model generated with brms, but it works now. This is the fix: Fixes 975 by only removing leftmost array dimension if equal to 1 by SteveBronder · Pull Request #993 · stan-dev/cmdstanr · GitHub

Bob_Carpenter · July 19, 2024, 9:51pm

This sounds like it may be related to issues people have reported with non-finite gradients for Pathfinder.

Topic		Replies	Views
Checking if sampling uses initial values? CmdStan	4	328	March 10, 2024
Using pathfinder to initalize sampling Modeling	11	231	November 19, 2024
Initializing larger models using samples from smaller models Interfaces techniques , cmdstanr	1	439	August 24, 2023
Provinding Pathfinder with Initial Values Modeling	2	151	March 29, 2024
Initialization CmdStan fitting-issues , cmdstanr	1	407	October 19, 2023

Using Pathfinder or other method to set initial values for sampling

Related topics