Do you have a smallish model that shows this behavior? I can look on Monday but should easier to debug with a test that shows the bad behavior
Using @mhollanders’s (big) model is the first one with which I see the problem. I don’t rememeber seeing non-finite gradients except far in tails of posterior, but in this case the non-finite gradients are happening towards the higher density. @mhollanders do you have a smaller model + data that breaks Pathfinder?
Hey Aki, I just had a go at simulation just a basic regression with several nested random effects but it works well, so I’m afraid I don’t have an example. All I know is that in my model, it happens with all configurations (SEM vs. MVN) once the site-level random effects are added.
It also seems that Pathfinder failing depends on the data, so that sometimes when I restart R, and rerun data generation, I do not get non-finite gradients.
This example did anyway help to fix two things in the initialization function to make it more robust (one of these is fic for NaN’s and Infs in lp__
). I would guess the fix would be included in the cmdstanr
point release on next Monday
This doesn’t work for me and instead returns an error:
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
Yeah, this is probably due to the same issue @avehtari is looking into.
@mhollanders and @JLC , can you try with the github version
remotes::install_github("stan-dev/cmdstanr")
If you do this in a running session, you need to unload and load cmdstanr
and call cmdstan_model()
again (or just restart R and run your script again). This fix should remove couple errors, but if all your draws have non-finite gradient then the initialization still doesn’t work,
@JLC can you tell more about your model and data, and even better if you can share them so that I can test myself.
Hey Aki, just confirming that I’m still getting lp__ = Inf
, but I can use the Pathfinder object as initial values for HMC.
Thanks @avehtari
The github version gives me a different problem, which might be due to me being new to Pathfinder. When running Pathfinder I do see a lot of:
Error evaluating model log probability: Non-finite gradient.
When trying to use the pathfinder result for init, I’m get:
Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=peak_z; dims declared=(1,175); dims found=(175)
My data list declares an array [175] for peak_z.
If I don’t use the pathfinder object for the init, everything samples fine. I’m not sure where the 1,175 declaration is originating.
My model is a piecewise regression of viral trajectory:
functions {
// mufun returns mean Ct:
real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
// Proliferation
if(t <= peaktime)
return(peak + pslope * t);
// Clearance
else
return(peak + (pslope * peaktime + cslope * (t - peaktime)));
}
// durfun returns stage (prolieration and clearance) durations:
real durfun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
// Proliferation
if(t <= peaktime)
return(abs((lod - peak)/pslope));
// Clearance
else
return(abs((lod - peak - pslope * peaktime + cslope * peaktime)/cslope));
}
// stagefun returns stage (prolieration and clearance) indicators:
real stagefun(real t, real peaktime){
if(t <= peaktime)
return(1); // Proliferation
else
return(2); // Clearance
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // observed Ct
real<lower=0> lod; // limit of detection
array[N] real lb; // upper truncation bounds
array[N] real ub; // upper truncation bounds
int<lower=1> K_pslope; // number of proliferation slope predictors
matrix[N, K_pslope] X_pslope; // proliferation slope design matrix
int<lower=1> K_cslope; // number of clearance slope predictors
matrix[N, K_cslope] X_cslope; // clearance slope design matrix
int<lower=1> K_peak; // number of peak predictors
matrix[N, K_peak] X_peak; // peak design matrix
int<lower=1> K_peaktime; // number of peak time predictors
matrix[N, K_peaktime] X_peaktime; // peak time design matrix
array[N] int t; // time
int<lower=1> nSubj; // number of subjects
array[N] int<lower=1> Subj; // subject indicator
}
parameters {
vector[K_pslope] b_pslope; // proliferation slope
vector[K_cslope] b_cslope; // clearance slope
vector[K_peak] b_peak; // peak Ct
vector[K_peaktime] b_peaktime; // time of peak Ct
real<lower=0> sigma; // variance
vector<lower=0>[1] peak_sd; // subject-level peak Ct varfiance
array[1] vector[nSubj] peak_z; // individual peak Ct effects
vector<lower=0>[1] peaktime_sd; // subject-level peak Ct time variance
array[1] vector[nSubj] peaktime_z; // individual peak Ct time effects
}
transformed parameters {
vector[nSubj] subj_peak; // subject-specific peak Ct offsets
vector[nSubj] subj_peaktime; // subject-specific peak Ct time offsets
subj_peak = peak_sd[1] * peak_z[1];
subj_peaktime = peaktime_sd[1] * peaktime_z[1];
vector[N] pslope = X_pslope * b_pslope;
vector[N] cslope = X_cslope * b_cslope;
vector[N] peak = X_peak * b_peak + subj_peak[Subj];
vector[N] peaktime = X_peaktime * b_peaktime + subj_peaktime[Subj];
vector[N] mu;
// Calculate location (mu)
for (i in 1 : N) {
mu[i] = mufun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);
}
}
model {
real lprior = 0;
// likelihood
target += normal_lupdf(Y | mu, sigma)
- normal_lcdf(ub | mu, sigma);
// priors
lprior += normal_lupdf(b_pslope | 0, 1);
lprior += normal_lupdf(b_cslope | 0, 1);
lprior += normal_lupdf(b_peak[1] | 18, 10);
lprior += normal_lupdf(b_peak[2] | 18, 10);
lprior += normal_lupdf(b_peak[3] | 18, 10);
lprior += normal_lupdf(b_peak[4:] | 0, 5);
lprior += normal_lupdf(b_peaktime | 0, 0.25);
lprior += normal_lupdf(sigma | 0, 4);
lprior += normal_lupdf(peak_sd | 0, 10);
lprior += normal_lupdf(peaktime_sd | 0, 3);
target += lprior;
target += std_normal_lupdf(peak_z[1]);
target += std_normal_lupdf(peaktime_z[1]);
}
generated quantities {
vector[N] log_lik;
vector[N] ct_pred;
vector[N] Duration;
vector[N] Stage;
for (i in 1 : N) {
//Log likelihood
log_lik[i] = normal_lpdf(Y[i] | mu[i], sigma)
- normal_lcdf(ub[i] | mu[i], sigma);
// Predicted Ct's
// ct_pred[i] = normal_lub_rng(mu[i], sigma, 0.0, 37.0);
// Calculate durations by row
Duration[i] = durfun(t[i], peak[i], peaktime[i], pslope[i], cslope[i], lod);
// Create stage indicator per row
Stage[i] = stagefun(t[i],peaktime[i]);
}
}
ping @andrjohns for another dimension mismatch with
array[1] vector[nSubj] peak_z;
@JLC thanks for the additional information and the code, this is helpful
Test with the newest development version. I had the same error for a model generated with brms, but it works now. This is the fix: Fixes 975 by only removing leftmost array dimension if equal to 1 by SteveBronder · Pull Request #993 · stan-dev/cmdstanr · GitHub
This sounds like it may be related to issues people have reported with non-finite gradients for Pathfinder.