Exceedingly small stepsize following warmup

Hi Stan Users,

I’m attempting to apply Stan to sampling a nonlinear inverse problem (below) and am running into some issues during warmup. Specifically, my stepsize drops to a super low value during warmup (e.g. 1e-5), leading to high autocorrelation and poor mixing. Skipping adaptation and manually setting the stepsize to something a bit bigger (~ 0.01) produces fairly accurate posterior distributions for a synthetic dataset, but creates a number of divergences (somewhere around 15%).

My intuition is that there must be some pathological areas in my posterior which are forcing the warmup to take a small stepsize, but I haven’t been able to figure out which parameters are the main culprits (and how to reparameterize them if needed). I’ve tried non-centered parameterizations for all of the terms in the below model but keep running into the same issue. Does anyone have any ideas for things I can modify to help improve the warmup/sampling? I’ve attached a pairs plot from a model run with a bigger stepsize and included my model below.

Thanks for the help!

Cheers,
Dorran

Problem:
- Basic Tarantola (2005) style inverse problem where we have some observed data, d_o, that is related to our model parameters by some nonlinear function, g, such that g(m) = d. In this case our data is observations of surface uplift rate, and our function g(m) takes the geometry of a geologic fault, m, and predicts the rate at which the surface would be uplifted for that geometry given a known horizontal velocity.

Model:

// Not including the forward model function "predict_uplift", since it's a bit long
data{
  int nobs;
  int nx;
  int n_segments;
  
  vector[nx] x;
  vector[nx] z;
  vector[nobs] x_obs;
  vector[nobs] z_obs;
  
  vector[nobs] u;
  
  vector[1] gamma_guess;
  
  real min_theta;
  real max_theta;
  
  real theta_prior_scale;
  real theta_prior_location;
  
  real convergence;
  real x1u;
  real z1u;

}
transformed data {
  real x_r_placeholder[1];
  int x_i_placeholder[1];
  real x_max;
  
  // Model Width
  x_max = max(x);
  
  // Placeholders for required inputs for the algebra solver 
  x_r_placeholder[1] = 0.0;
  x_i_placeholder[1] = 1;
  
}
parameters {
  // The model domain for our fault system looks something like:
  //
  //Data:           ^    Rates of Uplift (mm/yr)
  //      |           ^ | 
  //      |    ^      | |           ^      ^                                                                
  //   X-o------------------------------------------------------------X (x=max(x),d=0) (earth's surface)   
  //   |  \ segment 1 w/ inclination theta[1], length[1]                                                 
  //   |   \                                                          
  //   |    \   theta[2], length[2]           
  //   |     o-----o                                                  
  //   |              \ theta[3], length[3]                     <---- motion along fault surface (mm/yr)
  //   |               \                                                
  //   |                \   theta[4], length[4] |
  //   |                 o----------------------------------------------o fault
  //   x                                                              
  // Our model has three sets of parameters
  // 1. theta -> Dip Angles (radians) of each fault segment
  // 2. w -> horizontal length of each fault segment (defined as a fraction of our model length)
  // 3. Modelization Uncertainty
  
  vector[n_segments] theta;
  simplex[n_segments] w_raw;
  real<lower=0> sigma_modelization_raw;
  
}
transformed parameters {
    vector[n_segments] w;
    vector[n_segments] lengths;
    
    real<lower=0> sigma_modelization;
    
    // Scale widths from fraction to true distances
    w = w_raw * x_max;
    // Compute fault length from fault dip and widths
    lengths = w ./ cos(theta);
    // Non-centered parameterization adjustments
    sigma_modelization = sigma_modelization_raw*5;
}
model {
    vector[nobs] upred;
    // Run forward model calculation to convert parameters to our observed data (uplift rates)
    upred = predict_uplift(x,z,x_obs,z_obs,nobs,n_segments,x1u,z1u,convergence,
                  theta,w,gamma_guess,x_r_placeholder,x_i_placeholder);
    upred = upred*1000; // m/yr -> mm/yr conversion
    
    // ----------------------- MODEL PRIORS -------------------------------
    // Inclination of each segment cannot exceed a range of [-1deg,29deg], otherwise our forward calculation will crash and return -inf log-probability
    for (i in 1:n_segments) {
      theta[i] ~ normal(theta_prior_location, theta_prior_scale) T[min_theta,max_theta];
    }
    target += dirichlet_lpdf(w_raw | rep_vector(1,n_segments));
    target += normal_lpdf(sigma_modelization_raw | 0,1);
    // ----------------------- Log-Likelihood -----------------------------  
    target += normal_lpdf(u | upred,sigma_modelization);
}
'''

Couldn’t get rdump to work properly, but I’m attaching an R script and the relevant model + data csvs for reference.

fault_bend_geometry_synthetic.stan (8.5 KB)
load_example_data.R (2.3 KB)
SyntheticUpliftCentroids.csv (5.1 KB)
SyntheticUpliftRefLine.csv (34.9 KB)

Some suggestions (which may / may not be useful):

  • Include the lower / upper bounds for theta in the declaration: vector <lower = min_theta, upper = max_theta> [n_segments] theta.
  • The w / w_raw seem to have some kind of identifiability issues. Do you have any kind of prior information that could be included? (i.e. the first / last horizontal segments are likely to be larger than the others). I would suspect the small step size is needed to appropriately explore the posterior of w_raw in the current formulation.
  • Are you testing this on a data set where you know the (true) value of n_segments? If not, you might be fitting a model with too many segments, and the resulting w_raw parameters are impossible to identify.