# Exceedingly small stepsize following warmup

Hi Stan Users,

I’m attempting to apply Stan to sampling a nonlinear inverse problem (below) and am running into some issues during warmup. Specifically, my stepsize drops to a super low value during warmup (e.g. 1e-5), leading to high autocorrelation and poor mixing. Skipping adaptation and manually setting the stepsize to something a bit bigger (~ 0.01) produces fairly accurate posterior distributions for a synthetic dataset, but creates a number of divergences (somewhere around 15%).

My intuition is that there must be some pathological areas in my posterior which are forcing the warmup to take a small stepsize, but I haven’t been able to figure out which parameters are the main culprits (and how to reparameterize them if needed). I’ve tried non-centered parameterizations for all of the terms in the below model but keep running into the same issue. Does anyone have any ideas for things I can modify to help improve the warmup/sampling? I’ve attached a pairs plot from a model run with a bigger stepsize and included my model below.

Thanks for the help!

Cheers,
Dorran

Problem:
- Basic Tarantola (2005) style inverse problem where we have some observed data, d_o, that is related to our model parameters by some nonlinear function, g, such that g(m) = d. In this case our data is observations of surface uplift rate, and our function g(m) takes the geometry of a geologic fault, m, and predicts the rate at which the surface would be uplifted for that geometry given a known horizontal velocity.

Model:

``````// Not including the forward model function "predict_uplift", since it's a bit long
data{
int nobs;
int nx;
int n_segments;

vector[nx] x;
vector[nx] z;
vector[nobs] x_obs;
vector[nobs] z_obs;

vector[nobs] u;

vector[1] gamma_guess;

real min_theta;
real max_theta;

real theta_prior_scale;
real theta_prior_location;

real convergence;
real x1u;
real z1u;

}
transformed data {
real x_r_placeholder[1];
int x_i_placeholder[1];
real x_max;

// Model Width
x_max = max(x);

// Placeholders for required inputs for the algebra solver
x_r_placeholder[1] = 0.0;
x_i_placeholder[1] = 1;

}
parameters {
// The model domain for our fault system looks something like:
//
//Data:           ^    Rates of Uplift (mm/yr)
//      |           ^ |
//      |    ^      | |           ^      ^
//   X-o------------------------------------------------------------X (x=max(x),d=0) (earth's surface)
//   |  \ segment 1 w/ inclination theta[1], length[1]
//   |   \
//   |    \   theta[2], length[2]
//   |     o-----o
//   |              \ theta[3], length[3]                     <---- motion along fault surface (mm/yr)
//   |               \
//   |                \   theta[4], length[4] |
//   |                 o----------------------------------------------o fault
//   x
// Our model has three sets of parameters
// 1. theta -> Dip Angles (radians) of each fault segment
// 2. w -> horizontal length of each fault segment (defined as a fraction of our model length)
// 3. Modelization Uncertainty

vector[n_segments] theta;
simplex[n_segments] w_raw;
real<lower=0> sigma_modelization_raw;

}
transformed parameters {
vector[n_segments] w;
vector[n_segments] lengths;

real<lower=0> sigma_modelization;

// Scale widths from fraction to true distances
w = w_raw * x_max;
// Compute fault length from fault dip and widths
lengths = w ./ cos(theta);
sigma_modelization = sigma_modelization_raw*5;
}
model {
vector[nobs] upred;
// Run forward model calculation to convert parameters to our observed data (uplift rates)
upred = predict_uplift(x,z,x_obs,z_obs,nobs,n_segments,x1u,z1u,convergence,
theta,w,gamma_guess,x_r_placeholder,x_i_placeholder);
upred = upred*1000; // m/yr -> mm/yr conversion

// ----------------------- MODEL PRIORS -------------------------------
// Inclination of each segment cannot exceed a range of [-1deg,29deg], otherwise our forward calculation will crash and return -inf log-probability
for (i in 1:n_segments) {
theta[i] ~ normal(theta_prior_location, theta_prior_scale) T[min_theta,max_theta];
}
target += dirichlet_lpdf(w_raw | rep_vector(1,n_segments));
target += normal_lpdf(sigma_modelization_raw | 0,1);
// ----------------------- Log-Likelihood -----------------------------
target += normal_lpdf(u | upred,sigma_modelization);
}
'''``````

Couldn’t get rdump to work properly, but I’m attaching an R script and the relevant model + data csvs for reference.

fault_bend_geometry_synthetic.stan (8.5 KB)
• Include the lower / upper bounds for `theta` in the declaration: `vector <lower = min_theta, upper = max_theta> [n_segments] theta`.
• The `w` / `w_raw` seem to have some kind of identifiability issues. Do you have any kind of prior information that could be included? (i.e. the first / last horizontal segments are likely to be larger than the others). I would suspect the small step size is needed to appropriately explore the posterior of `w_raw` in the current formulation.
• Are you testing this on a data set where you know the (true) value of `n_segments`? If not, you might be fitting a model with too many segments, and the resulting `w_raw` parameters are impossible to identify.