Trying to do a hierarchical nonlinear model in Stan; it's slow and probably wrong

I’ve been trying to run a model that works more or less like this. I have a set of stress-strain curves, where the data for each curve is collected under a fixed strain rate and temperature. Each individual curve is supposed to fit a model like this:

stress = A + B*strain^n

A and B are supposed to vary with strain rate and temperature according to a very nonlinear relationship. n is not supposed to, but realistically it does anyway. A and B are of the same order of magnitude as the stress values, usually on the order of a thousand megapascals. n should be between 0 and 1, i.e. the slope of the curves should decrease with increasing strain. I’ve tried a Stan model that looks mostly like this:

functions {
    // placeholder for the nonlinear material model; returns a scalar per curve
    real notRealFunction(vector otherParams, real strainRate, real temperature) {...}
}

data {
  int<lower=1> numCurves;
  int<lower=0> curveSizes[numCurves];
  int<lower=1> numOtherParams;
  
  vector[numCurves] strainRate;
  vector[numCurves] temperature;

  vector[sum(curveSizes)] strain;
  vector[sum(curveSizes)] stress;
}

parameters {
  real<lower=0.0> A;
  real<lower=0.0> B;
  real<lower=0.0, upper=1.0> n;
  vector<lower=0.0>[numCurves] SD_curve;
  vector[numOtherParams] otherParams;

  matrix[3,numCurves] unit_ABn;
  vector<lower=0.0>[3] SD_ABn;
  cholesky_factor_corr[3] LKJ_chol_ABn;
}

transformed parameters {
  matrix[3,numCurves] ABn_indiv;

  {
    matrix[3,numCurves] ABn_mean;

    for (curveInd in 1:numCurves) {          
      ABn_mean[1, curveInd] = A*notRealFunction(otherParams, strainRate[curveInd], temperature[curveInd]);
      ABn_mean[2, curveInd] = B*notRealFunction(otherParams, strainRate[curveInd], temperature[curveInd]);
      ABn_mean[3, curveInd] = n; 
    }

    ABn_indiv = ABn_mean + diag_pre_multiply(SD_ABn, LKJ_chol_ABn)*unit_ABn;
  }
}

model {
  int startInd = 1;

  to_vector(unit_ABn) ~ normal(0,1);
  LKJ_chol_ABn ~ lkj_corr_cholesky(2);

  A ~ normal(1000.0, 333.3)T[0.0,];
  B ~ normal(1000.0, 333.3)T[0.0,];

  for (i in 1:2) {
    SD_ABn[i] ~ normal(100.0, 33.3)T[0.0,];
  }
  SD_ABn[3] ~ normal(0.5, 0.5/3)T[0.0,];

  for (curveInd in 1:numCurves) {

    int endInd = startInd + curveSizes[curveInd] - 1;
    
    real A_indiv = ABn_indiv[1, curveInd];
    real B_indiv = ABn_indiv[2, curveInd];
    real n_indiv = ABn_indiv[3, curveInd];

    SD_curve[curveInd] ~ normal(10.0, 3.33)T[0.0,];
       
    for (i in startInd:endInd) {
      stress[i] ~ normal(A_indiv + B_indiv*(strain[i])^n_indiv,
                         SD_curve[curveInd]);
    }

    startInd = endInd + 1;
  }
  
}

I find that if I just run this with default parameters, I get lots of divergent transitions, lots of hitting the maximum tree depth, and Rhats very far from one. If I set adapt_delta to 0.9999 and max_treedepth to 20, I get seemingly sane results on simulated data, but it runs kinda slow (about an hour).

The folk theorem of statistical computing is telling me that something is probably wrong, but I’m not sure where to start. I thought of normalizing the data, but that’s far more straightforward for the stress than the strain, since the latter is raised to a power (and that doesn’t even address the issues with normalizing the strain rate and temperature). I’m also not sure about the LKJ prior, since so far I’ve only seen it used in the context of linear regression. And of course there’s the possibility that I’ve just mangled something else entirely.

Does anything jump out as dodgy or wrong? Or is this even a good approach at all?

A and B are supposed to vary with strain rate and temperature according to a very nonlinear relationship

For each curveInd, it’s a different set of strain rates and temperatures. Is the notRealFunction a function that has to be fixed to one form? Or is there a chance to model notRealFunction non-parametrically?

Does a fit kinda work for just a single curve? And the problem comes about when bringing a bunch of curves together?

Non-parametric isn’t a real option here. The choice of notRealFunction() is based upon a pre-existing material model.

I haven’t tried a single curve, but I’d previously tried a variation where A, B, and n weren’t modeled as being correlated, and that seemed to be significantly faster, but probably less correct.

I get seemingly sane results on simulated data

That’s encouraging, at least. Sidestepping the fixed-model issue for now, I’d probably start with pairplots and try to nail down exactly which parameters are causing divergences.

There’s a big section on doing it with Bayesplot: Visual MCMC diagnostics using the bayesplot package

Even if your models weren’t fit with RStan, you can load them in with read_stan_csv (fit = read_stan_csv(c('my_stan_fit.csv')))

And feed Stan fits into Bayesplot through as.array (so like mcmc_pairs(as.array(fit)))
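So roughly something like this (just a sketch; 'my_stan_fit.csv' stands in for whatever file your run actually wrote out, and the pars are the names from your model):

library(rstan)
library(bayesplot)

# read a CmdStan output CSV back into a stanfit object
fit <- read_stan_csv(c("my_stan_fit.csv"))

# bayesplot works off a draws array (iterations x chains x parameters)
mcmc_pairs(as.array(fit), pars = c("A", "B", "n"))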

This is Betancourt’s post on divergences, which is pretty cool: Diagnosing Biased Inference with Divergences


I also meant to mention: make sure to try out the mcmc_parcoord stuff. It’s from this thread: Concentration of divergences. It can make it easier to see what’s happening when you have a ton of parameters.
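Something along these lines (a sketch; the np argument highlights the divergent iterations in a different color):

library(bayesplot)

draws <- as.array(fit)   # iterations x chains x parameters
np <- nuts_params(fit)   # per-iteration sampler diagnostics, incl. divergences

# each line is one posterior draw traced across the parameters;
# divergent draws get their own color so clusters of them stand out
mcmc_parcoord(draws, np = np)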

I tried mcmc_parcoord and nothing really stood out, even after normalizing the parameters to put them on the same scale. One catch is that I have about 70 parameters, mostly due to ABn_indiv and unit_ABn, so a lot of plots are not very useful, especially the pairs plots. I tried using the stan_checkdivergences.R function from the Concentration of divergences thread, but that doesn’t work with saved stanfit objects, due to an issue discussed on the old Google Group for Stan.

I was hoping that I had done something that would stand out to someone as obviously flawed, but it looks like things won’t be that easy.

Hi,

If you can pull out the divergent draws, then you could manually calculate normalized values (e.g. mean 0, sd 1 over all draws for each parameter) and then compute the mean and sd of the divergent draws.

See if there are values that stand out either with mean or sd.
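As a sketch (assuming fit is a stanfit object; the divergent__ flag comes from the sampler diagnostics):

library(rstan)

draws <- as.data.frame(fit)   # one row per post-warmup draw, one column per parameter

# flag the divergent draws, chain by chain, in the same order as the draws
sp <- get_sampler_params(fit, inc_warmup = FALSE)
divergent <- unlist(lapply(sp, function(x) x[, "divergent__"])) == 1

# standardize every parameter over all draws, then summarize only the divergent ones
z <- scale(draws)
colMeans(z[divergent, , drop = FALSE])
apply(z[divergent, , drop = FALSE], 2, sd)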

I wonder if normalizing the data to (0, 1) with an inverse CDF (to uniform) would help?

something that would stand out to someone as obviously flawed

(edit: was gonna comment on this then didn’t then forgot to remove the quote, so non-sequitur to start)

Does the LKJ prior on noise between A, B, n change the results much? I’m not sure that:

A, B, n ~ multi_normal(f(strain rate of curve, temperature of curve), Sigma);

is preferable to

A ~ normal(f(strain rate of curve, temperature of curve), sigma_A);
B ~ normal(f(strain rate of curve, temperature of curve), sigma_B);

Unless you know that there is some process in the background linking (A, B, n) all together that you’re trying to work in here.

Do the fits:

for (i in startInd:endInd) {
  stress[i] ~ normal(A_indiv + B_indiv*(strain[i])^n_indiv, SD_curve[curveInd]);
}

Work for each curve pretty well?

I took the opposite tack. I originally treated A, B, and n as individually normally distributed, but it occurred to me that was too strong an assumption.

BTW, I checked and found that the parameters for the individual curves do fit quite well.

I tried that, and I’m not sure that I see anything that really stands out. The means of the normalized divergent parameters definitely stray from zero, usually with values around 0.1 to 0.5 or so, but I don’t see any particular parameters standing out.

BTW, I checked and found that the parameters for the individual curves do fit quite well.

Nice, can you just run the fits for all the curves and then make plots of posterior estimates of A and B vs. strain rate and temperature?

Then maybe that’ll reveal what the issue might be with notRealFunction (cause I’m assuming there’s something weird going on there that’s making things funky).

Here are the plots I have so far:

[plots: A_fit_vs_A_calc and B_fit_vs_B_calc]

These are what I get from comparing the means of the fits for the individual curves against what they should be according to notRealFunction(). The solid line, of course, has a slope of one and an intercept of zero. Looks like things seem to line up reasonably.
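Each plot is basically just this kind of thing (a sketch; A_fit and A_calc are hypothetical vectors holding the per-curve posterior means and the notRealFunction() predictions, respectively):

# A_fit:  posterior mean of A from each individually fit curve
# A_calc: A*notRealFunction(otherParams, strainRate, temperature) for the same curves
plot(A_calc, A_fit,
     xlab = "A from notRealFunction()", ylab = "A from per-curve fit")
abline(a = 0, b = 1)   # reference line: slope one, intercept zero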

Looks like things seem to line up reasonably.

Yeah, it does, so maybe this is an unidentifiability thing? It sounds like the answer is there – just the sampler isn’t finding it unless it looks really hard (max_treedepth = 20).

Maybe there’s an issue between A and the parameters of notRealFunction (or B)? Or maybe it’s part of the otherParams thing?

Maybe look at pairplots of A, B, and otherParams for one of the max_treedepth = 20 runs and see if there’s anything super correlated?

I’d also compare a couple ABn_indivs to ABn_means (just pick a couple curves and make pairplots of the reduced 6 parameters).

I tried normalizing the stress today, not between 0 and 1, but doing the usual trick of subtracting the mean and dividing by the standard deviation. If I ran RStan with the default sampler parameters, I got pretty much the same answers as before, as well as lots of divergent transitions and hitting the maximum tree depth. There doesn’t seem to be any improvement in numerical stability.
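Concretely, the normalization was just this sort of thing, done in R before building the Stan data list (a sketch; the vector names mirror the data block above, and the A, B, and SD priors get rescaled to match):

# standardize stress; leave strain alone since it enters through strain^n
stress_mean <- mean(stress)
stress_sd   <- sd(stress)

stan_data <- list(
  numCurves      = numCurves,
  curveSizes     = curveSizes,
  numOtherParams = numOtherParams,
  strainRate     = strainRate,
  temperature    = temperature,
  strain         = strain,
  stress         = (stress - stress_mean) / stress_sd
)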

I changed how I generated my simulated data so that it was generated from an explicit unit_ABn and LKJ_chol_ABn, and while I still get other parameters to have sane results, I don’t even approximately reproduce the original unit_ABn and LKJ_chol_ABn that I had before. The mean unit_ABn and LKJ_chol_ABn that I get from RStan don’t even have close to the same matrix product as the original. Is that expected or is that the unidentifiability issue you were mentioning?

What I meant by unidentifiability is a situation where you need to solve 1 = a * b for a and b. Even though you have two parameters, the solutions lie on a curve.

And then looking at:

ABn_mean[1, curveInd] = A*notRealFunction(otherParams...)...

Maybe the likelihood pushes ABn_mean[1, curveInd] to be a very specific thing, and then maybe there’s a way to satisfy this by either wiggling A or wiggling the parameters in notRealFunction, and this could create a thin little ridge of solutions.

It might not be happening, though. The way to figure this out is to look for a ridge in the posteriors of A and the values of notRealFunction (save them in a temporary variable). (Or, what I was suggesting above was A and otherParams, but it might be easier to look at notRealFunction directly.)
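Something like this, say (a sketch; f_curve is a hypothetical transformed parameter you’d add to the Stan model to hold the per-curve notRealFunction values, and a thin band of points rather than a blob is the thing to look for):

library(bayesplot)

# scatter of A against the saved notRealFunction value for one curve,
# with divergent draws highlighted
mcmc_scatter(as.array(fit), pars = c("A", "f_curve[1]"),
             np = nuts_params(fit))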

Maybe this latest experiment is showing that there’s some sort of unidentifiability in:

ABn_indiv = ABn_mean + diag_pre_multiply(SD_ABn, LKJ_chol_ABn)*unit_ABn;

in the sense of 1 = a + b, solve for a and b. The way the model is written here, things could either come from ABn_mean, or this multivariate normal.

But this isn’t really super tight logic for hunting down problems cause you could roll it out anytime there was multiplication or addition in a model, haha. Gotta have those pairplots. Just save diag_pre_multiply(SD_ABn, LKJ_chol_ABn)*unit_ABn as a transformed parameter and then look in pairplots for correlations with ABn_mean. But I’d start by just getting rid of the multivariate stuff and start by looking at A/notRealFunction.
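For a couple of curves that might look something like this (a sketch; it assumes you’ve also promoted ABn_mean to a saved transformed parameter and named the offset term ABn_offset):

library(bayesplot)

# look for strong negative correlation between the mean and the offset,
# i.e. the "1 = a + b" situation
mcmc_pairs(as.array(fit),
           pars = c("ABn_mean[1,1]", "ABn_offset[1,1]",
                    "ABn_mean[2,1]", "ABn_offset[2,1]"),
           np = nuts_params(fit))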

Bear in mind how new I am to Bayesian regression. What does it mean to look for a “ridge” in this context?

Like Figure 24.1 in the 2.16 manual. I’d call both the pictures ridges, but the one on the left is super bad.

This is an informal thing, but I meant a ridge in the log posterior in the same way that a mountain can have a ridge, as opposed to your log posterior having a single peak (or your mountain having just one big peak).

If you have a ridge like this, then you have identifiability problems (lots of different parameterizations that produce reasonable results). Unlike 24.1, it’d be easy to have a curvy ridge too.

That sort of helps, but I’m not so sure how to get useful plots like those in Figure 24.1. I can easily get 3D scatterplots of lp__ versus a couple other parameters. With the akima package, I can get interpolations to feed into R’s filled.contour() function, but the results look wonky and it’s not too clear if that’s from the interpolation or the parameters. Here’s what I get, for what it’s worth:

Don’t worry about plotting the actual densities. Just do pairplots of parameter 1 vs. parameter 2. The ridge will become visible (edit: if it’s actually there) cause there are (edit: would be) a bunch of posterior samples there. If fit is a stanfit object,

pairs(fit, pars = c("otherParams", "A")) should work

Offhand, nothing is jumping out at me as being blatantly ridge-like: