Dealing with uncertain parameters that aren't purely fitting parameters

I’m using Stan for what I suspect may be an unusual purpose: fitting a material behavior model to stress-strain curve data that I got from a co-worker. The reason I’m using Stan rather than just doing least-squares fitting is that I’m also trying to get probability distribution functions for the parameters of the model.

Now, this material model has several fitting parameters, a few of which aren’t “pure” fitting parameters but rather have physical meaning: namely, the yield strength, Young’s modulus, and melting temperature. I could treat those parameters purely as data, but the catch is that there is some uncertainty in them; they may vary from one batch of material to another. Worse, the information on the yield strength and elastic modulus is somewhat spotty, with a fairly wide spread of values.

So far, what I’ve been doing is assigning more-or-less plausible lower and upper bounds for these parameters and leaving their prior distributions as the default (uniform). Unfortunately, if I instead use “soft” constraints, such as setting the prior for a parameter to a normal distribution centered on a plausible mean value, the resulting posteriors put those parameters at decidedly improbable values. However, I’m not sure that using hard upper and lower bounds is a good choice either. It feels like there’s a better option that I’m just missing due to lack of experience with Bayesian data analysis.

Any idea how to deal with these particular kinds of uncertain parameters?

1 Like

I think you were on the right track with soft constraints. If a normal prior won’t hold your coefficients to within ±3 standard deviations, your model likely is not fitting very well. I would check model fit and then go back and rethink the model.

One catch with that conclusion is that some of the parameters probably can’t be determined very well from the stress-strain data, regardless of the model. Some issues:

  • The material model has equivalent plastic strain as one of its independent variables. (The others are plastic strain rate and temperature.) The elastic modulus is only needed to convert from so-called true strain, which is the data that I have, to the equivalent plastic strain. Basically, “plastic strain” = “true strain” - (“true stress”)/(“elastic modulus”).

  • The melting temperature can’t reliably be estimated from the stress-strain data. Basically, I have several stress-strain curves, each one for a given strain rate and temperature. Practically speaking, for the material in question, a structural metal, there’s no good way to get stress-strain data once it’s too close to its melting temperature; at that temperature, the testing equipment would melt.

  • The yield strength, ideally, should be determinable from the stress-strain data. However, the catch there is that its determination in practice is somewhat slippery. Often, the yield strength is estimated by taking a line tangent to the stress-strain curve at the origin, offsetting it horizontally by 0.2% strain (where strain is the horizontal axis), and declaring the yield strength to be the stress at which that offset line intersects the stress-strain curve. That seems to be the case with the material that I’m working with here. (A sketch of this construction is below.)
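To make that offset procedure concrete, here is a minimal sketch of it as a Stan function; the function name and the linear interpolation between data points are my own choices, not any standard routine:

functions {
   // The offset line is sigma = E*(epsilon - 0.002), i.e. the tangent at the
   // origin shifted right by 0.2% strain. f = sigma_data - line is positive
   // in the elastic region, and the yield strength is the stress at f's
   // first zero crossing, located by linear interpolation.
   real offset_yield(int n, vector epsilon, vector sigma, real E) {
      for (i in 2:n) {
         real f_prev = sigma[i - 1] - E*(epsilon[i - 1] - 0.002);
         real f_curr = sigma[i] - E*(epsilon[i] - 0.002);
         if (f_curr <= 0.0) {
            real t = f_prev/(f_prev - f_curr);  // interpolation weight in [0, 1]
            return sigma[i - 1] + t*(sigma[i] - sigma[i - 1]);
         }
      }
      return sigma[n];  // the offset line never crossed the measured curve
   }
}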

Are you saying that your posterior ends up wider than your prior?

James! I do this stuff too! Or at least I try to do this stuff…

that I’m also trying to get probability distribution functions for the parameters of the model

Good stuff!

What materials system do you work on? The people I work with (UC Santa Barbara) are most interested in high temperature metals. Mostly superalloys (gas turbines) and such.

The best place to start on this would be what’s your actual model (is it okay to post the Stan code?) and what does your data look like?

Then we can figure out what makes sense. There’s plenty of room for these mechanics models to go wrong.

1 Like

One thing to keep in mind here is that the improbable values may indicate that the model is not adequate to fit the data. In this case the issue isn’t so much that the priors are wrong, but rather that the mechanistic model or the observation model is wrong. In particular, assuming a simple Gaussian likelihood around the mechanistic model output is rarely sufficient for complex data.

To figure out what’s going on you’ll want to follow our recommended Stan workflow.

  1. Make sure that you can simulate data from reasonable parameter values and then recover those values when you fit the model to the simulated data. Even better, repeat this process multiple times, each time sampling the true parameter values from the prior. (See the sketch after this list.)

  2. Now apply the model to the data. If (1) succeeded but the model doesn’t fit the real data well, then your model is definitely not capturing all of the structure of the measurement. How to proceed depends on the precise details of your problem.

  3. That said, make sure you’re actually getting an accurate fit by looking at all of our diagnostics. If you put informative normal priors around each parameter then it will be very hard for the posterior to expand beyond the breadth of the prior, so you shouldn’t be getting extremely improbable values. This could be a model misfit issue as in (2), or it could just be that Stan is having trouble fitting the posterior.
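Here’s a minimal sketch of what step (1) can look like, using a deliberately simplified one-parameter toy model rather than your actual one (all of the names here are hypothetical). Run it with algorithm=fixed_param so each iteration draws a fresh simulated dataset, then refit your real model to sim_y and check that the posterior recovers b_true:

data {
   int<lower=0> N;
   vector[N] x;                    // points at which to simulate observations
   real<lower=0.0> b_guess_mean;   // prior mean for the slope
}

generated quantities {
   real b_true = -1.0;
   vector[N] sim_y;
   // Rejection-sample the truncated prior normal(mu, mu/3)T[0,].
   while (b_true <= 0.0)
      b_true = normal_rng(b_guess_mean, b_guess_mean/3.0);
   // Simulate observations from the toy measurement model.
   for (i in 1:N)
      sim_y[i] = normal_rng(b_true*x[i], 1.0);
}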

sakrejda:

Are you saying that your posterior ends up wider than your prior?

No, the peak of the posterior just ends up very far from where it should be.

bbbales2:

What materials system do you work on?

I want to be cautious about saying too much for now. Basically, though, the material system itself is not that important except that I’m using it in simulations where I’m doing forward uncertainty quantification. I hope to use the results from Stan in the forward analysis. Also, if the material model that I’m using is a so-so fit – and I already have good reasons to expect that it is – I want to be able to quantify that.

is it okay to post the Stan code?

I don’t think I’d be giving away too much by doing that:

data {
   real Tmelt;
   real Troom;

   real<lower=0.0> A_lower;
   real<lower=0.0> A_upper;

   real<lower=0.0> B_guess_mean;
   real<lower=0.0> n_guess_mean;
   real<lower=0.0> C_guess_mean;
   real<lower=0.0> m_guess_mean;

   real<lower=0.0> E_lower;
   real<lower=0.0> E_upper;

   real<lower=0.0> stdDevSigma_guess_mean;

   int<lower=0> dataLen;
   vector[dataLen] epsilon;
   vector[dataLen] epsilon_dot;
   vector[dataLen] T;
   vector[dataLen] sigma;
}

parameters {
    real<lower=A_lower, upper=A_upper> A;
    real<lower=0.0> B;
    real<lower=0.0> n;
    real<lower=0.0> C;
    real<lower=0.0> m;
    real<lower=0.95*Tmelt,upper=1.05*Tmelt> Tmelt_param;

    real<lower=E_lower, upper=E_upper> E;
    real<lower=0.0> stdDevSigma;
}

model {

    // Note that I can only define local variables at the top of a block.
    real Tdiff = Tmelt_param - Troom;

    // Priors
    // Using 1/3 of the mean as a guess for the standard deviation so that
    // where the abscissa is zero, the prior distribution is close to zero.
    //
    B ~ normal(B_guess_mean, B_guess_mean/3)T[0.0,];
    n ~ normal(n_guess_mean, n_guess_mean/3)T[0.0,];
    C ~ normal(C_guess_mean, C_guess_mean/3)T[0.0,];
    m ~ normal(m_guess_mean, m_guess_mean/3)T[0.0,];
    stdDevSigma ~ normal(stdDevSigma_guess_mean, stdDevSigma_guess_mean/3)T[0.0,];
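    // (Since these parameters are declared with <lower=0.0>, the T[0.0,]
    // truncations above only add constants to the log density and so
    // don't affect sampling.)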

    for (i in 1:dataLen) {

       real plas_strain = epsilon[i] - sigma[i]/E;

       if (plas_strain > 0.0) {
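          // Johnson-Cook-style flow stress: strain hardening x strain-rate
          // sensitivity x thermal softening.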
          real Tstar = (T[i] - Troom)/Tdiff;

          sigma[i] ~ normal((A + B*plas_strain^n)*(1.0 + C*log(epsilon_dot[i]))*(1.0 - Tstar^m), stdDevSigma);
       }         
    }
}

So you’re saying your prior is SO wide that it includes absurd values. Don’t do that; it’s more trouble than it’s worth. It’s like estimating the distance from the center of the earth to the center of the moon and allowing a prior that says it’s so small that the moon is inside the earth. That’s not a weak prior!

1 Like

So you’re saying your prior is SO wide that it includes absurd values.

That seems to be a questionable interpretation, and one that is at odds with the advice here: Prior Choice Recommendations · stan-dev/stan Wiki · GitHub

In particular, it is at odds with this: “Don’t use uniform priors, or hard constraints more generally, unless the bounds represent true constraints (such as scale parameters being restricted to be positive, or correlations restricted to being between -1 and 1)” Technically, any soft constraint allows for “absurd values”; it just assigns them a very low prior probability.

I’m not saying use hard constraints; I’m saying absurd values should be in the tails of a light-tailed prior. If there’s a conflict, you’ll see it. So if ±3 is an absurd value, N(0,1) is a fine prior. If you need to, you can relax from there.

I should say the term “weak prior” is acquiring another, more specific meaning that I’m not really on board with, but I’ll probably give up on the term sooner or later. Computationally, what you want is absurd values in light tails.

1 Like

I’m saying absurd values should be in the tails of a light-tailed prior

I’m pretty sure that’s what I had done before. Just to be sure, I tried redoing the calculations by changing

real<lower=E_lower, upper=E_upper> E;

to

 real<lower=0.0> E;

and added

E ~ normal(0.5*(E_lower + E_upper), (E_upper - E_lower)/12)T[0.0,];

to the model block of the Stan file. If I’ve done my arithmetic right, the prior places E_lower and E_upper six standard deviations from the mean.

E_lower and E_upper are 190e3 and 210e3, respectively, but the posterior for E has a mean of about 1.7e3.

Ok, then there’s some really atrocious model misfit driving the mean down to such small values. At that point it’s not a question of how you specify the priors; your model is just very wrong for the data, as @betanalpha mentioned. This is easy to do with a mechanistic model, and one way to address it is to make the mechanistic pieces more flexible (Andrew posted a nice write-up of splines in Stan on his blog recently, and splines are a great way of doing that).

1 Like

Hmm, this model:

(A + B*plas_strain^n) * (1.0 + C*log(epsilon_dot[i])) * (1.0 - Tstar^m)

Could definitely live in misspecification land. All the power law fits and cross terms are a bit yucky, but this is my experience with how these curves are handled as well.

I think you implied earlier that some of these parameters are kinda not-interesting, and I’m assuming they’re just there to get different bits of a bigger curve to fit together. Does it make sense in this case to break up the creep curves and do different inferences on different bits of them? That might not be as cool as fitting things in one big swoop, but trying to get everything with one big model might just lead to pain (and if the model isn’t reflective of how the actual system works – incorrect answers!).

Do you have an example stress-strain curve you could show, with the things you want from it labeled (in terms of parameters of the model)? Just Googling an example and labeling that is fine. I’m not even really competent at mechanics stuff; I always gotta clarify these terms.

What Michael said about simulating data from your model and trying to fit it might be a good idea here. There are enough cross terms in this that I’d be worried about identifiability. Have you looked at pairplots to see how correlated the different parameters are? I wouldn’t look too much at the summary statistics until you’ve had a chance to investigate the correlations. Maybe the answer you think you should be getting is in there, but some unidentified thing could mean a bunch of other answers are in there too.

@sakrejda Oh huh neat, I hadn’t seen the splines thing (I assume this is what you’re talking about http://andrewgelman.com/2017/05/13/splines-in-stan/)

1 Like

Does it make sense in this case to break up the creep curves and do different inferences on different bits of them?

FWIW, these aren’t creep curves. As far as whether it makes sense to break things up? Well, the original recommended way to do the fitting (following Johnson and Cook) was to fit B and n to the stress-strain curve for which epsilon_dot = 1 and Tstar = 0 (with A taken as the yield strength for epsilon_dot = 1), then fit C to all the curves where Tstar = 0, then fit m to all the curves. (Troom and Tmelt would be treated as certain.) The catch is that I don’t see a good way to do this via Stan. I can easily see questionable ways of doing it, e.g. fitting C while fixing A, B, and n to mean values determined from a previous round of curve fitting, but I suspect that would end up poorly accounting for the actual uncertainties in C. I’m also not sure that’s quite what you mean by breaking things up.

One thing crosses my mind about the low value that I get for the mean of E when I don’t use hard constraints. The “if” statement in my Stan model means that when E is small, fewer elements of the stress-strain data enter the fitting process. It had occurred to me that if plas_strain <= 0.0, I might want to let

sigma[i] ~ normal(E*epsilon[i], stdDevSigma);

but that would imply a sharp kink in the stress-strain relationship that probably doesn’t exist. [ETA: I tried letting sigma[i] ~ normal(E*epsilon[i], stdDevSigma) for plas_strain <= 0.0, and it actually helped, so that may have been a reasonable move after all, and it is at least physically motivated. The combined loop is sketched below.]
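For reference, here’s what that change looks like in the model block’s loop (only the else branch is new relative to the code I posted above):

    for (i in 1:dataLen) {
       real plas_strain = epsilon[i] - sigma[i]/E;
       if (plas_strain > 0.0) {
          real Tstar = (T[i] - Troom)/Tdiff;
          sigma[i] ~ normal((A + B*plas_strain^n)*(1.0 + C*log(epsilon_dot[i]))*(1.0 - Tstar^m), stdDevSigma);
       } else {
          // Elastic branch: Hooke's law. A small E is now penalized by the
          // data rather than silently dropping points from the likelihood.
          sigma[i] ~ normal(E*epsilon[i], stdDevSigma);
       }
    }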

FWIW, these aren’t creep curves.

Lol, rekt. Man I should really know more about this stuff than I do.

Actually, you mentioned a forward model. How is this forward model parameterized? Is it, by any chance, parameterized by the stress-strain curve itself?

If so, we can circumvent fitting these weird parameters altogether (hoooray!) and just go with splines/Gaussian processes like Krzysztof was saying and feed sampled curves into the forward model.

the original recommended way to do the fitting

A lot of the original ways of doing the fitting don’t really work, which is half the fun :D. I attached a (really) short presentation of one of the earlier fits I tried in Stan. It’s a creep curve with an empirical fitting equation from the 70s. Slide 1 is the equation. Slide 2 is the original fit (with source citation). Slide 3 is a bunch of Stan fits. Slide 4 is the pairplots. It’s hard to get much from those posteriors :D.

I don’t feel too bad poking fun at this dude. I assume in a few years (or less!) someone will be poking fun at me too.

july8_short.pdf (216.9 KB)

By the way, do you expect each curve to give you a good estimate of every fitting parameter, with the variance in the estimates coming from estimating the parameters of all the different curves together? Or do you expect individual curves to only kinda-tell you about each parameter? If that makes sense…

Actually, you mentioned a forward model. How is this forward model parameterized? Is it, by any chance, parameterized by the stress-strain curve itself?

Not at all. If that were the case, I wouldn’t be bothering with this parameterization. No, it requires that one or more parameterized elastic-plastic material models be used (one for each material in the simulation), and it takes the parameters of each model as input.

By the way, do you expect each curve to give you a good estimate for each curve fitting parameter and then expect the variance in the estimates to come from estimating parameters of all the different curves together?

No, I’m fitting all the curves at once. Each curve corresponds to a temperature and a strain rate.

This is beginning to sound more and more like a problem with identifying the generative model. Ideally, the likelihood in your Stan model will look just like your forward model: no inversions or subtractions or corrections, just the forward model from latent parameters to measured data, including the physical model and the observational process.

Because this generative approach hasn’t been well communicated for a long time, many fields have developed their own empirical heuristics (think deconvolution or filtering) that ultimately have to be abandoned before a proper model can be built. Sometimes it’s easier to start from scratch.

Of course that may not be the issue here, but more and more the questions and answers are sounding like what typically happens right before the phase transition from empirical model to generative model…

2 Likes

Not the first materials science model for Stan :-)

Listen to @betanalpha on this one. This is exactly the same problem that pharmacologists face in trusting their physical pharmacokinetic (PK) models more than their pharmacodynamic (PD) models. When they build properly Bayesian joint models, the poorly specified PD model pollutes the PK model estimates, which are well known to within narrow tolerances. This signals that the PD model is misspecified, not that the combination is done the wrong way.

BUGS developed a cut mechanism to deal with this, but it’s not Bayesian, and the implications of using it aren’t clear. (Andrew was trying to figure this out a while ago; I don’t know if he ever got anywhere.)

2 Likes