Dealing with uncertain parameters that aren't purely fitting parameters



FYI, the “forward model” that was mentioned earlier is a simulation (or set of simulations, really) that uses parameterized elastic-plastic models to calculate some quantities of interest. The parameters come from someone’s fits to experimental stress-strain data that looks next-to-nothing like the forward model in question. Loosely speaking, you can think of the “forward model” here as a car crash simulation, while the stress-strain data comes from a mix of Instron machines and Kolsky bar apparatus used to test one of the materials in the car.


I don’t think any of us are denying the issues you bring up, just saying that doing this analysis is going to throw all the skeletons out of the closet. Goodness of fit is likely killing the computations required to fit the model and no matter how you do it that’s what needs to get fixed. Would be nice if you could just model the entire car crash but maybe (temporarily) replacing your model with a spline will help you figure out what adjustments it needs.


Exactly. This is what Andrew calls “the folk theorem” (of computational stats). There’s not really a good antidote other than making the model better or pinning down parameters and hoping the results you get are good enough.


I’d say that it’s the latter more than the former that I’m looking for.

There are already whole research programs dedicated to “making the model better.” The catch is that a proper material model would more or less be a model of the dynamics of dislocations in polycrystalline metals with various impurities, and so far, it’s taken supercomputers to even partially solve that problem. In lieu of solving this beast, various approximate material models are used that have some parameterized functional form fit to stress-strain data from controlled experiments, and pretty much all of these approximate models are iffy in some way or another.

Basically, what I hope to do is quantify that “iffiness” in a way that can be used in uncertainty quantifications of future “car crash” simulations.

ETA: I think there’s also some confusion here about “forward models”, “generative models”, “car crashes”, and whatnot. The “car crash” simulation that I mentioned involves various objects composed of multiple materials interacting with each other. While it may be a “forward model” in whatever loose sense that bbbales2 and I were using before, it is probably not a “forward model” in the sense that most of you are using it. It is not something that could, on its own, generate anything that looks like stress-strain data, so I doubt that it is a “generative model” in the sense that betanalpha was using it. The stress-strain data is used to characterize the behavior of an individual material.


Okay, right, so my hunch was indeed correct!

The ultimate problem is that statistically you cannot quantify uncertainty of a model being right or wrong. Unfortunately, probabilities live only within the scope of a model that you can build!

That said, the approximate models are so bad that the probabilities within the model can’t even be solved accurately, so we can’t even talk about the validity of the model yet! This is the folk theorem that @Bob_Carpenter mentions in action – when the model is misspecified it can be very, very hard to fit it accurately.

Further complicating the problem is that these approximate models are usually built on top of point estimates. Often point estimates look okay, but they identify only a single parameter consistent with the data and ignore all of the points that are similarly consistent. In other words, but looking at just a point estimate you ignore the pathologies in the model induced by the misspecification. People often complain about Stan not fitting, but it’s not that Stan is worse than other programs rather it’s that the model was never fitting in the first place and only Stan was able to identify that!

What do we do? Well unfortunately these issues convolve fitting with model validity and so the solution will necessarily have to intertwine the two. Imposing better priors can help regularize the poor models and make the fitting easier, as would improving the models themselves. As easy way to hack a bad mechanistic model is to give it some additional degrees of freedom with new parameters. This can often be done in a principled way by tweaking certain parts of the model (say adding a quadratic term where there was only a linear term, or adding a multiplicative scaling where something was assumed to be constant) or by adding a Gaussian process or splines to soak up residuals as @sakrejda and @bbbales2 suggested.

Once the model is fitting accurately then we can talk about how well specified it is using prediction. Comparing the posterior predictive distribution to the given data give us a qualitative sense of how well the model is doing. This qualitative information is incredibly important as it motivates how the model should be improved. It does not, however, gives us any quantification of how wrong the model is. That, unfortunately, is impossible mathematically for some subtle reasons. Where you do see numbers thrown around, cross validation scores or information criteria and the like, are not absolute quantifications of model fit but relative quantifications that allow you to rank models. Moreover, those quantifications are estimators who bias and variances are often large enough to make any resulting rankings tricky to interpret.


Except that if the stress-strain relationship could be inferred without car crashes then those simulations would not be relevant at all. I am guessing that you are trying to pull out individual behaviors from a complex measurement of their combined behaviors, i.e. the car crash, and in that case the car crash model, or some abstraction thereof, would indeed ideally go into the generative statistical model. The model has to take material properties to measurements, and if your model doesn’t capture the generative process of that measurements in some way (even abstractive or approximate) then you are in all kinds of trouble.


Except that if the stress-strain relationship could be inferred without car crashes then those simulations would not be relevant at all. [/quote]

That’s not even close to true. Experimental stress-strain data are used to characterize the behaviors of materials, and the approximations of those behaviors (in the form of parameterized fits) are used as inputs to the “car crash” simulations. The simulations use those inputs in order to predict other quantities of interest (like how a bumper crumples).

The reason the “car crash” simulations were even brought up is because I was planning to propagate uncertainties in the material parameters (obtained from fits to stress-strain data) through the simulations in order to obtain uncertainties in the quantities of interest. The “car crash” simulations are not otherwise relevant.

I am guessing that you are trying to pull out individual behaviors from a complex measurement of their combined behaviors, i.e. the car crash.[/quote]

No. Again, the “car crash” is not used to obtain information on material behavior. The stress-strain data comes from far more controlled experiments, in this case uniaxial compression tests of material specimens.


That’s not even close to true.

Eeek! Yeah, I think @betanalpha got the details of the experiment slightly wrong, but everything he’s saying is right. I’m not even sure he got the details that wrong in a bigger picture sense haha. Bad business model to bet against Betancourt. I think I owe him a quarter from another thread. Like @Bob_Carpenter said they’ve been through this same process in other fields.

I’m pretty sure how the tests here work is there are a bunch of little idealized bars of metal. These little bars are loaded up in a big machine that stretches/compresses them in some conditions. The idea is that there are (bulk) mechanical parameters of the material can be extracted from these little idealized experiments.

These bulk mechanical parameters are used to evaluate what might happen in further big fancy simulations (so in these he’s just evaluating likelihoods – no real data collection and Bayesian inference at that point).

I think it’s the little tests we want a model for right now.

Now! I think the current model ((A + B*plas_strain^n) * (1.0 + C*log(epsilon_dot[i])) * (1.0 - Tstar^m)) is probably not the way to go. So the place to start on this would be post a plot of a few stress-strain curves and label how the bits of the curve correspond to the parameters you need. I’m pretty curious myself to walk through this. Like @sakrejda said, there’s skeletons laying around in this analysis we gotta dig up.

And! Some of those skeletons could very well be attached to the fancy forward model as well. Is the forward model something you have much control of? Is it an inhouse solver? Or is this some Abaqus/Comsol black box computational thing?

Even if we say we don’t want to do Bayesian stuff on the big fancy models I feel like it’s inevitable haha. Once you have a likelihood you just end up trying to compute a posterior. Circle of life stuff.

The catch is that a proper material model would more or less be a model of the dynamics of dislocations in polycrystalline metals with various impurities

Kinda off topic! But fine grained simulations make me super nervous. I’m definitely more of a fan of smaller models that can maybe be understood that these crazy huge things that take hours and hours and hours to run (cause even if you can’t learn much from the small models – you’re not going to learn anything really from the big ones if you can’t run them a bunch).

But yeah encore encore on the @betanalpha speech. Making me want to shed a tear. Good stuff. 10/10 on this thread. I’m gonna have my materials buddies read it.

Also another side note, fitting residuals with GPs, if anyone has a link to a paper they particularly like about this, I wanna see it. I’ve been trying this on some other stuff (based on that 2001 Kennedy O’Hagan paper on model calibration).


Check also the following paper and references therein
Tuo & Wu: Prediction based on the Kennedy-O’Hagan calibration model: asymptotic consistency and other properties




FYI, you can see examples of the kind of stress-strain curves I’m dealing with here in one of the appendices of this technical report:

I don’t know if it’s quite as straightforward as mapping literal pieces of the curve to parameters, but I think I can give a general idea of how those parameters are supposed to affect the curves:

  • As mentioned before, A is supposed to be the yield strength, the stress at which the equivalent plastic strain has just started to become non-zero. However, this is a yield strength for a particular strain rate (1/s) and a particular temperature (room temperature, usually around 298 K.)

  • B and n are supposed to determine the curvature of the stress-strain curve, again for a particular strain rate (1/s) and a particular temperature (room temperature, usually around 298 K.)

  • C determines how the stress-strain curve is supposed to change with the strain rate. This is one place in particular where the model is expected to be a rough approximation.

  • Troom, Tmelt, and m determine how the stress-strain curve is supposed to change with the temperature.

  • E should be the slope of the tangent to the stress-strain curve at the origin.

I have pretty much no control at all, and for our purposes, it might as well be like an Abaqus/Comsol black box. Like Abaqus, it supports a limited range of parameterized material models. Unlike Abaqus, it cannot take a stress-strain curve as input for a material model.


@avehtari Groovy thanks!

@Bob_Carpenter Yo Bob, you have any references on the PKPD stuff? I found a Sebastian Weber (is this wds15?)
presentation on the Stan cited page (
), but maybe a generic paper with more PKPD stuff I could thumb through and look at would be nice if you know of a solid one.

@jjramsey Oh man I dig these old lab reports. I got this giant book at the lab. Something about British Steels. Soooo much data in it. Just fun to look at and think about how boring it must have been to collect haha.

How much do you care about the parameters in this model? Are they just a vector to getting these curves into the forward model?

One thing worth trying is just take the model and generate data from it and then try to fit the Stan model back to that data. There are probably a lot of identifiability issues with this model. Ignoring whether or not this model is a good fit to the real process, you can learn about some of these issues with simulated data. Run these fits and then look at the pairplots of the posterior samples and see what sorts of craziness is going on (like the stuff in that july8_short pdf I posted).

One thing that sticks out about the LANL data in that report is that there isn’t a huge amount of noise around lines. Analyzing models that amount to y ~ normal(f(...), sigma) is the easy thing, but that’s probly not what’s happening here. It’s uncertainty that hides in the dynamics (that might be a bad way to say this) that is tricky (and is probably happening here).

You got a timeframe on this analysis? Obviously it’d be nice if it just fell out in the floor but that’s probably not gonna happen haha.


My 2 cents on this topic:

It sounds like the point of using the model form given is that it corresponds to the kind of thing you can input to the downstream code. Are there other alternatives that the downstream code can handle as well? Obviously you’ve said that you can’t just give it any old random stress-strain curve but perhaps you have 2 or 3 choices as to which class of model you have available downstream? If there are alternatives, I’d suggest trying to fit all of them and see which ones work well.

Next, I think you should start with extremely informative priors. Input things like E to within 1%

 E ~ normal(E_guess,E_guess*0.01)

Do this for as many physically relevant parameters as you can. Be very tight. And give initial values that fall in the high probability region of the priors. If you still get misfit, plot the curves from the misfit, do they fit? If not, why does Stan think they do? Perhaps you have mis-specified something or have a typo.

Also, re-check all the units you’re using, is there a problem with unit conversions? Perhaps your machine is outputting pounds of force, or your cross section is mm^2 instead of m^2 etc. While you’re at it, perhaps consider casting your problem in dimensionless form, so measure stress as a fraction of the yield stress, measure strain as a fraction of the yield strain, re-work your formula to use the new formulation, and then do your fit, finally when it comes to feeding it into downstream stuff just go through and multiply by the scale factors.


Yes, @wds15 is Sebastian—he’s a Stan developer. The best intro is Charles’s—the talk is available as is the Jupyter notebook:


Interestingly enough, I had done that with an earlier version of the model, and it worked very well, returning parameters fairly close to those used to generate the data. For what it’s worth, here’s that version:

data {
   real Tmelt;
   real Troom;

   real<lower=0.0> A_guess_mean;
   real<lower=0.0> B_guess_mean;
   real<lower=0.0> n_guess_mean;
   real<lower=0.0> C_guess_mean;
   real<lower=0.0> m_guess_mean;
   real<lower=0.0> stdDevSigma_guess_mean;

   int<lower=0> dataLen;
   vector[dataLen] epsilon;
   vector[dataLen] epsilon_dot;
   vector[dataLen] T;
   vector[dataLen] sigma;

parameters {
    real<lower=0.0> A;
    real<lower=0.0> B;
    real<lower=0.0> n;
    real<lower=0.0> C;
    real<lower=0.0> m;

    real<lower=0.0> stdDevSigma;

model {
    real Tdiff = Tmelt - Troom;

    // Priors
    // Note that I'm using 100% of the mean as a guess for the standard deviation; not sure if this is a good idea.
    A ~ normal(A_guess_mean, A_guess_mean)T[0.0,];
    B ~ normal(B_guess_mean, B_guess_mean)T[0.0,];
    n ~ normal(n_guess_mean, n_guess_mean)T[0.0,];
    C ~ normal(C_guess_mean, C_guess_mean)T[0.0,];
    m ~ normal(m_guess_mean, m_guess_mean)T[0.0,];

    stdDevSigma ~ normal(stdDevSigma_guess_mean, stdDevSigma_guess_mean)T[0.0,];

    for (i in 1:dataLen) {

       real Tstar = (T[i] - Troom)/Tdiff;

       // real Tstar = T[i]/Tmelt;

       sigma[i] ~ normal((A + B*(epsilon[i])^n)*(1.0 + C*log(epsilon_dot[i]))*(1.0 - Tstar^m), stdDevSigma);

As you can see, though, there’s nothing here that treats E and Tmelt as parameters. With the current version of the model, though, which is mostly the same as the Stan code I had posted earlier, but with the changes mentioned in the “ETA” on post #16 (i.e. letting “sigma[i] ~ normal(E*epsilon[i], stdDevSigma) for plas_strain <= 0.0”), the fit is very wonky, with most values of Rhat far from 1. One catch I do see with the fit to the new “fake” data is there probably is a kink in it, since there’s a linear part with slope E followed by a nonlinear curve. It looks like this:

IIRC, Stan does not handle kinks like that all that well. FWIW, the parameters A,B,n,C, and m that I used for the older model were used to create the “fake” data for the new model.

[ETA: The kind of kink seen in the above fake data does not appear in the real data that I have.

Also, in the fake data, the points before the kink are due to linear elastic behavior (i.e. sigma = Eepsilon), while the points after are due to the plastic behavior predicted by the model (i.e. sigma = (A + Bplas_strain)*(…)).]


It handles them fine in data, if you had something like that in parameter space, particularly for from a plot of parameter[1] vs. parameter[2], you might not have good luck b/c the mass matrix for the region on the LHS of the kink would be completely bogus on the RHS of the kink.


Ah, okay. I was thinking of what I saw here:


Interestingly enough, I had done that with an earlier version of the model, and it worked very well, returning parameters fairly close to those used to generate the data.

That is interesting. I had a closer look at the model today.

It looks like this Johnson and Cook model is defined in terms of the plastic strain (epsilon - sigma / E). I wouldn’t try to estimate the Young’s modulus E (is that what it’s called?) at the same time as doing this fit. Reason being:

For the E parameter, you get the plastic strain as:
plastic_strain = epsilon - sigma / E

Then this goes into the Johnson-Cook model like:
sigma ~ normal((A + B * plastic_strain^n)..., stdDev)

The first issue is that if you’re estimating B and E, you can put this all together and get:

sigma ~ normal((A + B * (epsilon - sigma / E)^n)..., stdDev)

And since we don’t really care about the value of B, we could write that like a different parameter C and bring it inside the power:
sigma ~ normal((A + (C * epsilon - C * sigma / E)^n)..., stdDev)

So we are kindof trying to estimate C and E in the ratio C/E, so that could easily lead to some nonidentifiability. It’s my understanding that B as a parameter here is just to make the fit good and we really don’t care about its value. It really looks like all it’s doing is reversing the transformation plastic_strain = epsilon - sigma / E.

Another issue is I’m not sure what sigma is doing on both sides of the equation here :D. So I think if we need to use the Johnson-Cook model, we need to be comfortable in transforming our data to plastic strain before it goes in.

If you actually have reasonably clear elastic deformation in your data (where sigma = e E), then I think you should just chop of those bits of the curves and do a separate fit to get an estimate of the elastic constants. Do you think you have enough data to get good estimates on these Young’s Moduli for your samples?

I think the issue you were originally talking about is that you still have some uncertainty in E (from sample to sample or whatever), so you’d like to pass that through the rest of the analysis. I’m not sure with the Johnson-Cook stuff (where we infer a value of B) we can really do that (the values of B will just adjust to whatever value of E we feed in).

This is where my unfamiliarity with Stan will show, but I don’t think there’s a way to do like a random likelihood thing here? Every random variable in Stan gets inferred (there’s no way to say for each sample let E come from a normal(0, 1) not influenced by anything else – the prior can be like that but Stan explores the posterior and we don’t control that), and this is a non-identifiable model so these inferences will go south. I don’t think there’s anything we can do here other than fix E and pray or fix B and pray. Neither seem particularly delightful cause we don’t get that uncertainty we are after.

I think you can make the argument that Tm might try to cancel itself out (and not even matter). If we try to estimate the melting temperature parameter, we’re messing with a factor:
(1 - ((T - Troom) / (Tm - Troom))^m)

If m is 1 (and I think you expect it to be near 1?) then that simplifies to:
(Tm - T) / (Tm - Troom)

The derivative of this with respect to Tm is proportional to 1 / (Tm - Troom), so if Tm - Troom is large, this isn’t even really going to change the log probability of the problem much? Maybe it’s canceling itself out? It’s probably dangerous thinking to handwave the math like this. Either way I wouldn’t expect it to accomplish much, because there’s a big factor of A that you could pull out front that is just going to adjust to be whatever it needs to be to fit the curves.

The way to actually verify I’m not just making stuff up (and I could be cause I didn’t run the models) is to check out pairplots of the traceplots and see if things are really tightly correlated. I think B and E would be (and this will lead to crazy estimates of each of these parameters in the posterior). You might check A and Tm as well.

Any chance you have a simpler model as an option? Do they all work on this plastic strain thing?

Hopefully there’s not too much wrong with this wall of text. I haven’t told you anything you didn’t already know haha. No closer to having meaningful posteriors. I’ll try to play around with simulated data on this sometime in the near future so I can do more than speculate.


Yeah, I thought that might be the case.

Yeah, I was surprised that Stan even let me do that. Not sure if that’s valid or not. I do notice that the Stan reference manual says that

y ~ normal(mu,sigma);

is equivalent to

target += normal_lpdf(y | mu, sigma);

I wonder if that holds true if mu is a function of y, i.e. if

target += normal_lpdf(y | mu(y, x, param1, param2, ...), sigma);

even makes sense.

Good estimates? Not really. There’s not much data in the linear regime, really, usually only a couple of points. It’s enough to verify that E is probably near 200e3 MPa (or 200 GPa), but not really enough to derived E in the first place.

As far as I can tell, functions generating random numbers are only allowed in the “generated quantities” section of Stan, which may be just as well. I doubt that a random likelihood function would make sense.

I wonder if it might make sense to, outside of Stan, generate several plausible but random values for E, and then in Stan, treat those values as data and have a model where

for (i in 1:dataLen) {
    for (j in 1:numEVals) {
         real plas_strain = epsilon[i] - sigma[i]/E[numEVals];

         if (plas_strain > 0.0) {
              real Tstar = (T[i] - Troom)/Tdiff;

              sigma[i] ~ normal((A + B*plas_strain^n)*(1.0 + C*log(epsilon_dot[i]))*(1.0 - Tstar^m), stdDevSigma);

Somehow, that seems like a naive approach, though.


You might be breaking your calculation by using an uninitialized values. It’s fine for a variable to be on both sides (just like in any othe rprogramming language where you can do a = a + 1 to increment but if a starts out undefined, it ends up undefing after the increment (an arbitrary value, I think)


That’s unlikely to be the problem here. Data variables sigma and epsilon are both defined. Also, I don’t think “a = a + 1” is a good analogy, since “y ~ dist(params, …)” is not a statement that overwrites y in the first place.