Trying to do a hierarchical nonlinear model in Stan; it's slow and probably wrong

Yeah, those are the plots, and yup, there aren't any ridges there. Is this with or without the multivariate term?

I'm still using the multivariate formulation, but the pair plots are for A, B, and otherParams.

Hmm, maybe save diag_pre_multiply(SD_ABn, LKJ_chol_ABn) * unit_ABn in a temporary variable and do pair plots of ABn_mean vs. that temporary (or, since the model takes a long time to run, just compute that product manually from the draws and do the pair plots that way).
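Something along these lines, with the caveat that I'm guessing the dimensions (three parameters by J curves) and that ABn_indiv is built from the sum:

```stan
transformed parameters {
  // Save the scaled non-centered term as its own transformed
  // parameter so it shows up in the output and can be pair-plotted
  // against ABn_mean.
  matrix[3, J] ABn_var = diag_pre_multiply(SD_ABn, LKJ_chol_ABn) * unit_ABn;
  matrix[3, J] ABn_indiv = ABn_mean + ABn_var;
}
```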

Looks like I’m kind of getting somewhere in terms of diagnosis. For the first stress-strain curve, there don’t appear to be extremely blatant ridges:
[Figure: pair plots of ABn_mean vs. ABn_var for curve 1]
However, for the last two stress-strain curves, there definitely is:
[Figure: pair plots of ABn_mean vs. ABn_var for curve 8]

Here, ABn_var is just diag_pre_multiply(SD_ABn, LKJ_chol_ABn) * unit_ABn. It looks like there are ridges for the pairs (ABn_mean[1,8], ABn_var[1,8]), (ABn_mean[2,8], ABn_var[2,8]), and (ABn_mean[3,8], ABn_var[3,8]). I'm not clear on how I'd get from that information to a fix, though.

Nice! Can we just get rid of the multivariate normal there? That ridge would make stuff difficult to sample.

I don't really have a good interpretation for that term. I think you'd put something like that there if you felt like you were missing physics in some way and wanted to infer those missing effects. Just set ABn_indiv = ABn_mean and see what happens (whether the parameter inferences change much).
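As a quick test, that's just swapping the non-centered term out of the transformed parameters (assuming the same guessed dimensions as before):

```stan
transformed parameters {
  // No hierarchy: each curve's A/B/n are exactly the group-level
  // prediction, so the multivariate normal term drops out entirely.
  matrix[3, J] ABn_indiv = ABn_mean;
}
```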

But the difference in behavior between the first curve and the last two is weird. Any other indications of differences in the outputs? It might not be obvious. I guess the first place I'd look is at posterior predictive checks on sigmaPlastic.
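Something like this in generated quantities would do it; I'm guessing at the names and the per-observation indexing from the snippets you've posted:

```stan
generated quantities {
  // Posterior predictive draws for sigmaPlastic, one per observed
  // point; curveInd[i] maps observation i to its curve.
  vector[N] sigmaPlastic_rep;
  for (i in 1:N) {
    sigmaPlastic_rep[i]
      = normal_rng(A_indiv[curveInd[i]]
                     + B_indiv[curveInd[i]] * (epsilonPlastic[i])^n_indiv[curveInd[i]],
                   SD_curve[curveInd[i]]);
  }
}
```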

I'm not sure it's all that weird. The last two curves are for quasi-static strain rates, while the first is not. The last two are also accordingly far less noisy, i.e., they have much lower SD_curve values than the first.

If ABn_indiv = ABn_mean, then the model fits much more quickly and doesn't appear to have any numerical difficulties. But then it also isn't multilevel anymore. I'd like some way of accounting for the cases where the elements of ABn_indiv[3,:] aren't quite the same, or a way of quantifying how ABn_indiv[1,:] and ABn_indiv[2,:] fail to follow the relationship indicated by notRealFunction(...). Also, if the components of ABn_indiv are possibly correlated, how do I account for that without running into the identifiability problems I had before?

> I'd like some way of accounting for the cases where the elements of ABn_indiv[3,:] aren't quite the same, or a way of quantifying how ABn_indiv[1,:] and ABn_indiv[2,:] fail to follow the relationship indicated by notRealFunction(...).

This seems doable. Lemme make sure I’m on the right page first.

It sounds like the model works well for the last two curves (because they're at quasi-static strain rates), but not so well for the first curve (which is not)? Does this make sense given what you know about the model notRealFunction(...)?

So it’s true that you could do this fit:

```stan
sigmaPlastic[i] ~ normal(A_indiv + B_indiv * (epsilonPlastic[i])^n_indiv,
                         SD_curve[curveInd]);
```

for all the curves individually and you’d get estimates for A, B, and n?

But when you add the piece of the model that predicts A and B as a parametric function of each curve’s strain rate and temperature, things don’t work out?

And so you want to see, sort of, the residual of that fit? Like, you want to know how wrong this parametric function is?

Does this all sound right?

How many curves will this be for? Are you using this model in situations where you only have a handful of curves, or will there be a bunch of them? And will there be curves repeated at the same strain rate/temperature combos, or will everything be at different conditions?

Not necessarily, though getting the model to work well for both low and high strain rates is tricky.

Pretty much, yes. There have been at least a couple versions of notRealFunction() posited so far (and not by me), and it’s pretty clear that they have their problems. I want to quantify how bad that can be.

That depends on what you mean by “handful” and “bunch”. Somewhere around five to ten curves or so is typical. Usually, curves are not repeated for the same combination of strain rate and temperature, and that’s certainly been the case so far with the curves that I’ve been dealing with.


So if this were just a regression:

y ~ normal(a * x0 + b, sigma), with y and x0 known data and a, b, sigma parameters

But if you knew some ys were different from other ys, you could just add that information to the regression as a 0/1 label x1 and expand the regression to:

y ~ normal(a * x0 + b * x1 + c, sigma)
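As a complete (if toy) Stan program, that expanded regression is just:

```stan
data {
  int<lower=1> N;
  vector[N] x0;
  vector[N] x1;  // 0/1 label marking the "different" ys
  vector[N] y;
}
parameters {
  real a;
  real b;
  real c;
  real<lower=0> sigma;
}
model {
  // b only affects the labeled observations, so it measures how
  // different those ys are from the rest.
  y ~ normal(a * x0 + b * x1 + c, sigma);
}
```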

I think you can do the same here, where you add a coefficient for, more or less, the missing mechanics.

This'll probably be easier to interpret if you have a model that you know works for some part of your data, and then other pieces of the data where you'd need an extended model or whatnot (instead of having two sets of data, neither of which is quite right, and trying to balance things).

For the data we've been talking about, you could probably just add an indicator variable for the first curve, plus some extra coefficients that shift A/B/n away from what the model would predict (similar to how the regression works). Run that fit and see if the results make sense.
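A fragment of what I mean, with the caveat that the indicator name (isFirst), the offsets (dA, dB, dn), and the signature of notRealFunction are all made up here:

```stan
transformed parameters {
  matrix[3, J] ABn_indiv;
  for (j in 1:J) {
    // Parametric prediction for curve j (signature assumed).
    vector[3] ABn_pred = notRealFunction(strainRate[j], temperature[j], theta);
    // The extra coefficients engage only where isFirst[j] == 1, so
    // they measure how far that curve sits from the prediction.
    ABn_indiv[1, j] = ABn_pred[1] + isFirst[j] * dA;
    ABn_indiv[2, j] = ABn_pred[2] + isFirst[j] * dB;
    ABn_indiv[3, j] = ABn_pred[3] + isFirst[j] * dn;
  }
}
```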

Check out the Kennedy and O'Hagan paper "Bayesian Calibration of Computer Models." The stuff in there isn't a magic bullet for model misspecification, but they discuss this problem in a pretty in-depth way (and include a fancy Gaussian process solution to it). Since in your case you actually are talking about how far the behavior is from notRealFunction, and you don't have too many curves, the Kennedy-O'Hagan GP stuff could probably work for you.

Our lab is moving, so I might be in and out the next couple days.

I've actually tried that, or at least what seems to be an implementation of it according to the paper "Selection of model discrepancy priors in Bayesian calibration", where, for a zero-mean model discrepancy function, the Kennedy-O'Hagan approach is not much more than replacing y ~ normal(myFunc(theta, x), sd) with y ~ multi_normal(myFunc(theta, x), Sigma + (sd^2)*I). It's slower (for obvious reasons), and I found that if I tried to supply vague priors, the posteriors ended up not looking much different from the priors. That said, I didn't use it in any sort of multilevel context (however that might work).
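In Stan terms, the replacement is roughly this; the squared-exponential kernel for Sigma and the hyperparameters alpha and rho here are just stand-ins for whatever covariance one actually picks:

```stan
model {
  // Zero-mean discrepancy: the GP covariance Sigma gets folded into
  // the likelihood along with the iid noise term (sd^2) * I.
  matrix[N, N] Sigma = gp_exp_quad_cov(x, alpha, rho)
                       + diag_matrix(rep_vector(square(sd), N));
  y ~ multi_normal(myFunc(theta, x), Sigma);
}
```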

> I found that if I tried to supply vague priors, the posteriors ended up not looking much different from the priors. That said, I didn't use it in any sort of multilevel context (however that might work).

Yeah, it sounds like there are enough little variations in the experiments you're doing that you'll have to stay on your toes here. There probably won't be a single-model solution to everything.