Random fatigue limit model

Over the past six years, my coauthors and I have used Stan/rstan for dozens of applications, and many of these applications have involved hierarchical Bayesian (HB) models. We now appreciate the many advantages of the HB approach relative to our previous use (starting in the 1990s) of lme and nlme. The development of Stan models has generally gone smoothly. For the past few weeks, however, we have been struggling with a particular application. We have been trying to reproduce the HB results that Val Johnson and others got in their 1999 discussion of our paper on the random fatigue limit (RFL) model. Our model/data runs fine (no warnings) but gives answers that agree with neither the HB analysis in the Johnson_etal1999Technometrics.pdf discussion nor the maximum likelihood results in the PascualMeeker1999Technometrics.pdf paper (which do agree reasonably well with each other).

One thing that is different in the RFL model (compared with our other HB applications) is that the RFL has a lognormal distribution (our other applications always used a normal (sometimes multivariate) distribution for the random parameter(s)). We suspect that the distribution of the random RFL parameter (gamma) is not being specified correctly because of the nonlinear (antilog) transformation.

We use

gamma = exp(mu_log_gamma + Z_log_gamma[n] * sigma_log_gamma);

where Z_log_gamma[n] is (for each observation in the data set) NOR(0,1), and this is followed by

mu = beta0 + beta1 * log(stress[n] - gamma);

which is the location parameter of the log-location-scale failure-time distribution. That is the RFL model in a nutshell. The Stan model carefully does the right thing (we believe) when gamma > stress[n].
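In case it helps to see the two lines together, here is a minimal sketch of the per-observation computation in Python (the function name and the error handling are mine and illustrative only; the actual model is in Stan):

```python
import numpy as np

def rfl_location(stress, z_log_gamma, mu_log_gamma, sigma_log_gamma, beta0, beta1):
    """Location parameter mu of the log-location-scale failure-time
    distribution under the RFL model (parameter names follow the post)."""
    # Non-centered parameterization: gamma is lognormal because
    # log(gamma) = mu_log_gamma + Z * sigma_log_gamma is normal
    gamma = np.exp(mu_log_gamma + z_log_gamma * sigma_log_gamma)
    # log(stress - gamma) is only defined when stress exceeds the fatigue limit
    if np.any(gamma >= stress):
        raise ValueError("gamma >= stress: no failure at this stress level")
    return beta0 + beta1 * np.log(stress - gamma)
```

With sigma_log_gamma = 0 and mu_log_gamma = log(220), this reduces to beta0 + beta1 * log(stress - 220).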

The estimates of gamma and the distribution of gamma should be centered around 220 [somewhat below the smallest level of stress with failures (270 MPa)]. But (using the median of the marginal posterior) exp(mu_log_gamma) = exp(1.702) = 5.48. We tried specifying tight prior distributions around the correct results (from Johnson et al.), but the joint posterior just goes to the same wrong answers (it just takes longer to get there).

We experimented with lots of alternatives, like specifying the lognormal distribution directly, and then we thought there might be a subtle need for a Jacobian correction, but nothing we have tried has worked (at least in the way we tried to implement it).

I put example code in

To reproduce our results, all one has to do is clone the repo, open an instance of R in the folder, and execute all of the commands in the file RunRFL.R. Copies of the RFL paper and discussion are also in the repo.

Any help or other advice would be welcome.

Bill Meeker

I think that’s one log() too many. The model is gamma = exp(mu_log_gamma) so to get gamma = 220 you need mu_log_gamma = 5.4 instead of exp(mu_log_gamma) = 5.4.
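The arithmetic is easy to check:

```python
import math

# gamma = exp(mu_log_gamma), so for gamma near 220 the log-scale location
# mu_log_gamma itself should be near log(220) = 5.39 -- the reported
# exp(mu_log_gamma) = 5.48 corresponds to taking one log too many
print(math.log(220.0))   # ~5.39
print(math.exp(5.39))    # ~219, back near the target gamma
```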
Johnson et al. use flat priors for beta0 and beta1 and a prior mean of 5.52 for mu_log_gamma:

prior_codes = c(0,0,1,1,1),
mu_log_gamma_prior_parameters = c(5.52, 0.2, 0),

and inverse-gamma priors for squared sigmas

//log_sigma_error ~ normal(log_sigma_error_prior_parameters[1], log_sigma_error_prior_parameters[2]);
(sigma_error)^2 ~ inv_gamma(1, 0.405);
target += 2 * log_sigma_error; // Jacobian for the log_sigma_error -> sigma_error^2 change of variables

//log_sigma_log_gamma ~ normal(log_sigma_log_gamma_prior_parameters[1], log_sigma_log_gamma_prior_parameters[2]);
(sigma_log_gamma)^2 ~ inv_gamma(3, 0.0001);
target += 2 * log_sigma_log_gamma; // Jacobian, same change of variables
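For anyone puzzled by the Jacobian terms: with the log of the scale parameter as the underlying sampled parameter, the change of variables works out as

```latex
\theta = \log\sigma, \qquad
\sigma^2 = e^{2\theta}, \qquad
\left|\frac{d\sigma^2}{d\theta}\right| = 2e^{2\theta}, \qquad
\log\left|\frac{d\sigma^2}{d\theta}\right| = \log 2 + 2\theta,
```

and the additive constant log 2 can be dropped, which leaves target += 2 * log_sigma_error (and likewise for log_sigma_log_gamma).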

With these changes the results overlap substantially with the 95% intervals in Table 1 of Johnson et al.

Inference for Stan model: RFL.
4 chains, each with iter=10000; warmup=2000; thin=5; 
post-warmup draws per chain=1600, total post-warmup draws=6400.

                  mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff
beta0           31.750   0.062 4.538 24.589 28.458 31.249 34.427 42.027  5303
beta1           -5.349   0.011 0.799 -7.141 -5.833 -5.268 -4.770 -4.058  5328
sigma_error      0.483   0.001 0.043  0.389  0.458  0.484  0.509  0.563  3281
mu_log_gamma     5.341   0.001 0.072  5.168  5.300  5.351  5.393  5.448  5209
sigma_log_gamma  0.008   0.000 0.004  0.004  0.006  0.007  0.009  0.021  2532


Thanks so much for pointing out my error. My face is a bit red! I had indeed done something like an extra log in specifying my “tight priors” around the results of Johnson et al. I am now back to making progress toward my goal.

My goal is to develop a tool that engineers can use to fit models like the RFL (there are also some alternatives we are working on). The analysis of fatigue data is (and has been for many decades) a huge area of application, but engineers do not yet have all of the statistical tools to do it properly.

With respect to the RFL model, my plan is to replace the informative/weakly informative inv_gamma priors on sigma^2 with informative/weakly informative lognormal priors, where the engineers could specify quantiles of sigma_error and sigma_log_gamma instead of the complicated elicitation described in Johnson et al. That approach has worked well for me for other models for reliability data.
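For example, the conversion from two elicited quantiles to lognormal hyperparameters is straightforward (a sketch; the function name and default quantile levels are illustrative):

```python
import math
from statistics import NormalDist

def lognormal_hyperparams(q_lo, q_hi, p_lo=0.05, p_hi=0.95):
    """Convert two elicited quantiles of a positive quantity
    (e.g. sigma_error) into the (mu, sigma) of a lognormal prior.

    Solves log(q) = mu + z_p * sigma at the two quantiles."""
    z_lo = NormalDist().inv_cdf(p_lo)
    z_hi = NormalDist().inv_cdf(p_hi)
    sigma = (math.log(q_hi) - math.log(q_lo)) / (z_hi - z_lo)
    mu = math.log(q_lo) - z_lo * sigma
    return mu, sigma

# Example: an engineer believes sigma_error is between 0.3 and 0.7
# with 90% prior probability
mu, sigma = lognormal_hyperparams(0.3, 0.7)
```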

Does that sound reasonable?