I am curious how to interpret multiple modes. The Stan hierarchical model is designed to capture inter-lot variation, intra-lot variation, and analytical measurement variation: sigmaB, sigmaW, and sigmaA respectively. The intra-lot variation and analytical variation are not separately identifiable, so I impose an informative prior on the analytical variation, based on analytical method validation. The problem is from the pharma domain but may arise in any batch manufacturing process. Both Stan and WinBUGS produce two distinct modes for the intra-lot variability (sigmaW), even though Rhat and Neff look good. I wonder where those modes come from and whether it is possible to get rid of one of them.
Thanks for any advice,
The Stan model:
weightIG1b.stan (2.1 KB)
The trace, pairs, and densities:
NAP_AMNdisso2.csv
NAP_AMN_weightIG1.stan_2_2_TRUEvalidation.pdf (1.1 MB)
min(n_eff)=400 from 2000 iterations
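To see the sigmaW/sigmaA confounding concretely, here is a minimal simulation sketch (all numbers are hypothetical, not from the attached data): when each sample is measured once, the data only inform the sum sigmaW^2 + sigmaA^2, so the likelihood alone cannot separate the two components.

```python
# Hypothetical illustration of the confounding: each observation is
# lot mean + within-lot deviation + analytical error, measured once.
import numpy as np

rng = np.random.default_rng(1)
mu, sigmaB, sigmaW, sigmaA = 100.0, 2.0, 1.0, 0.5
n_lots, n_samples = 3, 500

lot_means = mu + sigmaB * rng.standard_normal(n_lots)
y = (lot_means[:, None]
     + sigmaW * rng.standard_normal((n_lots, n_samples))
     + sigmaA * rng.standard_normal((n_lots, n_samples)))

# The within-lot sample variance estimates sigmaW^2 + sigmaA^2 = 1.25;
# no amount of data of this form separates the two components, which is
# why an informative prior on sigmaA is needed.
within_var = y.var(axis=1, ddof=1).mean()
print(within_var)
```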
I would take a look at the prior choice recommendations:
It looks like one of the modes is very close to 0, and if you think that within lot SD is likely not close to 0, it may be better to pick a prior that puts less mass closer to the boundary.
Thanks a lot. I agree that a wide gamma puts a lot of mass at 0. Below is my attempt to move away from 0, but the mode is even sharper; see the attached pdf for trace, pairs, and density plots. The new priors for sigmaW and sigmaB are normal(0.5, 10). I don’t want to make them too informative.
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2_TRUEvalidation.pdf (1.14 MB)
weightIG1c.stan (1.96 KB)
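For what it’s worth, one can check numerically how much mass a normal(0.5, 10) prior, truncated to sigma > 0, still leaves near the boundary. A quick scipy sketch (the cutoff 0.1 is an arbitrary choice of "near zero"):

```python
import numpy as np
from scipy import stats

# normal(0.5, 10) truncated to sigma > 0; truncnorm takes standardized bounds
lo, hi = (0.0 - 0.5) / 10.0, np.inf
prior = stats.truncnorm(a=lo, b=hi, loc=0.5, scale=10.0)

print(prior.cdf(0.1))  # mass below 0.1 is small, but...
print(prior.pdf(0.0))  # ...the density at 0 is clearly positive,
                       # so a posterior spike at 0 can persist
```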
I would also recommend putting priors directly on scale parameters rather than on precisions and then transforming to sd.
Now the prior is sigmaW ~ normal(0.5, 10); it is no longer on the precision.
weightIG1c.stan (1.96 KB)
Great. I would do it for sigmaA too. It’s definitely a better way to go in general, unless you have a preternatural intuition for precisions that lets you do smart things with them (I don’t).
Well, the problem with sigmaA is that sigmaA^2 ~ scaled_inv_chisq(df, s2), where s2 is the estimate of the analytical variance and df the degrees of freedom; see https://en.wikipedia.org/wiki/Scaled_inverse_chi-squared_distribution, section "Estimation of variance when mean is unknown". It means that sigmaA^2 ~ inv_gamma(alpha = df/2, beta = df*s2/2). I am glad to use a different prior, but this is the one I have found so far. Can you suggest one with some statistical rationale?
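The equivalence is easy to verify numerically (the df and s2 below are made-up values, not from the validation data): if X ~ chi2(df), then df·s2/X has the scaled-inv-chisq(df, s2) distribution, and its quantiles should match those of inv_gamma(df/2, df·s2/2).

```python
import numpy as np
from scipy import stats

nu, s2 = 8, 0.25  # hypothetical df and analytical variance estimate
ig = stats.invgamma(a=nu / 2, scale=nu * s2 / 2)  # inv_gamma(df/2, df*s2/2)

# scaled-inv-chisq(nu, s2) draws via the chi-squared representation
rng = np.random.default_rng(3)
draws = nu * s2 / stats.chi2(nu).rvs(200_000, random_state=rng)

print(np.median(draws), ig.median())  # the two medians agree closely
```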
I see. I didn’t have a chance to look closely enough.
The half-normal distribution actually puts more mass at 0 than the inverse gamma you were using before. Try a gamma(2, 0.001) prior; it’s referenced in the prior choice recommendations and comes from this paper:
Chung, Y., Rabe-Hesketh, S., Dorie, V., Gelman, A., & Liu, J. (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika, 78(4), 685–709. https://doi.org/10.1007/S11336-013-9328-2
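One property worth noting, easy to check with scipy: a gamma with shape 2 has exactly zero density at sigma = 0, which is what removes the boundary spike, while the tiny rate 0.001 keeps the prior very diffuse everywhere else.

```python
from scipy import stats

# gamma(shape = 2, rate = 0.001); scipy parameterizes by scale = 1/rate
prior = stats.gamma(a=2, scale=1 / 0.001)

print(prior.pdf(0.0))   # exactly 0 at the boundary: no spike at sigma = 0
print(prior.mean())     # 2000 -> extremely diffuse on a % scale
print(prior.cdf(10.0))  # tiny mass below sigma = 10: most mass sits high
```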
Awesome. Black is Stan with gamma(2, 0.001) on sigmaW and sigmaB; red is OpenBUGS with gamma(0.001, 0.001) on the corresponding precisions.
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2_TRUEvalidation.pdf (1.13 MB)
Great! Glad that worked. If you wanted to make the OpenBUGS results similar, you might try setting the parameters of the gamma on the precision (i.e. an inverse gamma on sigma^2) so that the first two moments match those of the gamma(2, 0.001) prior.
@aaronjg Yes, good point about the half normal! With a scale that large there’s still a ton of mass near zero, even if the prior isn’t centered there.
In general make sure to plot the implications of your priors. These things are often easier to see visually.
Thanks to all for very constructive suggestions.
I was too optimistic. The gamma prior is quite informative: with only 3 lots it gives variability estimates ranging from 1 to 1000 (the units are % dissolved). Those values are quite unrealistic, since the variability is in the range 1 to 10. My feeling is that, because of the lack of data (only 3 lots), the prior biases the estimate. I think something like a shifted vague gamma would be less informative. Is there such a thing in Stan?
Is 1-10 a hard range? Could you truncate it? Alternatively, look at @betanalpha’s link above which gives a procedure for “soft truncation”.
What if you try it with an improper uniform prior? (Just remove the prior line in the model block.) I’m not sure the gamma prior is putting more mass on high values than having no prior at all (i.e., you may want a more informative prior, not a less informative one).
Well, what I want is to get rid of the first spike in the inter-lot (sigmaB) density, which is too low and meaningless. However, I don’t want to move the mean, since it makes sense. I had gamma(0.001, 0.001) on the precision, which is pretty much uniform on sigma (or maybe not?). The attached pdf shows how the estimate of the inter-lot density (sigmaB) progresses with an increasing number of lots; see the plot in the 4th column on each page.
I am thinking about this note:
“Gamma(2,0) biases the estimate upward. When the number of groups is small, try Gamma(2, 1/A), where A is a scale parameter representing how high tau can be.” What is tau here? Is it the precision, i.e. 1/sigmaB^2? Does it mean tauB ~ gamma(2, 1/A), sigmaB = 1/sqrt(tauB), and A is the max value of tauB I would expect?
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2density.pdf (84.4 KB)
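On the side question of whether gamma(0.001, 0.001) on the precision is "pretty much uniform on sigma": the induced prior can be checked analytically, since sigma < t is equivalent to tau > 1/t^2. A quick scipy sketch:

```python
from scipy import stats

# gamma(shape = 0.001, rate = 0.001) on the precision tau
tau_prior = stats.gamma(a=0.001, scale=1 / 0.001)

p_sigma_below_01 = tau_prior.sf(1 / 0.1**2)   # P(sigma < 0.1) = P(tau > 100)
p_sigma_above_10 = tau_prior.cdf(1 / 10**2)   # P(sigma > 10)  = P(tau < 0.01)
print(p_sigma_below_01, p_sigma_above_10)
# nearly all of the induced mass lands above sigma = 10, spread roughly
# uniformly in log(sigma) over an enormous range -- far from flat on sigma
```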
\tau is the scale parameter of the hierarchical model, i.e. \sigma_B.
The point I was trying to make is that since you have some sense of what the variance should be (“I don’t want to move the mean since it makes sense”), you shouldn’t be afraid of using a weakly informative prior that puts more mass near those values. I believe the Chung et al. paper has some discussion of how this prior is derived and how to pick A.
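To make the gamma(2, 1/A) suggestion concrete, here is a quick scipy check of its shape (A = 10 below is a hypothetical ceiling on the % dissolved scale, not a value from the thread): the density is zero at the boundary, the mode sits at A, and the mean is 2A.

```python
from scipy import stats

A = 10.0  # hypothetical: roughly how large sigma_B could plausibly be
prior = stats.gamma(a=2, scale=A)  # shape 2, rate 1/A

print(prior.pdf(0.0))  # 0.0 -> no spike at the boundary
print(prior.mean())    # 2A = 20.0
print(prior.pdf(A) > prior.pdf(A - 1),
      prior.pdf(A) > prior.pdf(A + 1))  # the mode sits at A
```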