I am curious how to interpret multiple modes. The Stan hierarchical model is designed to capture inter-lot variation, intra-lot variation, and analytical measurement variation: sigmaB, sigmaW, and sigmaA respectively. The intra-lot variation and analytical variation are not separately identifiable, so I impose an informative prior on the analytical variation, based on analytical method validation. The problem is from the pharma domain but may arise in any batch manufacturing process. Both Stan and WinBUGS produce two distinct modes for the intra-lot variability (sigmaW), even though Rhat and Neff look good. I wonder where those modes come from and whether it is possible to get rid of one of them.
Thanks for any advice,
The Stan model:
weightIG1b.stan (2.1 KB)
The trace, pairs, and densities:
NAP_AMNdisso2.csv
NAP_AMN_weightIG1.stan_2_2_TRUEvalidation.pdf (1.1 MB)
min(n_eff)=400 from 2000 iterations
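To see the sigmaW/sigmaA confounding concretely, here is a minimal simulation sketch (all numbers are hypothetical, not from the attached data): when each sample is measured once, the data only inform the sum sigmaW^2 + sigmaA^2, so the likelihood alone cannot separate the two components.

```python
# Hypothetical illustration of the confounding: each observation is
# lot mean + within-lot deviation + analytical error, measured once.
import numpy as np

rng = np.random.default_rng(1)
mu, sigmaB, sigmaW, sigmaA = 100.0, 2.0, 1.0, 0.5
n_lots, n_samples = 3, 500

lot_means = mu + sigmaB * rng.standard_normal(n_lots)
y = (lot_means[:, None]
     + sigmaW * rng.standard_normal((n_lots, n_samples))
     + sigmaA * rng.standard_normal((n_lots, n_samples)))

# The within-lot sample variance estimates sigmaW^2 + sigmaA^2 = 1.25;
# no amount of data of this form separates the two components, which is
# why an informative prior on sigmaA is needed.
within_var = y.var(axis=1, ddof=1).mean()
print(within_var)
```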
I would take a look at the prior choice recommendations:
It looks like one of the modes is very close to 0, and if you think that within lot SD is likely not close to 0, it may be better to pick a prior that puts less mass closer to the boundary.
Thanks a lot. I agree that a wide gamma puts a lot of mass at 0. Below is my attempt to move away from 0, but the mode is even sharper; see the attached pdf for trace, pairs, and density plots. The new priors for sigmaW and sigmaB are normal(0.5, 10). I don’t want to make them too informative.
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2_TRUEvalidation.pdf (1.14 MB)
weightIG1c.stan (1.96 KB)
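For what it’s worth, one can check numerically how much mass a normal(0.5, 10) prior, truncated to sigma > 0, still leaves near the boundary. A quick scipy sketch (the cutoff 0.1 is an arbitrary choice of "near zero"):

```python
import numpy as np
from scipy import stats

# normal(0.5, 10) truncated to sigma > 0; truncnorm takes standardized bounds
lo, hi = (0.0 - 0.5) / 10.0, np.inf
prior = stats.truncnorm(a=lo, b=hi, loc=0.5, scale=10.0)

print(prior.cdf(0.1))  # mass below 0.1 is small, but...
print(prior.pdf(0.0))  # ...the density at 0 is clearly positive,
                       # so a posterior spike at 0 can persist
```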
I would also recommend putting priors directly on scale parameters rather than on precisions and then transforming to sd.
Now the prior is sigmaW ~ normal(0.5, 10); it is no longer on the precision.
weightIG1c.stan (1.96 KB)
Great. I would do it for sigmaA too. It’s definitely a better way to go in general, unless you have a preternatural intuition for precisions that lets you do smart things with them (I don’t).
Well, the problem with sigmaA is that sigmaA^2 ~ scaled_inv_chisq(df, s2), where s2 is the estimate of the analytical variance and df the degrees of freedom; see https://en.wikipedia.org/wiki/Scaled_inverse_chi-squared_distribution, section "Estimation of variance when mean is unknown". It means that sigmaA^2 ~ inv_gamma(alpha = df/2, beta = df*s2/2). I am glad to use a different prior, but this is the one I have found so far. Can you suggest one with some statistical rationale?
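The equivalence is easy to verify numerically (the df and s2 below are made-up values, not from the validation data): if X ~ chi2(df), then df·s2/X has the scaled-inv-chisq(df, s2) distribution, and its quantiles should match those of inv_gamma(df/2, df·s2/2).

```python
import numpy as np
from scipy import stats

nu, s2 = 8, 0.25  # hypothetical df and analytical variance estimate
ig = stats.invgamma(a=nu / 2, scale=nu * s2 / 2)  # inv_gamma(df/2, df*s2/2)

# scaled-inv-chisq(nu, s2) draws via the chi-squared representation
rng = np.random.default_rng(3)
draws = nu * s2 / stats.chi2(nu).rvs(200_000, random_state=rng)

print(np.median(draws), ig.median())  # the two medians agree closely
```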
I see. I didn’t have a chance to look closely enough.
The half-normal distribution actually puts more mass at 0 than the inverse gamma you were using before. Try a gamma(2, 0.001) prior; it’s referenced in the prior choice recommendations and comes from this paper:
Chung, Y., Rabe-Hesketh, S., Dorie, V., Gelman, A., & Liu, J. (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika, 78(4), 685–709. https://doi.org/10.1007/S11336-013-9328-2
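One property worth noting, easy to check with scipy: a gamma with shape 2 has exactly zero density at sigma = 0, which is what removes the boundary spike, while the tiny rate 0.001 keeps the prior very diffuse everywhere else.

```python
from scipy import stats

# gamma(shape = 2, rate = 0.001); scipy parameterizes by scale = 1/rate
prior = stats.gamma(a=2, scale=1 / 0.001)

print(prior.pdf(0.0))   # exactly 0 at the boundary: no spike at sigma = 0
print(prior.mean())     # 2000 -> extremely diffuse on a % scale
print(prior.cdf(10.0))  # tiny mass below sigma = 10: most mass sits high
```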
Awesome. Black is Stan with gamma(2, 0.001) on sigmaW and sigmaB; red is OpenBUGS with gamma(0.001, 0.001) on the corresponding precisions.
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2_TRUEvalidation.pdf (1.13 MB)
Great! Glad that worked. If you wanted to make the OpenBUGS results similar, you might try setting the parameters of the gamma on the precision (i.e. an inverse gamma on sigma^2) so that the first two moments match those of the gamma(2, 0.001) prior.
@aaronjg Yes, good point about the half normal! With a scale that large there’s still a ton of mass near zero, even if the prior isn’t centered there.
In general make sure to plot the implications of your priors. These things are often easier to see visually.
Thanks to all for very constructive suggestions.
I was too optimistic. The gamma prior is quite informative: with only 3 lots it gives variability estimates ranging from 1 to 1000 (the units are % dissolved). Those values are quite unrealistic, since the variability is in the range 1 to 10. My feeling is that, because of the lack of data (only 3 lots), the prior biases the estimate. I think something like a shifted vague gamma would be less informative. Is there such a thing in Stan?
Is 1-10 a hard range? Could you truncate it? Alternatively, look at @betanalpha’s link above which gives a procedure for “soft truncation”.
What if you try it with an improper uniform prior? (Just remove the prior line in the model block.) I’m not sure the gamma prior is putting more mass on high values than having no prior at all (i.e., you may want a more informative prior, not a less informative one).
Well, what I want is to get rid of the first spike in the inter-lot (sigmaB) density, which is too low and meaningless. However, I don’t want to move the mean, since it makes sense. I had gamma(0.001, 0.001) on the precision, which is pretty much uniform on sigma (or maybe not?). The attached pdf shows how the estimate of the inter-lot density (sigmaB) progresses with an increasing number of lots; see the plot in the 4th column on each page.
I am thinking about this note:
“Gamma(2,0) biases the estimate upward. When the number of groups is small, try Gamma(2, 1/A), where A is a scale parameter representing how high tau can be.” What is tau here? Is it the precision, i.e. 1/sigmaB^2? Does it mean tauB ~ gamma(2, 1/A), sigmaB = 1/sqrt(tauB), and A is the max value of tauB I would expect?
ATN_TEVdisso2.csv
ATN_TEV_weightIG1.stan_2_2density.pdf (84.4 KB)
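On the side question of whether gamma(0.001, 0.001) on the precision is "pretty much uniform on sigma": the induced prior can be checked analytically, since sigma < t is equivalent to tau > 1/t^2. A quick scipy sketch:

```python
from scipy import stats

# gamma(shape = 0.001, rate = 0.001) on the precision tau
tau_prior = stats.gamma(a=0.001, scale=1 / 0.001)

p_sigma_below_01 = tau_prior.sf(1 / 0.1**2)   # P(sigma < 0.1) = P(tau > 100)
p_sigma_above_10 = tau_prior.cdf(1 / 10**2)   # P(sigma > 10)  = P(tau < 0.01)
print(p_sigma_below_01, p_sigma_above_10)
# nearly all of the induced mass lands above sigma = 10, spread roughly
# uniformly in log(sigma) over an enormous range -- far from flat on sigma
```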
\tau is the scale parameter of the hierarchical model, i.e. \sigma_B.
The point I was trying to make is that since you have some sense of what the variance should be (“I don’t want to move the mean since it makes sense”), you shouldn’t be afraid of using a weakly informative prior that puts more mass near those values. I believe the Chung et al. paper has some discussion of how this prior is derived and how to pick A.
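To make the gamma(2, 1/A) suggestion concrete, here is a quick scipy check of its shape (A = 10 below is a hypothetical ceiling on the % dissolved scale, not a value from the thread): the density is zero at the boundary, the mode sits at A, and the mean is 2A.

```python
from scipy import stats

A = 10.0  # hypothetical: roughly how large sigma_B could plausibly be
prior = stats.gamma(a=2, scale=A)  # shape 2, rate 1/A

print(prior.pdf(0.0))  # 0.0 -> no spike at the boundary
print(prior.mean())    # 2A = 20.0
print(prior.pdf(A) > prior.pdf(A - 1),
      prior.pdf(A) > prior.pdf(A + 1))  # the mode sits at A
```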