Recently I have been trying to set up robust priors for certain parameters in my Bayesian model. For example, I am currently considering the t-distribution and the skewed normal distribution. However, according to the Stan Functions Reference (15.4 Skew Normal Distribution and 15.5 Student-T Distribution), both of them have three hyperparameters. If I don’t have any information about those three hyperparameters (xi, omega, alpha for the skewed normal; nu, mu, sigma for the t-distribution), how should I specify priors for them? Should I just declare them in the parameters block and not specify any priors for them?
I’d examine this statement more closely as to whether you truly do not have any information. For example, does it track with your domain expertise that your parameters could shoot off to infinity? Oftentimes in applied cases, parameters are constrained in some way, even if very weakly. It will come down to the specifics of your problem and modeling choice.
One way to investigate this is to simulate data directly from your prior. When doing so, ask yourself whether it produces data that seem to be within the realm of reality. If not, adjust your priors accordingly.
For example, suppose we are modeling the concentration of chemical compounds, and we know it should be a non-negative unknown parameter. We don’t have any other information about it, so my modeling strategy is to use the flexible skewed normal distribution as its prior and truncate it to the positive range. In this case, how should I specify the three hyperparameters of the skewed normal distribution?
Okay, so we have some prior information here. Your next step should be to try out different values of the parameters of your prior distribution and see what kinds of concentrations they yield. If the range of concentrations implied by your prior distribution seems reasonable, you can go with those parameters; otherwise, adjust them until you feel the prior better represents reality.
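As a rough sketch of what "trying out different values" could look like outside of Stan, the Python snippet below draws from a skew normal truncated to positive values and prints a few quantiles for several candidate settings. The three (xi, omega, alpha) triples are made-up illustrations, not recommendations, and the helper names are hypothetical. Sampling uses the standard representation z = delta * |u0| + sqrt(1 - delta^2) * u1 with delta = alpha / sqrt(1 + alpha^2), followed by simple rejection of non-positive draws.

```python
import math
import random
import statistics


def sample_truncated_skew_normal(xi, omega, alpha, n, rng):
    """Draw n values from skew_normal(xi, omega, alpha) restricted to x > 0."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    draws = []
    while len(draws) < n:
        u0 = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        # Standard skew-normal variate built from two independent normals.
        z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1
        x = xi + omega * z
        if x > 0.0:  # rejection step implements the truncation at zero
            draws.append(x)
    return draws


def summarize(draws):
    """Return the 5%, 50% and 95% sample quantiles of the prior draws."""
    cuts = statistics.quantiles(draws, n=100)  # 99 percentile cut points
    return {"5%": cuts[4], "50%": cuts[49], "95%": cuts[94]}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical candidate hyperparameters (xi, omega, alpha) to compare.
    candidates = [(0.0, 1.0, 0.0), (0.0, 5.0, 3.0), (2.0, 50.0, 3.0)]
    for xi, omega, alpha in candidates:
        s = summarize(sample_truncated_skew_normal(xi, omega, alpha, 5000, rng))
        print(f"xi={xi}, omega={omega}, alpha={alpha}: "
              f"5%={s['5%']:.2f}, median={s['50%']:.2f}, 95%={s['95%']:.2f}")
```

If the 95% quantile for a candidate setting lands at a concentration you consider implausible, that setting is too wide for your problem; if it cuts off values you believe are possible, it is too narrow.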
While not strictly a problem, I question the use of a truncated skew-normal distribution as a prior when you seem hesitant to construct more informative priors. What features are you hoping the skew-normal will capture? Why not use a prior distribution with strictly positive support, like a log-normal, gamma, or half-normal distribution? Can you make any statements about the range of possible concentrations? That will surely inform your prior model. For example, surely there is a point beyond which a concentration of a chemical compound is entirely unphysical, and you could use that as part of your prior model.
In any case, it’s hard to give clear, firm answers because so much depends on the particular problem. A perfectly reasonable approach is to put some soft constraints on the parameters and construct a prior that matches them. For example, suppose you have good reason to believe that the concentration rarely exceeds 10. In that case, you could determine your hyperparameters by finding values that place 95% of the prior mass below 10.
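To make the "95% of the mass below 10" idea concrete, here is a minimal Python sketch for two positive-support priors. It assumes a half-normal prior with scale sigma, and a log-normal prior whose log-scale sigma_log is fixed by hand (the value 1.0 is an arbitrary choice for illustration); the function names are hypothetical. For the half-normal, P(X < u) = 2 * Phi(u / sigma) - 1, and for the log-normal, P(X < u) = Phi((log u - mu) / sigma_log), so both can be solved in closed form.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def half_normal_scale(upper, mass):
    """Scale sigma such that a half-normal(sigma) puts `mass` below `upper`."""
    z = STD_NORMAL.inv_cdf((1.0 + mass) / 2.0)
    return upper / z


def half_normal_cdf(x, sigma):
    """P(X < x) for X ~ half-normal(sigma), x >= 0."""
    return 2.0 * STD_NORMAL.cdf(x / sigma) - 1.0


def lognormal_location(upper, mass, sigma_log):
    """Location mu such that lognormal(mu, sigma_log) puts `mass` below `upper`."""
    return math.log(upper) - sigma_log * STD_NORMAL.inv_cdf(mass)


def lognormal_cdf(x, mu, sigma_log):
    """P(X < x) for X ~ lognormal(mu, sigma_log), x > 0."""
    return STD_NORMAL.cdf((math.log(x) - mu) / sigma_log)


if __name__ == "__main__":
    sigma = half_normal_scale(10.0, 0.95)
    mu = lognormal_location(10.0, 0.95, sigma_log=1.0)  # sigma_log is assumed
    print(f"half-normal scale: {sigma:.3f}, "
          f"check P(X < 10) = {half_normal_cdf(10.0, sigma):.3f}")
    print(f"log-normal location: {mu:.3f}, "
          f"check P(X < 10) = {lognormal_cdf(10.0, mu, 1.0):.3f}")
```

In a Stan program, the half-normal case would typically be written by declaring the parameter with a lower bound of zero and giving it a normal(0, sigma) prior with the scale found above. For priors without a closed-form quantile, such as a truncated skew normal, the same target can be approached by simulation and trial and error.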
Exactly. This is what we call a “prior predictive check”. It’s like a posterior predictive check without data.
I’m guessing you do have information here. This is the exact setup that motivated Jonah et al.'s prior predictive check paper (there’s a paywalled version at JRSS if you prefer). They were modeling particulate concentrations and realized their vague priors allowed concentrations equivalent to the density of a neutron star.
It can help immensely with computation and statistical stability to use weakly informative priors that determine the rough scale of your answer.