How to choose a prior : family for a response with negative values?

aldc · June 10, 2020, 9:57am

Hi,

I’m modeling percentage change in blood oxygen levels from a particular experiment. My prior before seeing the data was an inverse Gaussian distribution, but my data (the response variable) has some negative values, and the inverse.gaussian family doesn’t accept negative values. How should I go about this?
My values range from -23 to 40, with a mean of 4 and a median of 3.5. (Also, this is repeated-measures data.)

maxbiostat · June 10, 2020, 1:07pm

Just to clarify things, the distribution of the data conditional on the parameters is not the prior. What you are looking for is a sampling distribution or likelihood. Now, on to the question proper: have you tried a Gaussian (normal)? Can you show a histogram of the data?

aldc · June 10, 2020, 2:04pm

Yes, I tried it with a Gaussian (normal) and it works.
Maybe my understanding of the family function is incorrect… So, my prior for this data (before seeing the data) was inverse Gaussian.
@maxbiostat

maxbiostat · June 10, 2020, 2:14pm

Conceptually, there’s no such thing as a prior for data*. In a Bayesian analysis, we have the prior on the parameters, $\pi(\theta)$, and the likelihood $f(x \mid \theta)$, which is the conditional distribution of the data $x$ given the parameters $\theta$. The posterior distribution of $\theta$ given $x$ and your choice of likelihood/sampling distribution is

$$p(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{p(x)}.$$

So, in your case you could say that your likelihood was an inverse Gaussian with unknown parameters $\mu$ and $\lambda$. But as you note, this likelihood is a poor choice because it gives probability zero to data that were actually observed.

*Sometimes we talk about a prior predictive distribution for x, but let’s leave that aside for the sake of clarity.
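
As a rough illustration of the "probability zero" point, here is a minimal Python sketch using only the standard library. The data values and the parameter values (`mu`, `lam`, `mean`, `sd`) are hypothetical, chosen only to span the range quoted earlier in the thread; they are not the poster's data. The sketch shows that a single negative observation drives the inverse Gaussian log-likelihood to minus infinity for any parameter values, while a Gaussian log-likelihood stays finite.

```python
import math

def inv_gaussian_logpdf(x, mu, lam):
    """Log density of the inverse Gaussian; -inf outside its support (x > 0)."""
    if x <= 0:
        return -math.inf
    return (0.5 * math.log(lam / (2 * math.pi * x**3))
            - lam * (x - mu) ** 2 / (2 * mu**2 * x))

def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd), defined on the whole real line."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

# Hypothetical responses spanning the range mentioned in the thread (-23 to 40).
y = [-23.0, -5.0, 3.5, 4.0, 12.0, 40.0]

# Parameter values are arbitrary assumptions for the illustration.
ig_loglik = sum(inv_gaussian_logpdf(v, mu=4.0, lam=10.0) for v in y)
normal_loglik = sum(normal_logpdf(v, mean=4.0, sd=15.0) for v in y)

print("inverse Gaussian log-likelihood:", ig_loglik)
print("Gaussian log-likelihood:", normal_loglik)
```

Because the negative observations contribute a log density of minus infinity regardless of `mu` and `lam`, no choice of prior on those parameters can rescue the inverse Gaussian here.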

aldc · June 10, 2020, 2:26pm

@maxbiostat some doubts (and thank you for your time, things are getting clearer for me now):

1. As I understood it, we provide a prior over the parameters, which is usually specified as a distribution? That is, the parameter values are drawn from that distribution… so what we specify as priors are, in the end, distributions?
2. I don’t understand how my likelihood is inverse Gaussian rather than, say, Gaussian, while the prior stays inverse Gaussian?
3. So, for modelling, do we only need to look at the likelihood… and what happens to the prior?

maxbiostat · June 10, 2020, 4:01pm

These are conceptual questions that would be better addressed by a textbook, like A First Course in Bayesian Statistical Methods or Statistical Rethinking.

Nevertheless, I will try to answer them here, for completeness. I invite @martinmodrak @betanalpha @andrewgelman and others to complement/correct my statements.

1. The prior is a probability distribution over the parameters, \theta, which are by definition not observed.
2. The likelihood is the distribution of the data x conditional on the parameters. Here x are observed, and are shown in your histogram. So you need to choose a distribution that is compatible with the observed data. If you have negative values, you cannot use an inverse Gaussian because that distribution does not admit negative values.
3. No, we need to look at both prior and likelihood if we want to do a Bayesian analysis, but first we need to have a clear idea of what are data and what are parameters so we can specify the model correctly.
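
To make the division of labour concrete, here is a small Python sketch of a grid approximation to a posterior. Everything in it is an assumption made for illustration: the data are hypothetical, the observation standard deviation `sigma` is treated as known, and the Normal(0, 20) prior on the mean `mu` is an arbitrary choice. The point is only that the prior is evaluated on the parameter, the likelihood is evaluated on the data given the parameter, and the posterior combines both.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd)."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

# Hypothetical data (not the poster's measurements).
y = [-23.0, -5.0, 3.5, 4.0, 12.0, 40.0]
sigma = 15.0  # observation sd, assumed known to keep the sketch one-dimensional

# Grid of candidate values for the unknown parameter mu, from -50 to 50.
grid = [i * 0.5 for i in range(-100, 101)]

# Prior: a distribution over the parameter mu (assumed Normal(0, 20)).
log_prior = [normal_logpdf(mu, 0.0, 20.0) for mu in grid]

# Likelihood: density of the observed data given each candidate mu.
log_lik = [sum(normal_logpdf(v, mu, sigma) for v in y) for mu in grid]

# Posterior is proportional to prior times likelihood; normalise over the grid.
log_post = [lp + ll for lp, ll in zip(log_prior, log_lik)]
shift = max(log_post)  # subtract the max before exponentiating, for stability
weights = [math.exp(v - shift) for v in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(mu * p for mu, p in zip(grid, posterior))
print("approximate posterior mean of mu:", round(post_mean, 2))
```

In practice a tool such as Stan or brms samples from the posterior rather than evaluating a grid, but the same two ingredients go in.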

What you need to do right now is to take a step back, state your problem clearly and then we can proceed with model building. What is the scientific question you want to answer? What exactly did you measure? How many measurements were made? How many per unit/patient?

andrewgelman · June 10, 2020, 8:11pm

Hi–strictly speaking, p(y|theta) is the distribution of the data given the parameters. The likelihood is a function of theta that is proportional to the data distribution. Different data distributions can have the same likelihood, as we discuss in chapter 6 of BDA.
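
A standard textbook illustration of this point (not necessarily the example discussed in BDA) compares two sampling schemes for coin-flip data: fixing the number of trials in advance (binomial) versus sampling until a fixed number of successes (negative binomial). The Python sketch below uses hypothetical counts, 3 successes in 12 trials, and shows that the two data distributions differ only by a constant factor that does not depend on theta, so they give the same likelihood function.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(k successes in n trials), with n fixed in advance."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def negative_binomial_pmf(n, k, theta):
    """P(the k-th success occurs exactly on trial n), with k fixed in advance."""
    return comb(n - 1, k - 1) * theta**k * (1 - theta) ** (n - k)

# Hypothetical outcome: 3 successes observed in 12 trials.
successes, trials = 3, 12

thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
ratios = [
    binomial_pmf(successes, trials, t) / negative_binomial_pmf(trials, successes, t)
    for t in thetas
]

# The ratio is the same for every theta, so the likelihoods are proportional.
print(ratios)
```

Since the posterior depends on the likelihood only up to a constant in theta, both sampling schemes lead to the same posterior under the same prior.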

aldc · June 11, 2020, 6:42am

@maxbiostat @andrewgelman Thank you…That was really well explained.

betanalpha · June 13, 2020, 8:28pm

For more on the observational model see Section 1 of https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html and for more on the prior model and its relationship to the observational model see https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html.

When modeling an observational process it helps to work out what the observational space is before trying to build a model. See for example, https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#step_two:_define_observational_space. Given that your measurement here can return negative values an inverse Gaussian density is not a compatible assumption, as @maxbiostat notes.

Moreover if your measurement really returns a percent change then it should also be bounded between [-1, 1] (or [-100, 100] depending on your units for percentages) in which case a Gaussian density can’t be precisely correct. That said, it might be a fine approximation. For more see, for example, https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html#13_the_observational_model and https://betanalpha.github.io/assets/case_studies/probability_densities.html#11_emerging_trends.
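
One rough way to judge whether the Gaussian approximation is acceptable is to check how much probability a fitted Gaussian places outside the allowed interval. The Python sketch below does this with the standard library. The mean of 4 comes from the thread; the two standard deviations (10 and 60) are hypothetical values picked to contrast a harmless case with a problematic one.

```python
import math

def normal_mass_outside(mean, sd, lower, upper):
    """Probability that a Normal(mean, sd) draw falls outside [lower, upper]."""
    # erfc keeps precision for very small tail probabilities.
    below = 0.5 * math.erfc((mean - lower) / (sd * math.sqrt(2)))
    above = 0.5 * math.erfc((upper - mean) / (sd * math.sqrt(2)))
    return below + above

# Percent change assumed to live on [-100, 100]; sd values are assumptions.
narrow = normal_mass_outside(mean=4.0, sd=10.0, lower=-100.0, upper=100.0)
wide = normal_mass_outside(mean=4.0, sd=60.0, lower=-100.0, upper=100.0)

print("mass outside bounds, sd = 10:", narrow)
print("mass outside bounds, sd = 60:", wide)
```

When the leaked mass is negligible, as with the smaller standard deviation, the bounds barely matter and the Gaussian may be a fine approximation; when it is not, a model that respects the bounds is worth considering.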
