# How to choose a prior: family for a response with negative values?

Hi,

I’m modeling the percentage change in blood oxygen levels from a particular experiment. My prior before seeing the data was an inverse Gaussian distribution. But my data (the response variable) has some negative values, and the `inverse.gaussian` family doesn’t accept negative values. How should I go about this?
My values range from -23 to 40, with a mean of 4 and a median of 3.5. (These are also repeated-measures data.)

Just to clarify things: the distribution of the data conditional on the parameters is not the prior. What you are looking for is a sampling distribution, or likelihood. Now, on to the question proper: have you tried a Gaussian (normal)? Can you show a histogram of the data?

Yes, I tried it with a Gaussian (normal) and it works.
Maybe the way I understand the family function is incorrect… My prior for this data (before seeing the data) was inverse Gaussian.
@maxbiostat

Conceptually, there’s no such thing as a prior for data*. In a Bayesian analysis, we have the prior on the parameters, $\pi(\theta)$, and the likelihood $f(x \mid \theta)$, which is the conditional distribution of the data $x$ given the parameters $\theta$. The posterior distribution of $\theta$ given $x$ and your choice of likelihood/sampling distribution is

$$p(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{p(x)}.$$
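To make the formula concrete, here is a minimal grid-approximation sketch. It assumes a Gaussian likelihood with a *known* standard deviation of 5, a Normal(0, 10) prior on the mean $\theta$, and a few made-up observations; none of these choices come from the thread. The denominator $p(x)$ is approximated by summing likelihood × prior over the grid.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def grid_posterior(data, grid, prior_mu, prior_sigma, obs_sigma):
    """Posterior probabilities of theta on a grid: f(x | theta) * pi(theta) / p(x)."""
    log_unnorm = []
    for theta in grid:
        log_prior = normal_logpdf(theta, prior_mu, prior_sigma)          # log pi(theta)
        log_lik = sum(normal_logpdf(x, theta, obs_sigma) for x in data)  # log f(x | theta)
        log_unnorm.append(log_prior + log_lik)
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(log_unnorm)
    weights = [math.exp(v - m) for v in log_unnorm]
    evidence = sum(weights)  # grid stand-in for p(x), up to the same constant
    return [w / evidence for w in weights]

if __name__ == "__main__":
    data = [3.0, -2.0, 5.5, 10.0, 1.5]          # hypothetical observations
    grid = [i * 0.1 for i in range(-200, 201)]  # theta from -20 to 20
    post = grid_posterior(data, grid, prior_mu=0.0, prior_sigma=10.0, obs_sigma=5.0)
    post_mean = sum(t * p for t, p in zip(grid, post))
    print(round(post_mean, 2))
```

Because this particular prior and likelihood are conjugate, the grid answer can be checked by hand: the posterior mean is $(3.6 \cdot 5/25)/(1/100 + 5/25) = 0.72/0.21 \approx 3.43$.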

So, in your case, you could say that your likelihood was an inverse Gaussian with unknown parameters $\mu$ and $\lambda$. But, as you note, this likelihood is a poor choice because it gives probability zero to data that were actually observed.

*Sometimes we talk about a prior predictive distribution for x, but let’s leave that aside for the sake of clarity.
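To see why the inverse-Gaussian likelihood breaks here, this sketch evaluates the log-likelihood of a few hypothetical observations (one of them negative, as in the reported range) under an inverse Gaussian and under a Gaussian. The parameter values ($\mu = 4$, $\lambda = 10$, $\sigma = 15$) are arbitrary illustration choices, not estimates.

```python
import math

def inv_gaussian_logpdf(x, mu, lam):
    """Log density of an inverse Gaussian(mu, lam); the support is x > 0 only."""
    if x <= 0:
        return -math.inf  # density is exactly zero outside the support
    return 0.5 * math.log(lam / (2 * math.pi * x ** 3)) - lam * (x - mu) ** 2 / (2 * mu ** 2 * x)

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma); the support is the whole real line."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(data, logpdf, *params):
    """log f(x | theta) for independent observations: a sum of log densities."""
    return sum(logpdf(x, *params) for x in data)

if __name__ == "__main__":
    data = [3.5, -23.0, 40.0, 4.0, 1.0]  # hypothetical values spanning the reported range
    print(log_likelihood(data, inv_gaussian_logpdf, 4.0, 10.0))  # -inf
    print(log_likelihood(data, normal_logpdf, 4.0, 15.0))        # a finite number
```

A single negative observation drives the inverse-Gaussian log-likelihood to $-\infty$ for every choice of $\mu$ and $\lambda$, so no prior on those parameters can rescue the model.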


@maxbiostat Some doubts (and thank you for giving your time; I’m now getting more clarity):

1. How I understood it was that we provide priors over parameters, which are usually specified as distributions? Like, those parameter values are drawn from ‘those’ distributions… so what we specify as priors are, in the end, distributions?
2. I don’t understand how my likelihood would be inverse Gaussian instead of, say, a Gaussian. Could I keep the prior as inverse Gaussian?
3. So, for modeling, do we only need to look at the likelihood… and what happens to the prior?

These are conceptual questions that would be better addressed by a textbook, such as A First Course in Bayesian Statistical Methods or Statistical Rethinking.

Nevertheless, I will try to answer them here, for completeness. I invite @martinmodrak @betanalpha @andrewgelman and others to complement/correct my statements.

1. The prior is a probability distribution over the parameters, $\theta$, which are by definition not observed.
2. The likelihood is the distribution of the data $x$ conditional on the parameters. Here $x$ is observed, and is shown in your histogram. So you need to choose a distribution that is compatible with the observed data. If you have negative values, you cannot use an inverse Gaussian, because that distribution does not admit negative values.
3. No, we need to look at both the prior and the likelihood if we want to do a Bayesian analysis, but first we need a clear idea of what the data are and what the parameters are, so that we can specify the model correctly.
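One way to keep data and parameters apart is to simulate from the model in the order it is written: draw parameters from the prior, then draw data from the likelihood given those parameters. The sketch below does this for a Gaussian likelihood; the Normal(0, 10) prior on the location and the half-normal(10) prior on the scale are hypothetical choices for illustration only.

```python
import random

def simulate_prior_predictive(n_draws, n_obs, seed=1):
    """Draw (mu, sigma) from the prior, then n_obs data points given those parameters."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        mu = rng.gauss(0.0, 10.0)            # prior on the location parameter
        sigma = abs(rng.gauss(0.0, 10.0))    # half-normal prior on the scale parameter
        x = [rng.gauss(mu, sigma) for _ in range(n_obs)]  # data given the parameters
        sims.append((mu, sigma, x))
    return sims

if __name__ == "__main__":
    for mu, sigma, x in simulate_prior_predictive(n_draws=3, n_obs=5):
        print(f"mu={mu:.2f} sigma={sigma:.2f} x={[round(v, 1) for v in x]}")
```

The priors only ever produce values of `mu` and `sigma`; the data `x` come from the likelihood. Looking at simulated `x` like this (a prior predictive check) is a quick way to see whether the combined model can produce values like the ones you measured, negative ones included.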

What you need to do right now is to take a step back, state your problem clearly and then we can proceed with model building. What is the scientific question you want to answer? What exactly did you measure? How many measurements were made? How many per unit/patient?


Hi, strictly speaking, $p(y \mid \theta)$ is the distribution of the data given the parameters. The likelihood is a function of $\theta$ that is proportional to the data distribution. Different data distributions can have the same likelihood, as we discuss in chapter 6 of BDA.
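A standard illustration of that last point (not necessarily the example BDA uses) is 3 successes observed in 12 Bernoulli trials. Under a binomial design (12 trials fixed in advance) and a negative binomial design (sample until the 3rd success, which happens on trial 12), the data distributions differ, but as functions of $\theta$ they differ only by a constant factor, so the likelihoods are the same.

```python
from math import comb

def binomial_lik(theta, n=12, k=3):
    """P(k successes | n fixed trials, theta) viewed as a function of theta."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

def neg_binomial_lik(theta, n=12, k=3):
    """P(k-th success arrives on trial n | theta) viewed as a function of theta."""
    return comb(n - 1, k - 1) * theta ** k * (1 - theta) ** (n - k)

if __name__ == "__main__":
    for theta in (0.1, 0.25, 0.5, 0.9):
        # The ratio is comb(12, 3) / comb(11, 2) = 220 / 55 = 4 for every theta.
        print(theta, binomial_lik(theta) / neg_binomial_lik(theta))
```

Since the ratio does not depend on $\theta$, both designs lead to the same posterior for any given prior.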


@maxbiostat @andrewgelman Thank you… that was really well explained.

For more on the observational model see Section 1 of https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html and for more on the prior model and its relationship to the observational model see https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html.

When modeling an observational process, it helps to work out what the observational space is before trying to build a model. See, for example, https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#step_two:_define_observational_space. Given that your measurement here can return negative values, an inverse Gaussian density is not a compatible assumption, as @maxbiostat notes.
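A crude first check of the observational space is to compare the observed range with the support of each candidate family. The sketch below hard-codes the supports of a few common families (this small table is an assumption of the sketch, not a complete list) and keeps only those whose support contains every observation.

```python
import math

# Open-interval supports (lower, upper) of a few common observation families.
SUPPORTS = {
    "normal": (-math.inf, math.inf),
    "student_t": (-math.inf, math.inf),
    "inverse_gaussian": (0.0, math.inf),
    "gamma": (0.0, math.inf),
    "lognormal": (0.0, math.inf),
}

def compatible_families(data, supports=SUPPORTS):
    """Names of families whose support strictly contains every observed value."""
    lo, hi = min(data), max(data)
    return sorted(name for name, (a, b) in supports.items() if a < lo and hi < b)

if __name__ == "__main__":
    # Minimum and maximum reported in the thread.
    print(compatible_families([-23.0, 40.0]))  # ['normal', 'student_t']
```

Passing this check is necessary but not sufficient: a family can cover the observed range and still be a poor description of the data.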

Moreover, if your measurement really returns a percent change, then it should also be bounded between [-1, 1] (or [-100, 100], depending on your units for percentages), in which case a Gaussian density can’t be precisely correct. That said, it might be a fine approximation. For more, see, for example, https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html#13_the_observational_model and https://betanalpha.github.io/assets/case_studies/probability_densities.html#11_emerging_trends.
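Whether the Gaussian is "a fine approximation" can be quantified by how much probability it puts outside the bounds. This sketch uses the reported mean of 4 on the [-100, 100] scale; the standard deviation of 10 is a hypothetical value, since the thread does not report one.

```python
import math

def normal_mass_outside(lo, hi, mu, sigma):
    """P(X < lo) + P(X > hi) for X ~ Normal(mu, sigma), via the complementary error function."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    below = 0.5 * math.erfc(-z_lo / math.sqrt(2))  # Phi(z_lo)
    above = 0.5 * math.erfc(z_hi / math.sqrt(2))   # 1 - Phi(z_hi)
    return below + above

if __name__ == "__main__":
    # Mean 4 from the thread; sigma = 10 is an assumption for illustration.
    print(normal_mass_outside(-100.0, 100.0, mu=4.0, sigma=10.0))
```

With those numbers, the bounds sit roughly ten standard deviations from the mean, so the mass the Gaussian wastes outside them is negligible. The approximation would only start to matter if the spread were large relative to the distance to the bounds.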
