Priors: imposed from outside calculation or calculated from data within the model

Hello Community,

I have been thinking about this for a while.

I have a multiple linear model, and besides the coefficients (simplex[n]) I can predict the x[n] as well:

y = s[1] * x[1] + ... + s[n] * x[n]

I also have some reference data on the pure components from a training set.

Now the question is, is it better:

  1. To calculate the shape (normal) of each x externally and then impose the mean and sd into the model, or

  2. Import the data into the model and do

    data ~ normal(mu, sigma);  // training data informs mu and sigma
    x[n] ~ normal(mu, sigma);  // the same mu and sigma act as the prior on x[n]

In this case I guess I also have to pay attention to how much data I import: if the training set is much bigger than the test set, those priors will be forced and not moved anyway, and the other way around should be true as well.
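For concreteness, option 2 would be something like this minimal sketch (all names and sizes here are placeholders; I am assuming one normal per component and a single mixture observation model):

data {
  int<lower=1> K;              // number of components
  int<lower=1> N_ref;          // reference (training) measurements
  matrix[N_ref, K] x_ref;      // pure-component reference data
  int<lower=1> M;              // mixture measurements
  vector[M] y;                 // the mixes
}
parameters {
  vector[K] mu;                // per-component means
  vector<lower=0>[K] sigma;    // per-component sds
  vector[K] x;                 // latent "true" component values
  simplex[K] s;                // mixing coefficients
  real<lower=0> sigma_y;       // noise on the mixes
}
model {
  for (k in 1:K)
    col(x_ref, k) ~ normal(mu[k], sigma[k]);  // reference data informs mu, sigma
  x ~ normal(mu, sigma);                      // which then act as the prior on x
  y ~ normal(dot_product(s, x), sigma_y);     // y = s[1]*x[1] + ... + s[K]*x[K]
  // (plus hyperpriors on mu, sigma, s, sigma_y as appropriate)
}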

Thanks a lot.

I think the tl;dr answer is: do #2. If you can, let one Stan model infer everything.

In this case I guess I also have to pay attention to how much data I import: if the training set is much bigger than the test set, those priors will be forced and not moved anyway, and the other way around should be true as well.

What’s wrong with the data informing the inferences?

If you’re talking about training/test splits, I think the way you’d do this in Stan is to write a model that fits using your training set and then add a generated quantities block that evaluates your test set at the same time. Relevant post from the Gelman blog today: Breaking the dataset into little pieces and putting it back together again | Statistical Modeling, Causal Inference, and Social Science
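For example, for a simple regression it might look like this (the train/test names here are hypothetical):

data {
  int<lower=1> N_train;
  vector[N_train] x_train;
  vector[N_train] y_train;
  int<lower=1> N_test;
  vector[N_test] x_test;
  vector[N_test] y_test;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  y_train ~ normal(a + b * x_train, sigma);   // fit on the training set only
}
generated quantities {
  vector[N_test] log_lik_test;                // per-point test log likelihood
  for (n in 1:N_test)
    log_lik_test[n] = normal_lpdf(y_test[n] | a + b * x_test[n], sigma);
}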

Hope that helps

Thanks,

Actually, training/test is probably not what I mean. I just have some information on x_hat[n], and instead of imposing it as data, I let it act as a prior on the x[n], for inferring the real values from the mixes I’m doing the regression on.

My doubt is this: if I have 10000 points in data and only 10 points in y, with

y = s[1] * x[1] + ... + s[n] * x[n]

the “evidence” from y will never be enough to correct the data I provided, in case it is biased.

Is this fair to say?

Yeah, if you have 10000 data points informing something in a simple model you’re probably gonna get a really tight posterior on it.

If you take that posterior as a prior in something else, the next posterior probably shouldn’t move around much if the models are consistent with each other. I think that’s desirable behavior.

If your 10 data points did move the inference from the 10000 data points around, I think you’d have to be suspicious you have a misspecified model somewhere.
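For rough intuition: in a conjugate normal model with known sigma, the posterior precision on the mean grows like n / sigma^2, so going from 10000 points to 10010 only tightens it by about 0.1%. The small data set just can’t compete unless the model is set up to let it.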

Yes, but that is what I want, in case the given x_hat doesn’t agree with the mixes. See, the x_hat data is taken from gene expression in model organisms, and it can easily be that among those 500 genes some might be pretty different between model organisms and real human tissue.

That’s why, instead of imposing them as x[n], I want to just use them to inform x[n].

If this makes sense, then the question is: what is the right ratio of “belief” vs. “inference”? Is there some logic to the decision making, maybe subsampling x_hat to get to the same dimensionality?

Oh okay, so you have a model system where you’ll have a ton of data and can make inferences about your parameters, but you only kinda believe those parameters because the real system you’re working on is different.

I’m not really sure what I’d do here. Maybe someone else can comment on this. It sure sounds like you’ll have to take a leap of faith at some point (which probably isn’t ideal if that leap of faith is in high dimensional space :D).

I doubt there’s a modeling technique to handle this. I’m not sure there’s any cure for model misspecification other than getting a good model. I think this Gelman post is relevant: http://andrewgelman.com/2016/02/20/dont-get-me-started-on-cut/ .

That’s right

Sorry, I don’t want to keep you here forever, just to make sure: when you talk about model misspecification, what do you mean? I believe my model (prior shapes, conditional structure, etc.).

What I don’t fully believe are the particular values of the data I provide. I thought this was pretty common in Bayes: the fact that you can input information, but that information can be “negated” IF there is enough information within the data you are inferring from.

I’m a bit confused to find I am trying to do something “not standard” when I thought this was a big point of probabilistic modeling. We have beliefs, and those beliefs can be updated if there is new evidence (in my case the mixes, which hide such evidence convolved with other evidence).

The point was how to provide that prior information in a way such that its information content is fairly balanced against the mixed data I use for inference.

Sorry, I don’t want to keep you here forever

Hahaha, I’m just bored waiting for a model to run.

What I don’t fully believe are the particular values of the data I provide. I thought this was pretty common in Bayes: the fact that you can input information, but that information can be “negated” IF there is enough information within the data you are inferring from.

This seems fine.

The point was how to provide that prior information in a way such that its information content is fairly balanced against the mixed data I use for inference.

This is what I don’t agree with. You don’t get to choose how much information moves around; Bayes’ rule does that under the modeling assumptions you make :D. If you limit that flow of information somehow, presumably you’re doing it because you know about some outside disagreement between the two models. And I’m using the term model misspecification for this (I think it’s right).

Model misspecification is a thing @betanalpha brings up a lot around here.

I think it boils down to, if you generate data with one model and then try to fit it with another, then your posteriors (even if your MCMC chains are super healthy and returning lots of nice looking samples) can be really wrong and misleading.

It came up in this thread: Dealing with uncertain parameters that aren't purely fitting parameters. For the tl;dr, search for the Bob and Betancourt posts first.


If you think that the training data is sort of related to the current measurements then model that! In particular, this is exactly what hierarchical models are meant to do. So instead of writing

data ~ normal(mu, sigma);  // training data informs mu and sigma
x[n] ~ normal(mu, sigma);  // the same mu and sigma act as the prior on x[n]

do something like

data ~ normal(alpha[1], sigma);  // mean for the training data set
x[n] ~ normal(alpha[2], sigma);  // mean for the current system
alpha ~ normal(mu, tau);         // partial pooling between the two means
mu ~ ...
tau ~ ...

You can make the hyperpriors over mu and tau strongly informative to control the amount of pooling between the two data sets.

Of course you could model the partial similarities between the two data sets in other ways, but a straight up hierarchical model will probably be your best bet.
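For a single component, a fuller sketch might look like this (the hyperprior choices are placeholders, and you would still add the mixture likelihood for y, which is what actually informs x):

data {
  int<lower=1> N_ref;
  vector[N_ref] x_hat;       // reference data for one component
}
parameters {
  vector[2] alpha;           // alpha[1]: reference mean, alpha[2]: real-system mean
  real<lower=0> sigma;
  real mu;                   // shared location of the two alphas
  real<lower=0> tau;         // controls how strongly the two alphas are pooled
  real x;                    // latent value in the real system
}
model {
  x_hat ~ normal(alpha[1], sigma);   // reference data set
  x ~ normal(alpha[2], sigma);       // prior on the latent value
  alpha ~ normal(mu, tau);           // partial pooling between the two means
  mu ~ normal(0, 5);                 // placeholder hyperpriors; make them
  tau ~ normal(0, 1);                //   informative to control the pooling
}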


That’s interesting, @betanalpha.

I hadn’t thought about this. A few questions/notes:

  1. The statements would be (?)

    x_hat[n] ~ normal(alpha[1,n], sigma[n]);
    x[n] ~ normal(alpha[2,n], sigma[n]);
    alpha ~ normal(mu, tau);
    mu ~ …
    tau ~ …

  2. alpha dimension would be [2,n] ?

  3. sigma would be [n]

  4. mu, tau dimensions would be [n] ?

  5. mu should ideally be a value between the two alphas (e.g., 3 and 4; 3 and 5; etc.)

So I don’t understand how I can control the pooling with single statements mu ~ … and tau ~ …

To control all x/x_hat relationships with a single statement, should I do

(alpha[1,n] - alpha[2,n]) / (normalization component) ~ normal(0, 1); (does this make sense?)
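To make the dimensions concrete, I imagine declarations like this (just my guess at what you meant, with x_hat as vector[N] data):

parameters {
  matrix[2, N] alpha;         // row 1 for x_hat, row 2 for x
  vector<lower=0>[N] sigma;   // one sd per component
  vector[N] mu;               // one hypermean per component
  vector<lower=0>[N] tau;     // one pooling scale per component
  vector[N] x;                // latent values inferred from the mixes
}
model {
  x_hat ~ normal(to_vector(alpha[1]), sigma);
  x ~ normal(to_vector(alpha[2]), sigma);
  to_vector(alpha[1]) ~ normal(mu, tau);
  to_vector(alpha[2]) ~ normal(mu, tau);
}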

Thanks