I was reading "Yes, you can include prior information on quantities of interest, not just on parameters in your model" (Statistical Modeling, Causal Inference, and Social Science) and it got me thinking: why not just put a prior on the entire model formula for the mean? For example, say you have count data and a Poisson model, and there is no physically plausible way for the mean count to exceed 20 (i.e. lambda > 20 is simply not within the realm of the natural world). Say your model looks something like this:
for (t in 1:T) {
  target += poisson_log_lpmf(y[t] | mu[t] + beta * x[t]);   // Poisson likelihood with log-scale mean
}
where mu is a vector of parameters. Normally you would put a prior on mu and a prior on beta. That’s fine. But given the constraint I described above, why not put a prior on lambda itself, say something like this:
for (t in 1:T) {
  // normal(6, 5) prior on lambda = exp(mu[t] + beta * x[t]), truncated below at zero
  target += normal_lpdf(exp(mu[t] + beta * x[t]) | 6, 5) - normal_lccdf(0 | 6, 5);
}
where the density is truncated at zero on the lower end and puts only about 1% of its mass above 17.5 or so.
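(A quick note on that, just my own arithmetic: the subtracted normal_lccdf(0 | 6, 5) term is the normalization for truncating the Normal(6, 5) below at zero, and with that truncation

$$\Pr(\lambda > 17.5 \mid \lambda > 0) = \frac{1 - \Phi\big((17.5 - 6)/5\big)}{1 - \Phi\big((0 - 6)/5\big)} \approx \frac{0.011}{0.885} \approx 0.012,$$

so roughly 1% of the prior mass sits above 17.5.)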
I don’t really remember seeing anyone do this sort of thing. Why not? Why not do it all the time? Why not always put a prior on the formula for the mean so that it stays within the realm of possibility? It sure is easier to think about than all the individual parameters that make up the model formula.
My motivation for the question is that I am fitting a structural time series model (a local level model) to some count data. I want to do something similar to what is discussed in CausalImpact (google.github.io) and in Combining spline and state space model - Modeling - The Stan Forums (mc-stan.org), except that, as a key difference, I actually have predictors of interest (components of the intervention) in my model. To estimate the effect of the intervention, I make predictions from the full model and predictions from the model without the intervention predictors and take the difference. Something like this in the generated quantities block:
vector[T] preds;
vector[T] preds_cf;
vector[T] intervention_effect;
real sum_effect;
for (t in 1:T) {
  preds[t] = poisson_log_rng(mu[t] + beta * x[t]);   // predictions from the full model
  preds_cf[t] = poisson_log_rng(mu[t]);              // counterfactual predictions (i.e. the model without beta * x)
  intervention_effect[t] = preds_cf[t] - preds[t];   // effect of the intervention is the difference
}
sum_effect = sum(intervention_effect);               // full effect of the intervention, summed over all time points
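(As an aside, one could also compare the expected counts rather than the simulated draws, which would keep the Poisson sampling noise out of the effect estimate; just a sketch with a hypothetical effect_mean quantity, not what I actually ran:)

vector[T] effect_mean;
for (t in 1:T) {
  effect_mean[t] = exp(mu[t]) - exp(mu[t] + beta * x[t]);   // counterfactual mean minus full-model mean
}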
Of course, if the intervention period is very long, the counterfactual predictions blow up, with values well outside the range of what is plausible in the real world. I placed a prior on the counterfactual mean, and this really seemed to do the trick. Like this in the model block:
for (t in 1:(T - T0)) {
  // T0 is the last timepoint before the intervention, so this runs over the post-intervention points T0 + 1, ..., T.
  // Notice I leave the effect of x out of the counterfactual prediction prior.
  target += normal_lpdf(exp(mu[t + T0]) | 6, 5) - normal_lccdf(0 | 6, 5);
}
But then I got to thinking… am I really influencing the results here by placing the prior only on the counterfactual lambda? Maybe I need to place a prior on the full-model lambda as well.
for (t in 1:T) {
  // prior on lambda for the full model
  target += normal_lpdf(exp(mu[t] + beta * x[t]) | 6, 5) - normal_lccdf(0 | 6, 5);
}
for (t in 1:(T - T0)) {
  // T0 is the last timepoint before the intervention; again I leave the effect of x out of the counterfactual prediction prior
  target += normal_lpdf(exp(mu[t + T0]) | 6, 5) - normal_lccdf(0 | 6, 5);
}
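(Putting the pieces together, the model block I am asking about would look roughly like this, with the existing priors on mu and beta omitted for brevity:)

model {
  for (t in 1:T) {
    target += poisson_log_lpmf(y[t] | mu[t] + beta * x[t]);                           // likelihood
    target += normal_lpdf(exp(mu[t] + beta * x[t]) | 6, 5) - normal_lccdf(0 | 6, 5);  // prior on full-model lambda
  }
  for (t in 1:(T - T0)) {
    target += normal_lpdf(exp(mu[t + T0]) | 6, 5) - normal_lccdf(0 | 6, 5);           // prior on counterfactual lambda, post-intervention only
  }
  // ... plus the existing priors on mu, beta, and any other parameters
}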
Has anyone done this before? Thoughts?
@andrewgelman @avehtari @betanalpha