# Using stan for generalized belief updating

In this paper, Walker et al. give a general framework for updating priors based on loss functions rather than likelihoods (the negative log-likelihood is the special case that leads to traditional Bayesian inference). For a loss function $l(\theta, x)$ and prior $\pi(\theta)$, the update has the form

$$\Pi(\theta | x) = \frac{\exp\{-l(\theta, x)\}\,\pi(\theta)}{\int \exp\{-l(\theta, x)\}\,\pi(\theta)\,d\theta}$$

which is “the form of a Bayesian update using exponentiated negative loss in place of the likelihood function.” The paper works out two specific examples of this. Is it possible to use Stan to obtain these generalized posterior belief distributions when $\exp\{-l(\theta, x)\}$ is not a likelihood? Looking forward to hearing the community’s thoughts.
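As a quick check of the special case mentioned at the start, substituting the negative log-likelihood into the update recovers ordinary Bayes' rule. Here $p(x | \theta)$ denotes the likelihood; the symbol is introduced only for this illustration and is not notation from the paper:

$$
l(\theta, x) = -\log p(x | \theta)
\;\Longrightarrow\;
\exp\{-l(\theta, x)\} = p(x | \theta)
\;\Longrightarrow\;
\Pi(\theta | x) = \frac{p(x | \theta)\,\pi(\theta)}{\int p(x | \theta)\,\pi(\theta)\,d\theta}.
$$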

Stan can sample from any distribution whose log density is known up to an additive constant.
It doesn’t matter whether $\exp\{-l(\theta, x)\}$ is a likelihood or not.
You can increment the target log density directly. It looks something like this:

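```stan
functions {
  // Placeholder loss l(theta, x): absolute error, the paper's first example.
  real loss(real theta, real x) {
    return abs(x - theta);
  }
}
data {
  real x;
}
parameters {
  real theta;
}
model {
  theta ~ normal(0, 10);      // placeholder for the prior pi(theta); use your own
  target += -loss(theta, x);  // exponentiated negative loss in place of the likelihood
}
```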
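On the log scale, the model block accumulates the log of the numerator of the update, $\log \pi(\theta) - l(\theta, x)$, up to constants. The denominator contributes only a term that does not depend on $\theta$ (written with a dummy variable $\theta'$ to make that explicit), which is why the program never has to compute it:

$$
\log \Pi(\theta | x) = \log \pi(\theta) - l(\theta, x) - \log \int \exp\{-l(\theta', x)\}\,\pi(\theta')\,d\theta'.
$$

The `~` statement also drops constant terms of the prior's log density, which is harmless for the same reason.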


@nhuurre so it doesn’t matter that $l(\theta, x)$ is a loss function and $\exp\{-l(\theta, x)\}$ doesn’t integrate to 1?

Yeah, it’s not a problem at all. HMC only uses the gradient of `target`, so any normalization constants are irrelevant.
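To spell out why: rescaling the density by a positive constant $c$ shifts the log density by $\log c$, and that shift has zero gradient. HMC also evaluates the log density itself when deciding which trajectory point to keep, but only through differences between points, where $\log c$ cancels as well:

$$
\nabla_\theta \log\big(c\,p(\theta)\big) = \nabla_\theta \log c + \nabla_\theta \log p(\theta) = \nabla_\theta \log p(\theta),
\qquad
\log\big(c\,p(\theta_1)\big) - \log\big(c\,p(\theta_0)\big) = \log p(\theta_1) - \log p(\theta_0).
$$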


A Stan program is required to define a log density $\log p(\theta)$ up to an additive constant that doesn’t depend on $\theta$. It doesn’t matter how $\log p(\theta)$ is factored, but $p(\theta)$ needs to have a finite integral.
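For the generalized update in the question, this requirement means the denominator of $\Pi(\theta | x)$ must be finite:

$$
\int \exp\{-l(\theta, x)\}\,\pi(\theta)\,d\theta < \infty.
$$

With a proper prior this holds whenever the loss is bounded below, since $l \ge m$ gives $\exp\{-l\} \le e^{-m}$. With an improper flat prior it can fail. For instance, a bounded loss such as the hypothetical $l(\theta, x) = \min(|x - \theta|, 1)$ gives

$$
\int_{-\infty}^{\infty} \exp\{-l(\theta, x)\}\,d\theta \;\ge\; \int_{-\infty}^{\infty} e^{-1}\,d\theta = \infty,
$$

so there is no posterior distribution to sample from.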

If you take the first example in the linked paper, $l(\theta, x) = |x - \theta|$, and exponentiate its negation, $\exp(-|x - \theta|)$, you get the kernel of the Laplace distribution with unit scale. I’m unclear on why the authors are using the term “loss function” rather than “likelihood” for this. Is it just philosophy?
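Checking the normalizing constant makes the connection explicit. Substituting $u = |x - \theta|$ on each side of $x$:

$$
\int_{-\infty}^{\infty} \exp(-|x - \theta|)\,d\theta = 2\int_{0}^{\infty} e^{-u}\,du = 2.
$$

By the symmetry of $|x - \theta|$, the integral over $x$ is also 2, so $\exp(-|x - \theta|) = 2 \cdot \text{Laplace}(x | \theta, 1)$, where $\text{Laplace}(x | \mu, b) = \frac{1}{2b}\exp(-|x - \mu|/b)$. Stan calls this distribution `double_exponential`, so `target += -abs(x - theta);` and `x ~ double_exponential(theta, 1);` differ at most by the constant $\log 2$.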

@Bob_Carpenter I think this just happens to be a consequence of that particular loss function. The goal of the paper is to show how to update prior beliefs by connecting the data to the parameters through loss functions rather than likelihoods (the negative log-likelihood being one specific loss function). I would imagine there are several loss functions that correspond to specific negative log-likelihoods.
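One way to make this precise (a general observation, not something taken from the paper): a loss is a negative log-likelihood up to an additive constant exactly when $\exp\{-l(\theta, x)\}$ integrates over $x$ to a finite, positive constant $c$ that does not depend on $\theta$:

$$
c = \int \exp\{-l(\theta, x)\}\,dx
\;\Longrightarrow\;
p(x | \theta) = \frac{\exp\{-l(\theta, x)\}}{c},
\qquad
l(\theta, x) = -\log p(x | \theta) - \log c.
$$

Absolute error gives $c = 2$ (the Laplace case above). Half squared error, $l(\theta, x) = \tfrac{1}{2}(x - \theta)^2$, gives $c = \sqrt{2\pi}$, which is the normal likelihood with unit standard deviation.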