Write model for log-likelihood in Stan

Hi,

I have posted similar questions before, but I am still a little confused about whether there is a general guideline available for writing the log-likelihood for a complex model in Stan. I am only familiar with the shorthand distribution-based notation like y ~ normal(0, 1); but not so familiar with notation like target += -0.5 * y * y;

Also, I know that if we take the log, the multiplicative constant in the original likelihood becomes a separate additive term that has no effect on sampling, so we can drop it. But I still feel like there is a gap between my understanding and actually writing the Stan code. Any advice on this?

Sorry if this turns out to be a duplicate post!

Thx!

A quick follow-up question: is the ‘likelihood’ here referring only to the likelihood of the observed data? (e.g., not involving the prior and thus not related to the posterior)

Also, I am confused about what we should do if we have multiple chunks of observed data, each with different corresponding parameters. For example, if we observe a vector X with a Gamma distribution, Y with a Bernoulli distribution, and Z with a mixture of three normals, how should I put all these pieces together into the log-likelihood?

Thx!

Hi @stan_beginer, I’m not 100% sure if my answers below are what you’re looking for, so feel free to ask follow-up questions and/or let me know if I misunderstood your questions:

When you see “likelihood” it’s typically referring just to the part of the model for the data, but you can also use the target += syntax for priors. At the top of the model block target starts at 0 and by the end of the model block it should be equal to the log posterior = log likelihood + log prior (potentially dropping additive constants – additive and not multiplicative because of the log scale).
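For example, here’s a minimal sketch of a model block written entirely with target += (the parameter mu and data vector y are hypothetical):

model {
  target += normal_lpdf(mu | 0, 1); // log prior for mu
  target += normal_lpdf(y | mu, 1); // log likelihood for the data
  // target now holds log prior + log likelihood, i.e. the log posterior
  // up to an additive constant
}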

Because the ~ syntax gets translated to the equivalent of the target syntax under the hood, you can mix and match in your Stan program, e.g., implementing the model for the data with the target += syntax and the priors with the ~, or vice versa.
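For example, these two lines add the same thing to target up to an additive constant (you’d write one or the other for a given variable, not both, since that would count the density twice):

mu ~ normal(0, 1);                // drops additive constants
target += normal_lpdf(mu | 0, 1); // keeps additive constants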

You don’t need to do anything special if you have multiple observed outcome variables; you can just put the models for those variables all in the model block of the same Stan program. In terms of correctness, it doesn’t matter if you use the ~ syntax or the target += syntax (in some cases you may have to use the latter because we don’t have a built-in distribution).
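To make that concrete with the X/Y/Z example from your question, here’s a minimal sketch of such a model block (the parameter names a, b, theta, w, mu, and sigma are hypothetical, and I’m omitting the declarations and priors):

model {
  x ~ gamma(a, b);      // Gamma-distributed outcome
  y ~ bernoulli(theta); // Bernoulli-distributed outcome
  // three-component normal mixture for z with simplex of weights w
  for (n in 1:N)
    target += log_sum_exp([log(w[1]) + normal_lpdf(z[n] | mu[1], sigma[1]),
                           log(w[2]) + normal_lpdf(z[n] | mu[2], sigma[2]),
                           log(w[3]) + normal_lpdf(z[n] | mu[3], sigma[3])]);
}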


Hi @jonah,

Thanks so much for your kind reply!

So if I am writing a complicated model, a good approach is to use ~ with built-in distributions as much as possible (e.g., for common prior distributions and common distributions of the data) and to use target += for complicated pieces such as mixture distributions. Am I correct?

Also, going back to my original question, do you have any suggestions or reference materials on it?

Thx!

That’s definitely a fine approach. Some people prefer to always use the target += syntax just so their code is consistent. That’s up to you. target += can be used both with built-in distributions (e.g., target += normal_lpdf(...)) and with your own code, so it’s the most flexible approach.
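As a small illustration of the “your own code” case, for a single scalar observation y these two statements are equivalent up to an additive constant (this is where the target += -0.5 * y * y; style from your first post comes from):

target += normal_lpdf(y | mu, sigma);                   // built-in
target += -0.5 * square((y - mu) / sigma) - log(sigma); // hand-written, constants dropped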

There are a few examples of defining custom probability functions here:

And I found this blog post that I haven’t read before (so I can’t vouch for the quality) but it looks like it’s about writing your own likelihood:

If you try this out and implement your own (log) likelihood or (log) priors definitely feel free to start new topics here on the forum if you run into difficulties or have other questions.

Thanks soooo much!


Hi Jonah,

There is a quick follow-up question: is there any guide for how to express complex priors in Stan?

For example, when the data is given we can add the explicit form of the log-likelihood to the target += component. However, how should we deal with a complex prior distribution? For example, if alpha is the parameter of interest and I would like to use a mixture prior for it: P(alpha | beta, lambda, mu) = lambda*beta + (1-lambda)*Normal(mu, 1), where beta, lambda, and mu are all hyperparameters. (I add beta here to put an extra point mass at alpha = 1, since I strongly suspect that alpha could equal 1 with high probability.)

Thanks!

Using target += with complicated priors works the same as for likelihoods. The idea is that if you can write down the log probability density then you can put that on the right-hand side of target +=.
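For example, suppose you wanted a (hypothetical) prior on alpha with log density proportional to -alpha^4, for which there’s no built-in distribution. You can just add the unnormalized log density, since the normalizing constant doesn’t depend on alpha:

target += -pow(alpha, 4); // custom prior: p(alpha) proportional to exp(-alpha^4)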

In the particular case of a two component mixture you can write out the log density yourself or you can use the shortcut log_mix function that is listed in this section of the functions reference:

I haven’t done a lot with mixture priors myself and I don’t know how well sampling will work with a point mass (you’ll find out!), but maybe the prior you wrote above can be translated to something like this:

target += log_mix(lambda, log(beta), normal_lpdf(alpha | mu, 1));

or if you wanted to write out the log density directly:

target += log(lambda * beta + (1-lambda) * exp(normal_lpdf(alpha | mu, 1)));

// or this more numerically stable version using log_sum_exp and log1m 
target += log_sum_exp(log(lambda) + log(beta), log1m(lambda) + normal_lpdf(alpha | mu, 1));
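And just for context, here’s a minimal sketch of where that line could sit in a full program (the data y, the likelihood, and passing the hyperparameters in as data are all hypothetical choices):

data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0, upper=1> lambda; // mixture weight, treated as fixed here
  real<lower=0> beta;            // the point-mass term from your question
  real mu;
}
parameters {
  real alpha;
}
model {
  // the mixture prior on alpha from above
  target += log_sum_exp(log(lambda) + log(beta),
                        log1m(lambda) + normal_lpdf(alpha | mu, 1));
  y ~ normal(alpha, 1); // hypothetical likelihood
}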

(Definitely double check my math above though, I didn’t have a ton of time so I had to write it out pretty quickly)

Thanks so much for your kind reply!

But I am still confused about the mixture of a single point mass and a continuous distribution as a prior. For example, what density should alpha have? (I used to think the density should just be lambda*beta + (1-lambda)*Normal(mu, 1), but that is in fact not a probability density function.) Only once we know the density can we put it into target +=. Unlike the observed data Y, where we could model the density of Y (or likelihood) separately depending on whether Y equals the point mass or not, the prior is not observed, so we cannot easily write out the density function of the prior…

For the question about this particular mixture prior I recommend starting a separate topic here on the forum. A separate post would help get more eyes on your question, and there are probably other people here with more experience with mixture priors than I have, so you’ll get a better answer!