Does Stan use maximum likelihood estimation to generate the likelihood function?



I’ve read that Bayesian parameter estimation generally uses maximum likelihood estimation to generate the likelihood function. I’m wondering whether Stan does this as well.


Neither of those is true. The likelihood function is coded using the distributions in Stan (or your own function). You can then use it for Bayesian estimation via MCMC, or for maximum likelihood estimation via optimization.
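To make that concrete, here is a minimal Python sketch (not Stan code, and not how Stan works internally). It assumes hypothetical data and a no-intercept regression y ~ normal(b * x, sigma) with sigma treated as known. The log-likelihood is written directly from the normal density, roughly what a Stan statement like `y ~ normal(b * x, sigma);` adds to the model's log density (Stan drops the constant terms). The same function is then used two ways: maximized to get the MLE, and combined with a hypothetical normal(0, 10) prior and sampled with a plain random-walk Metropolis sampler, which is far simpler than the NUTS sampler Stan actually uses.

```python
import math
import random

# Hypothetical data (not from this thread): y is roughly 2 * x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA = 1.0  # noise sd, assumed known to keep the example one-dimensional


def normal_lpdf(value, mu, sigma):
    """Log density of Normal(mu, sigma) at value, written out from the formula."""
    z = (value - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_lik(b):
    """Log-likelihood of the model y[i] ~ normal(b * x[i], SIGMA)."""
    return sum(normal_lpdf(yi, b * xi, SIGMA) for xi, yi in zip(x, y))


def log_prior(b, mu_b=0.0, sigma_b=10.0):
    """Hypothetical prior b ~ normal(mu_b, sigma_b)."""
    return normal_lpdf(b, mu_b, sigma_b)


# Use 1: maximum likelihood. For this model the maximizer of log_lik is the
# least-squares slope through the origin, so it has a closed form.
b_mle = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Use 2: Bayesian estimation. Draw from a density proportional to
# prior * likelihood with random-walk Metropolis.
def metropolis(n_iter=20000, step=0.3, seed=1):
    rng = random.Random(seed)
    b = 0.0
    current = log_lik(b) + log_prior(b)
    draws = []
    for _ in range(n_iter):
        proposal = b + rng.gauss(0.0, step)
        candidate = log_lik(proposal) + log_prior(proposal)
        # Accept with probability min(1, exp(candidate - current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            b, current = proposal, candidate
        draws.append(b)
    return draws[n_iter // 10:]  # discard the first 10% as warm-up


draws = metropolis()
posterior_mean = sum(draws) / len(draws)
print(f"MLE of b: {b_mle:.3f}")
print(f"posterior mean of b: {posterior_mean:.3f}")
```

Neither step creates the likelihood: both consume the same `log_lik`, which was fixed by the choice of model.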


Thanks very much for the reply.

I got the idea that the likelihood function is usually created using maximum likelihood estimation from here:

For this depiction let us consider a standard regression coefficient b. Here we have a prior belief about b expressed as a probability distribution. As a preliminary example we will assume perhaps that the distribution is normal, and is centered on some value $\mu_b$ and with some variance $\sigma_b^2$. The likelihood here is the exact same one used in classical statistics: if y is our variable of interest, then the likelihood is p(y|b) as in the standard regression approach using maximum likelihood estimation.

I’d appreciate it if you could tell me where my misunderstanding comes from.

In addition, would you be able to provide a bit more information on how Stan creates the likelihood function (assuming one doesn’t enter their own function)?

Thanks so much!


Users always enter their own likelihood function, although there are dozens to choose from in the Stan language.

What this quote is saying is that, in his example, he uses the same normal likelihood function for a Bayesian analysis that would be used for maximum likelihood estimation in a frequentist analysis. But it is not correct to say that maximum likelihood estimation creates or generates the likelihood function used in a Bayesian analysis; the likelihood function is defined by the model before either method is applied.
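For the setup in the quote, here is a small Python sketch under stated assumptions: hypothetical data, a no-intercept regression, and a known noise sd. The least-squares estimate b_hat and its standard error summarize the normal likelihood p(y|b) from the classical analysis, and the Bayesian posterior combines exactly that likelihood with a normal prior on b (the values of mu_b and sigma_b are hypothetical). With a normal prior and known sigma the posterior is normal in closed form, so no sampling is needed here.

```python
import math

# Hypothetical data (not from this thread) for the regression y = b * x + noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 1.0  # noise sd, assumed known

# Classical side: p(y | b) is maximized at the least-squares estimate, and as a
# function of b it is proportional to a normal curve centred at b_hat with sd se_b_hat.
sxx = sum(xi * xi for xi in x)
b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
se_b_hat = sigma / math.sqrt(sxx)

# Bayesian side: combine the SAME likelihood with the prior b ~ N(mu_b, sigma_b^2).
mu_b, sigma_b = 0.0, 10.0  # hypothetical prior settings
prior_precision = 1.0 / sigma_b**2
lik_precision = 1.0 / se_b_hat**2
post_precision = prior_precision + lik_precision
# Posterior mean is the precision-weighted average of the prior mean and b_hat.
post_mean = (prior_precision * mu_b + lik_precision * b_hat) / post_precision
post_sd = math.sqrt(1.0 / post_precision)

print(f"b_hat = {b_hat:.4f}, se = {se_b_hat:.4f}")
print(f"posterior: mean = {post_mean:.4f}, sd = {post_sd:.4f}")
```

With a wide prior like this one the posterior is very close to the classical result; shrinking sigma_b pulls the posterior mean toward mu_b, but the likelihood term is unchanged either way.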


Here’s an intuition. Sometimes we talk about MAP (maximum a posteriori) estimates. It’s a bad name, but it refers to the parameter values with the highest posterior density.

The posterior is proportional to prior * likelihood. If the prior is completely flat (uniform), then the posterior is just proportional to the likelihood. In that case, the maximum a posteriori estimates are exactly the maximum likelihood estimates. (Apologies if you know all this already.)
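A small numerical check of this in Python, assuming hypothetical data and a no-intercept normal regression with sigma known: maximizing the log-likelihood alone and maximizing log-likelihood plus a flat log-prior give the same b, while a deliberately informative prior (a hypothetical normal(0, 0.2)) moves the MAP estimate away from the MLE. The optimizer is a basic golden-section search, chosen only because it fits in a few lines.

```python
import math

# Hypothetical data (not from this thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA = 1.0  # noise sd, assumed known


def log_lik(b):
    # Normal log-likelihood, dropping terms that do not depend on b.
    return -0.5 * sum(((yi - b * xi) / SIGMA) ** 2 for xi, yi in zip(x, y))


def log_flat_prior(b):
    # Improper flat prior: the same constant for every b.
    return 0.0


def log_normal_prior(b, mu_b=0.0, sigma_b=0.2):
    # Hypothetical, deliberately informative prior pulling b toward 0.
    return -0.5 * ((b - mu_b) / sigma_b) ** 2


def argmax(f, lo=-10.0, hi=10.0, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d  # maximizer lies in [lo, d]
        else:
            lo = c  # maximizer lies in [c, hi]
    return (lo + hi) / 2.0


b_mle = argmax(log_lik)
b_map_flat = argmax(lambda b: log_lik(b) + log_flat_prior(b))
b_map_informative = argmax(lambda b: log_lik(b) + log_normal_prior(b))

print(f"MLE:                    {b_mle:.4f}")
print(f"MAP, flat prior:        {b_map_flat:.4f}")
print(f"MAP, informative prior: {b_map_informative:.4f}")
```

Adding a constant log-prior cannot move the maximum, which is why the first two numbers agree; only a prior that varies with b changes where the peak is.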

So both methods use the likelihood but they do different things with it. Neither of them owns or creates the likelihood function.