Thanks for reporting back.
See the Stan prior choice recommendation Wiki.
Not automatically. In theory it could be done given the data. There’s an appendix in the manual aimed at users of BUGS/JAGS to help translate to Stan. There are also translations of almost all of the BUGS examples, as well as of BUGS examples from several books listed on the web page.
That would only be equivalent if we took a posterior mode as a point estimate (sometimes called MAP for “max a posteriori”). If we wanted a point estimate, we’d use the Bayesian posterior mean (or median) which has several advantages. First, it minimizes expected square error in the estimate (or absolute error with median). In symbols, if y is data and theta parameters, then the posterior mode (or MAP) estimate is
theta* = ARGMAX_theta p(theta | y)
If the prior is uniform, p(theta) = const, then p(theta | y) propto p(y | theta) and you get the MLE. If the prior isn’t constant, you can think of it as a penalty giving you a penalized maximum likelihood estimate. Or you can think of it as a Bayesian posterior mode.
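To make the mode/MLE relationship concrete, here is a minimal sketch. The binomial likelihood, the Beta(a, b) priors, the data (7 successes in 10 trials), and the helper names are all made up for illustration; nothing here is Stan code. It finds the posterior mode by brute-force grid search and compares it to the MLE.

```python
import math

def log_posterior(theta, k, n, a, b):
    """Unnormalized log posterior: binomial likelihood times a Beta(a, b) prior."""
    log_lik = k * math.log(theta) + (n - k) * math.log(1 - theta)
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return log_lik + log_prior

def grid_argmax(f, num=9999):
    """Return the grid point in (0, 1) that maximizes f."""
    grid = [(i + 1) / (num + 1) for i in range(num)]
    return max(grid, key=f)

k, n = 7, 10   # hypothetical data: 7 successes in 10 trials
mle = k / n    # closed-form maximum likelihood estimate

# Beta(1, 1) is uniform on (0, 1), so the posterior mode should match the MLE.
map_uniform = grid_argmax(lambda t: log_posterior(t, k, n, 1, 1))

# Beta(5, 5) acts as a penalty pulling the estimate toward 0.5.
map_informative = grid_argmax(lambda t: log_posterior(t, k, n, 5, 5))

print(mle, map_uniform, map_informative)
```

With the uniform prior the grid maximum lands on the MLE; with the Beta(5, 5) prior it moves toward 0.5, which is the "penalized maximum likelihood" reading.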
On the other hand, the standard Bayesian estimator is the posterior expectation or mean,
theta-hat = E[theta | y]
= INTEGRAL theta * p(theta | y) d.theta
The Bayesian estimate has the property that
theta-hat = ARGMIN_phi E[(phi - theta)^2 | y]
where theta is the true parameter value (usually multivariate). This is one of those cases where Andrew’s overloading of random variables and bound variables gets confusing, as the theta in the expectation is a random variable, whereas the theta in the integral is a bound variable, and the theta in the final statement is the true value of the random variable that’s also written theta. No wonder this is so confusing for beginners! And of course, this all depends on side conditions like the expectations and integrals existing.
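Here is a rough numerical check of the ARGMIN property, again just a sketch under invented assumptions: a Beta(8, 4) posterior (binomial data with 7 successes in 10 trials and a uniform prior), 2000 posterior draws, and a grid of candidate point estimates phi. The expectation is approximated by averaging over the draws.

```python
import random
import statistics

rng = random.Random(1234)  # fixed seed so the sketch is reproducible

# Hypothetical posterior: Beta(a + k, b + n - k) with k=7, n=10, a=b=1.
k, n, a, b = 7, 10, 1, 1
draws = [rng.betavariate(a + k, b + n - k) for _ in range(2000)]

def expected_loss(phi, loss):
    """Monte Carlo estimate of E[loss(phi, theta) | y] using the posterior draws."""
    return sum(loss(phi, theta) for theta in draws) / len(draws)

candidates = [i / 1000 for i in range(1, 1000)]

# The candidate minimizing expected squared error should sit at the posterior mean.
best_squared = min(
    candidates, key=lambda phi: expected_loss(phi, lambda p, t: (p - t) ** 2)
)
# The candidate minimizing expected absolute error should sit at the posterior median.
best_absolute = min(
    candidates, key=lambda phi: expected_loss(phi, lambda p, t: abs(p - t))
)

post_mean = statistics.mean(draws)
post_median = statistics.median(draws)
print(best_squared, post_mean)
print(best_absolute, post_median)
```

Up to the grid resolution, the squared-error minimizer agrees with the mean of the draws and the absolute-error minimizer with their median.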
The posterior mode theta* doesn’t have a probabilistic interpretation, though you can talk about the sampling distribution of the estimator and calculate confidence intervals. Confidence intervals are not distributions of theta conditioned on the data as you get in the Bayesian analysis, but distributions of the estimator theta* as a function of y over alternative choices of y (that lets you keep probability on the data y). You can also talk about asymptotics and whether it converges to the true value as the data size grows.
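A small simulation may help show what "distribution of the estimator over alternative choices of y" means. This is only a sketch with assumed values: a fixed true theta of 0.3, binomial data sets of size 50, and a uniform prior so that theta* is just the MLE. The parameter never varies; only the simulated data do.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

true_theta = 0.3      # assumed fixed "true" parameter value
n = 50                # observations per simulated data set
num_datasets = 2000   # number of alternative data sets y

def simulate_estimate():
    """Simulate one data set y and return theta* (the MLE under a uniform prior)."""
    successes = sum(rng.random() < true_theta for _ in range(n))
    return successes / n

# Sampling distribution of the estimator: theta is fixed, y varies.
estimates = sorted(simulate_estimate() for _ in range(num_datasets))

# Central 95% range of the estimator across repeated data sets.
lower = estimates[int(0.025 * num_datasets)]
upper = estimates[int(0.975 * num_datasets) - 1]
mean_estimate = sum(estimates) / num_datasets
print(lower, mean_estimate, upper)
```

The spread here describes how theta* moves around when the data are redrawn, which is a different object from a posterior distribution over theta given one observed y.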
Second, the Bayesian posterior mean exists in many situations where there is no (penalized) maximum likelihood estimate. The standard example is a hierarchical model, where the posterior density grows without bound as the posterior variance goes to zero and the low-level coefficients go to the posterior mean. As MacKay said in his book, “EM goes boom!”.
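The unbounded density is easy to see numerically. The sketch below assumes a toy hierarchical model that is not taken from the thread: y_j ~ normal(alpha_j, 1), alpha_j ~ normal(mu, tau), with flat priors on mu and tau, and four made-up observations. It evaluates the log joint density with every alpha_j set equal to mu while tau shrinks.

```python
import math

def normal_lpdf(x, mu, sigma):
    """Log density of normal(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def log_joint(y, alpha, mu, tau):
    """Log joint density of the toy model, assuming flat priors on mu and tau."""
    lp = sum(normal_lpdf(y_j, a_j, 1.0) for y_j, a_j in zip(y, alpha))
    lp += sum(normal_lpdf(a_j, mu, tau) for a_j in alpha)
    return lp

y = [-1.2, 0.3, 0.8, 2.1]   # made-up observations
mu = sum(y) / len(y)
alpha = [mu] * len(y)       # collapse every low-level coefficient onto mu

taus = [1.0, 0.1, 0.01, 0.001, 1e-6]
log_densities = [log_joint(y, alpha, mu, tau) for tau in taus]

for tau, lp in zip(taus, log_densities):
    print(tau, lp)
```

Each alpha_j contributes a -log(tau) term, so the log density keeps climbing as tau goes to zero and an optimizer has no finite maximum to find, while the posterior mean can still be well defined given a proper posterior.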
Third, the question is based on the false presupposition that we’d be using improper uniform priors. See the reference in the answer to (1) above.
- I’m not sure what a crossed random term is. If you mean interactions, then you can certainly do that with RStanArm.