Thanks for reporting back.

See the Stan Prior Choice Recommendations wiki.

Not automatically. In theory it could be done given the data. There's an appendix in the manual aimed at users of BUGS/JAGS to help translate to Stan. There's also a translation of almost all of the BUGS examples, as well as BUGS examples from several books, listed on the web page.

That would only be equivalent if we took a posterior mode as a point estimate (sometimes called MAP, for "max a posteriori"). If we wanted a point estimate, we'd use the Bayesian posterior mean (or median), which has several advantages. First, it minimizes expected square error in the estimate (or absolute error with the median). In symbols, if y is data and theta parameters, then the posterior mode (or MAP) estimate is
theta* = ARGMAX_theta p(theta | y)
If the prior is uniform, p(theta) = const, then p(theta | y) propto p(y | theta) and you get the MLE. If the prior isn't constant, you can think of it as a penalty, giving you a penalized maximum likelihood estimate. Or you can think of it as a Bayesian posterior mode.
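To make that concrete, here's a small numerical sketch (the data values are made up for illustration): with a flat prior, the grid point maximizing the log posterior is the MLE (the sample mean for a normal likelihood), while adding a Normal(0, 1) log prior acts as a penalty that shrinks the mode toward zero.

```python
import numpy as np

# Hypothetical toy data: y_i ~ Normal(theta, 1).
y = np.array([1.2, 0.8, 1.5, 1.1])
grid = np.linspace(-3, 5, 200001)  # fine grid over theta

def log_lik(theta):
    # log p(y | theta) up to a constant, for each theta in the grid
    return -0.5 * np.sum((y[:, None] - theta[None, :]) ** 2, axis=0)

# Flat prior: posterior mode == MLE, i.e., the sample mean (1.15 here).
mode_flat = grid[np.argmax(log_lik(grid))]

# Normal(0, 1) prior: mode maximizes log lik + log prior (penalized MLE),
# which shrinks toward 0: sum(y) / (n + 1) = 4.6 / 5 = 0.92.
log_post = log_lik(grid) - 0.5 * grid ** 2
mode_penalized = grid[np.argmax(log_post)]

print(mode_flat, mode_penalized)
```

The closed-form normal-normal answer makes the check easy, but the grid-search recipe is generic: maximize log likelihood plus log prior.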
On the other hand, the standard Bayesian estimator is the posterior expectation or mean,
thetahat = E[theta | y]
         = INTEGRAL theta * p(theta | y) d.theta
The Bayesian estimate has the property that
thetahat = ARGMIN_phi E[(phi - theta)^2 | y]
where theta is the true parameter value (usually multivariate). This is one of those cases where Andrew's overloading of random variables and bound variables gets confusing: the theta in the expectation is a random variable, whereas the theta in the integral is a bound variable, and the theta in the final statement is the true value of the random variable that's also written theta. No wonder this is so confusing for beginners! And of course, this all depends on side conditions like the expectations and integrals existing.
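That argmin property can be checked by brute force on posterior draws. A sketch (using a made-up skewed stand-in for a posterior, so that mean, median, and mode all differ): the candidate estimate minimizing mean squared error over the draws lands at the sample mean, and the one minimizing mean absolute error lands at the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior draws; any skewed distribution works for illustration.
draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)

phi = np.linspace(0.5, 4.0, 701)  # grid of candidate point estimates
sq_err = np.array([np.mean((p - draws) ** 2) for p in phi])
abs_err = np.array([np.mean(np.abs(p - draws)) for p in phi])

best_sq = phi[np.argmin(sq_err)]    # sits at the posterior (sample) mean
best_abs = phi[np.argmin(abs_err)]  # sits at the posterior (sample) median

print(best_sq, draws.mean())
print(best_abs, np.median(draws))
```

This is exactly how you'd compute the estimates from MCMC output: the posterior mean is just the average of the draws, no optimization needed.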
The posterior mode theta* doesn't have a probabilistic interpretation, though you can talk about the sampling distribution of the estimator and calculate confidence intervals. Confidence intervals are not distributions of theta conditioned on the data, as you get in the Bayesian analysis, but distributions of the estimator theta* as a function of y over alternative choices of y (that lets you keep probability on the data y). You can also talk about asymptotics and whether the estimator converges to the true value as the data size grows.
Second, the Bayesian posterior mean exists in many situations where there is no (penalized) maximum likelihood estimate. The standard example is a hierarchical model, where the posterior density grows without bound as the hierarchical variance goes to zero and the low-level coefficients go to the population mean. As MacKay said in his book, "EM goes boom!"
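The "boom" is easy to see numerically. A minimal sketch (assuming J = 8 group effects alpha_j ~ Normal(mu, tau), numbers chosen only for illustration): evaluate the joint log density of the group effects at alpha_j = mu and shrink tau toward zero; the log density grows without bound, so there is no mode to find.

```python
import math

def group_log_density_at_mu(tau, J=8):
    # log PROD_{j=1}^{J} Normal(alpha_j = mu | mu, tau)
    #   = -J * (log(tau) + 0.5 * log(2 * pi))
    # The data terms add a finite contribution, so they can't cap this.
    return -J * (math.log(tau) + 0.5 * math.log(2.0 * math.pi))

for tau in [1.0, 0.1, 0.01, 0.001]:
    print(tau, group_log_density_at_mu(tau))  # grows as tau shrinks
```

The posterior mean is still well defined because the integral over this spike is finite even though the density itself is unbounded.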
Third, the question is based on the false presupposition that we'd be using improper uniform priors. See the reference in the answer to (1) above.
I'm not sure what a crossed random term is. If you mean interactions, then you can certainly do that with RStanArm.