Not able to derive simulation data parameters

I generated simulation data using preset parameters which I attempt to recover using the solver, but they numbers are way off. The summary shows

             Mean     MCSE   StdDev        5%    50%    95%  N_Eff  N_Eff/s    R_hat

lp__ -1619 2.8e-01 5.5e+00 -1.6e+03 -1619 -1610 395 8.2e-02 1.0e+00
accept_stat__ 0.85 4.5e-03 1.4e-01 5.4e-01 0.89 1.0 1000 2.1e-01 1.0e+00
stepsize__ 0.44 3.1e-15 2.2e-15 4.4e-01 0.44 0.44 0.50 1.0e-04 1.0e+00
treedepth__ 3.0 6.7e-03 1.9e-01 3.0e+00 3.0 3.0 824 1.7e-01 1.0e+00
n_leapfrog__ 8.1 1.5e-01 3.9e+00 7.0e+00 7.0 15 707 1.5e-01 1.0e+00
divergent__ 0.00 0.0e+00 0.0e+00 0.0e+00 0.00 0.00 1000 2.1e-01 -nan
energy__ 1642 4.0e-01 7.3e+00 1.6e+03 1641 1654 326 6.8e-02 1.0e+00

Is there anything abnormal about this dump? like that lp___ being a large negative number?

No, the lp__ is the log density of the posterior up to an additive constant.

Thanks. On this matter, I actually got a good model running. It’s just taking a hell of a long time to run thru (it will be over two days the way its going). As I understand, there is no way around that by parallelization, at least not easily. If anyone has any less than obvious solution to efficiently train a complex model, please suggest.

What n_eff are you targeting that’s taking that long?

The only thing to do is either make the program more efficient or reparameterize so it mixes better.

I am trying to get the basic idea of reparametrization. It appears that what it does is picking some prior distribution to bound a parameter which is otherwise unbounded. However, if I already declare lower and upper bounds on my parameters, then there is really not much I can improve by means of reparameterization.

There’s a chapter in the manual that explains what’s going on. It doesn’t have to do with bounding—everything happens on the unconstrained scale with Stan’s samplers.

I read that chapter. Take the example of Beta Distribution, what it did was convert
theta[n] ~ beta(alpha, beta);
theta ~ beta(lambda * phi, lambda * (1 - phi));

where as alpha and beta only has lower bounds, phi has an added upper bound. That is the only place I can see the reparameterization improving over the original. I am not sure how much difference that really makes, but if it did help, it seems to be due to the added upper bound. However, you are saying it does not have to do with bounding, so I don’t know why should the reparameterized version performs better, or how to apply the principle to other examples.

The model also declared that phi and lambda came from beta and pareto distributions, whereas alpha and beta came from some undeclared distributions. If it is the distributions that matters, than the example is not making that point by hiding alpha and beta’s distribution, and I have no intuition why the prior distribution can make a huge difference in computational speed.

That is a reparameterization of the beta distribution in terms of mean and total count (a kind of precision term). It needs new priors, though, so it’s not a pure reparameterization in the sense that your results will be different.

The main things that reparameterization helps with is turning funnel-like posteriors into more isotropic ones.