Probability distribution functio

Hi,

I am curious how to calculate probability from stan parameter estimate. In one dimensional it is easy - which(theta>=0.19&theta<0.21) /length(theta) gives probability that theta is within some interval. Any advice how to do it effectively when theta has multiple dimensions (around 40). My ultimate goal is to calculate probability that f(theta) is within hyper-rectangle.

Thanks for the advice.

Linas

I think you do it the same way in multiple dimensions. Count the number of samples in the box and divide by the total number of samples.

I think you’d want to try to do something about figuring out convergence. Probably compute an ESS or something. Check out https://avehtari.github.io/rhat_ess/rhat_ess.html. That shows how the ESS thing can be used to figure out how good your estimates are for quantiles.

I guess this is gonna be largely a function of how large/small your box is.

Right – the probability of a set is the expectation value of the corresponding binary indicator function. In other words you can code up the binary indicator function in generated quantities (1 if in the rectangle, 0 if not) and let Stan handle the computation of the mean and MCMC standard error estimates.

Moreover in when the boundaries are known you can just trust the stansummary/print estimates for the MCMC standard error out of the box. The quantile estimators that @bbbales2 mentions are needed when the box boundaries aren’t known exactly, as in the case of quantiles.

Thanks very much for the insight. I “forgot” to mention that eventually this probability will be used in a stochastic optimization setting. Thus the ultimate ultimate goal is to estimate integral(p(theta)f(x,theta)dtheta) where p(theta) is probability of theta being in the box defined by dtheta with center in theta - you already explained how to calculate it, f(x,theta) is some function of input x and theta.

Thanks a lot for any comments.