I already extracted the target samples through NUTS in Stan and am now trying to calculate the probability of failure (the probability of passing the threshold) in the standard normal space.
Honestly, I have been stuck on this for about a week… I would greatly appreciate it if you could help me out.
I'm looking for a general solution for this type of situation.
Here is my problem:
The prior is a standard normal distribution, X ~ N(0, 1), where X is bivariate,
and the likelihood is target += normal_lpdf(g | 0, 0.5), where g(X) = 5 - sum(X)/sqrt(2)
(g(x) is the limit-state function, also called the performance function).
Here is the code I wrote in Stan:
data {
  vector[2] mu;
  matrix[2, 2] Sigma;
}
parameters {
  vector[2] x;
}
transformed parameters {
  real g = 5 - sum(x) / sqrt(2);
}
model {
  // prior
  x ~ multi_normal(mu, Sigma);
  // likelihood
  target += normal_lpdf(g | 0, 0.5);
}
generated quantities {
  int I = 0;
  {
    real gg = 5 - sum(x) / sqrt(2);
    if (gg <= 0) I = 1;
  }
}
I also attach the plot of the results for this problem (sample size = 2000, using 5 chains).
I'm looking for the failure probability (the probability of passing the performance function, which is the red line in the figure).
Is there any way I could calculate that? (The circles in the figure are contours of the standard normal distribution.)
P.S. As you can see from my code, I'm sampling from the posterior proportional to [ target += normal_lpdf(g | 0, 0.5) ] * [ x ~ multi_normal(mu, Sigma) ], and I don't have the normalizing constant.
Thank you very much for your time and consideration.
I'm not sure what you mean, but the MCMC estimate of integrate(f(x) p(x) dx) is sample_mean(f(x_i)). If f is the indicator function for your region of interest, then sample_mean(is x_i in my region) is just sample_mean(if x_i is in my region then 1 else 0), which is to say, the fraction of the sample that falls in your region.
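A minimal sketch of that indicator-mean estimator in Python (draws is a placeholder name for the posterior draws of x exported from Stan):

import numpy as np

# Placeholder for the (n_draws, 2) array of posterior draws of x from Stan.
rng = np.random.default_rng(0)
draws = rng.standard_normal((10_000, 2))

# g(x) = 5 - sum(x)/sqrt(2); the failure region is g(x) <= 0.
g = 5.0 - draws.sum(axis=1) / np.sqrt(2.0)

# MCMC estimate of E[1{g(X) <= 0}] under the sampled density:
# simply the fraction of draws that fall in the region.
p_fail = (g <= 0.0).mean()
print(p_fail)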
My point was this: if you look at my Stan code, I set the mean of the likelihood to zero and used the performance function g as the variable in the likelihood, which means the maximum of the likelihood occurs exactly on the performance-function surface (the red line in the figure). I wanted to guide the samples toward the failure region with this technique.
Based on what you are pointing out, if we do that, the value comes out around 10^-3, but as you can see from the contours in the figure, the failure probability should be around 10^-7 based on the standard normal contours.
I think you're talking about a kind of importance sampling. You've got a "real" probability distribution p(x) (bivariate multi-normal in your case) and you want to estimate a quantity associated with the tail… Rather than drawing billions of samples, you've multiplied p(x) by a function q(x) and sampled from that. So now you'd like to estimate
integrate(p(x) f(x) dx)
using a sample from a distribution with density p(x) q(x).
I don't have a proof, but you can maybe do this by
sample_mean(f(x_i) / q(x_i))
that is, take a weighted average with weights 1/q(x_i),
where f(x_i) = 1 if the sample is in your region and 0 if it's not.
Aha, I finally understand what your question is: the density you've sampled from is p(x) q(x) / Z, where Z is unknown, and you then can't compute sample_mean(f(x_i) / (q(x_i) / Z)) because you don't know Z…
I think you're out of luck, at least as far as I can figure out. You might be able to do something like take your sample, fit a multivariate normal to it, and then calculate Z analytically from the fit (a kind of Laplace approximation). That might be good enough for your purposes?
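A rough sketch of that Laplace-style idea in Python, assuming the draws came from p(x) q(x) / Z with p standard normal and q the normal_lpdf tilt from the model above (draws is a placeholder name, and the estimate is only as good as the Gaussian fit):

import numpy as np
from scipy.stats import multivariate_normal, norm

def log_p(x):
    # log density of the "real" prior, standard bivariate normal here
    return norm.logpdf(x).sum(axis=-1)

def log_q(x):
    # log of the tilt: normal_lpdf(g(x) | 0, 0.5) with g(x) = 5 - sum(x)/sqrt(2)
    g = 5.0 - x.sum(axis=-1) / np.sqrt(2.0)
    return norm.logpdf(g, loc=0.0, scale=0.5)

def estimate_log_Z(draws):
    # Fit a Gaussian to the posterior draws, then match densities at the
    # fitted mean: p(mu) q(mu) / Z ~= N(mu | mu_hat, Sigma_hat).
    mu_hat = draws.mean(axis=0)
    Sigma_hat = np.cov(draws, rowvar=False)
    log_fit = multivariate_normal(mu_hat, Sigma_hat).logpdf(mu_hat)
    return log_p(mu_hat) + log_q(mu_hat) - log_fit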
Thank you for your comments and thoughts on my problem.
Yes, this is exactly what I'm talking about, and it is my big issue right now.
In this toy example X is 2D (bivariate), but in my real problem it is 100-dimensional…
In your real problem, how deep into the tail are you talking? Can you just sample from the real p(x) and ask Stan to give you billions of samples? If that takes less than a day or so, then I think the problem is solved; it would take you more than a day to do something smarter ;-)
(How deep? It will be around 10^-9.)
But the goal is not to use billions of samples in Stan.
I'm sure there is a smarter way to solve such a problem, which unfortunately I don't know about.
Your big issue is that you don't know the normalization constant. This is in part because you've taken the original measure p(x) dx and multiplied it by a "tilt" q(x), so you have p(x) q(x) dx.
If you had a normalization constant for p(x) (the 2D normal distribution), then instead of tilting p(x) by q(x) you could simply sample from a multi-normal distribution surrounding the failure region. Just make it a big ellipse around your line. Then you have two normal distributions: p(x), the real one, and, let's call it L(x), the multi-normal over the failure region. Both of these have known normalization constants, so by sampling from L(x) and weighting each sample by p(x)/L(x) you can calculate the expectation you need.
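A minimal sketch of that scheme in Python; the proposal L(x) here is a hypothetical unit-covariance normal centered on the design point, i.e. the closest point of the failure line to the origin, roughly (3.54, 3.54):

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 100_000

# The real density p(x): standard bivariate normal.
p = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

# Hypothetical proposal L(x) centered near the failure region.
L = multivariate_normal(mean=np.full(2, 5.0 / np.sqrt(2.0)), cov=np.eye(2))

x = L.rvs(size=n, random_state=rng)
g = 5.0 - x.sum(axis=1) / np.sqrt(2.0)
f = (g <= 0.0).astype(float)  # indicator of the failure region

# Both p and L are properly normalized, so the weights are just p(x)/L(x).
w = np.exp(p.logpdf(x) - L.logpdf(x))
print((f * w).mean())  # should land near the true tail probability, ~2.9e-7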
On the other hand, if in the real problem p(x) is not analytically normalizable, it seems you are more out of luck.
Your next bet might be to have Stan sample from p(x), then fit a flexible distributional form to this large sample, say a Gaussian mixture model, and then use the Gaussian mixture model in place of your real p(x) in the importance sampling method. However, in hundreds of dimensions, I doubt this will work well.
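A sketch of that substitution, as a hypothetical post-processing step (draws_from_p stands in for a large Stan sample from p):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
draws_from_p = rng.standard_normal((50_000, 2))  # placeholder for Stan draws

# Fit a normalized, flexible approximation to p(x).
gmm = GaussianMixture(n_components=5, random_state=0).fit(draws_from_p)

# In the importance sampling weights, replace log p(x) with the fitted
# mixture's log density:
def log_p_hat(x):
    return gmm.score_samples(x)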
Actually, I implemented the importance sampling technique as a post-processing step after capturing the failure samples from my Stan code, and it works quite well for the 2D problem.
But for higher dimensions it doesn't work well at all, because the variance of the weights blows up.
I haven't tried a Gaussian mixture model. I'm not sure whether it works in high dimensions or not…
but I can give it a try.
I would expect the GMM technique to be acceptable if your quantity of interest were more concentrated in the core of the distribution, for example if you wanted the probability of being within a certain box around the median. The big issue with your problem is that you are looking deep into the tails, so your inference will be very sensitive to the tails of your model.
You might, for example, choose a mixture of t distributions with variable degrees of freedom instead of a GMM, so at least you have a chance of adjusting your model's tails to fit the target's tails.
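A minimal sketch of evaluating such a t-mixture log density in Python (the component weights, locations, shapes, and degrees of freedom are placeholders; fitting them, e.g. by EM, is not shown):

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_t

def log_t_mixture(x, weights, locs, shapes, dfs):
    # Log density of a mixture of multivariate t components; the df
    # parameters control how heavy the tails are.
    comps = [np.log(w) + multivariate_t.logpdf(x, loc=m, shape=S, df=v)
             for w, m, S, v in zip(weights, locs, shapes, dfs)]
    return logsumexp(np.stack(comps), axis=0)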
The log of the Jacobian of your transformation is a constant, so you do not need to add anything and can ignore the warnings. Note, though, that writing target += normal_lpdf(...) merely makes the Jacobian warnings go away; it doesn't remove the need for the Jacobian in general.
Thank you very much for your comment on my code.
Do you mean that the way I coded it, and consequently the sampling plot, is not correct because of that Jacobian issue?
If that is the case, I would have to change it to the following, based on your point (please correct me if I'm wrong):
data {
  vector[2] mu;
  matrix[2, 2] Sigma;
}
parameters {
  vector[2] x;
}
transformed parameters {
  real g = 5 - sum(x) / sqrt(2);
}
model {
  // prior
  x ~ normal(mu[1] + mu[2], sqrt(Sigma[1,1]^2 + Sigma[2,2]^2 + 2 * Sigma[1,2]));
  // likelihood
  g ~ normal(0, 0.5);
}
generated quantities {
  int I = 0;
  {
    real gg = 5 - sum(x) / sqrt(2);
    if (gg <= 0) I = 1;
  }
}
I think the Jacobian is a non-issue in your previous case, but you might want to check the manual. There are some complications that come up when the inverse transform is singular; the section on the absolute Jacobian determinant of the unit-vector inverse transform is relevant.
I was thinking of something more like the code below, though I'm not sure how helpful it is. It seems like the bivariate case could be solved analytically, so there must be some additional complication in the multivariate case?
data {
  vector[2] mu;
  matrix[2, 2] Sigma;
}
parameters {
  real x;  // x stands in for the sum x[1] + x[2]
}
transformed parameters {
  real g = 5 - x / sqrt(2);
}
model {
  // prior: the sum of two jointly normal variables is normal with
  // variance Sigma[1,1] + Sigma[2,2] + 2 * Sigma[1,2]
  x ~ normal(mu[1] + mu[2], sqrt(Sigma[1,1] + Sigma[2,2] + 2 * Sigma[1,2]));
  // likelihood
  g ~ normal(0, 0.5);
}
generated quantities {
  int I = 0;
  {
    real gg = 5 - x / sqrt(2);
    if (gg <= 0) I = 1;
  }
}
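For what it's worth, this bivariate toy case does have a closed form: with a standard normal prior, sum(x)/sqrt(2) is itself standard normal, so the failure probability is a one-sided tail:

from scipy.stats import norm

# P(g <= 0) = P(sum(x)/sqrt(2) >= 5) with sum(x)/sqrt(2) ~ N(0, 1)
print(norm.sf(5.0))  # ~2.9e-7, matching the ~1e-7 read off the contours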
To be honest, I read the "transformations of variables" section in the manual one more time, but I'm still confused about how to handle this in Stan syntax…
But I think in my case, as you mention, it is not an issue, because the Jacobian is constant.
On the other hand, with the way you coded it in Stan, I cannot distinguish between x[1] and x[2] and plot the bivariate x.
However, my original question was: how can we calculate the failure probability (the probability of passing the performance function g, the red line in the figure) in high dimensions?
Given that we don't know the normalizing constant of the posterior, is there any way we could do that?
If you are looking at the bivariate normal case and then summing the two normal components, you can just say that sum(x, y) ~ normal(…) with the mean and variance described above; there is no need to sample both and then sum.
Once you determine the log absolute determinant of the transformation, you add it to the posterior using target +=.
What was wrong with Daniel's suggestion?
What is the original standard normal space? In Stan you handle transformations through Jacobian adjustments, which I do not think are necessary in this case.
Sorry if this is not helpful; I may be confused by your description of what you are trying to do.