Already captured the target samples through NUTS, (need some advice) How can i find the probability of failure?

Dear all,

Any help would be gratefully appreciated.

I already extracted the target samples through NUTS in Stan and just trying to calculate the probability of failure (probability of passing the threshold) in the standard normal space.

Honestly, I got stuck for about a week into that… I would gratefully appreciate it if you could help me out in this regard.
I’m looking for a general solution in these type of situations.

Here is my problem:

The prior is standard normal dist. X ~ N(0,1) where X is bivariate.
and the likelihood function is target += normal_lpdf(g | 0,0.5) where g(X) = 5 - (sum(X)/sqrt(2));
(g(x) is the limit state function or performance function)

and the following code is the one i wrote in Stan:

data{
vector[2] mu;
matrix[2, 2] Sigma;
}
parameters {
  vector[2] x;
}

transformed parameters {
real g = 5 - (sum(x)/sqrt(2));
}

model {
//prior
    x ~ multi_normal(mu, Sigma);
//likelihood  
  target += normal_lpdf(g | 0,0.5);
}

generated quantities {
  int I = 0;
  {
    real gg = 5 - (sum(x)/sqrt(2));
    if (gg <= 0) I = 1;
  }
} 

I also attach the plot corresponding to the results of above problem. (sample size = 2000 and using 5 chains)
I’m looking for the failure probability (probability of passing the performance function which is the red line in the figure).

Is there any way i could calculate that? (The circles in the figure are standard normal distribution contours)

P.S. As you can see based on my code, I’m sampling from this posterior ~ [ target += normal_lpdf(g | 0,0.5) ] * [ x ~ multi_normal(mu, Sigma)] and i don’t have the normalizing constant.

Thank you very much for your time and consideration.

Regards,
Hamed

Just take the fraction of all samples that meet your condition as your estimate

Thank you very much for your prompt reply.

But, I cannot do that because, the fraction is not based on the original standard normal space and i don’t have the normalizing constant.

I’m not sure what you mean, but the MCMC estimate of integrate(f(x) p(x),dx) is sample_mean(f(x_i)) if the function is the indicator function for your region of interest, then the sample_mean( is x_i in my region) is just sample_mean(if(x_i in my region) then 1 else 0) which is to say, the fraction of the sample in your region

Hi Daniel,

Thank you for your comment and i appreciate it.

My point was that: if you see the Stan code i put the mean of the likelihood function equal to zero and used the performance function (g) as my variable in my likelihood function (which means maximum of likelihood function happened exactly on the surface of performance function/red line in the figure). and i wanted to guide the samples to my failure region by using this technique.

Based on what you are pointing out, if we do that, the value will be around 10^-3 but as you can see the contours in the figure, the probability of failure should be around 10^-7 based on the contours of standard normal distribution.

I think you’re talking about a kind of importance sampling. you’ve got a “real” probability distribution p(x) (it’s bivariate multi-normal in your case) and you want to estimate a quantity associated with the tail… Rather than sampling billions of samples you’ve multiplied p(x) by a function q(x) and sampled from that. So now you’d like to estimate

integrate(p(x) f(x) dx)

using a sample from a distribution with density p(x)q(x)

I don’t have a proof but you can maybe do this by

sample_mean(f(x_i)/q(x_i))

that is, take a weighted average with the weights 1/q(x_i)

where f(x_i) = 1 if sample is in your region, and 0 if it’s not in your region.

and q(x_i) is the normal_pdf(x_i,q,0,0.5)

Aha, I finally understand what your question is. the density you’ve sampled from is p(x)q(x)/Z where Z is unknown, and you then can’t do sample_mean(f(x_i)/(q(x_i)/Z)) because you don’t know Z…

I think you’re out of luck, at least as far as I can figure out. You might be able to do something like take your sample, fit a multivariate normal to it, and then calculate Z analytically from the fit (a kind of laplace approximation). That might be good enough for your purposes?

Hi Daniel,

Thank you for your comments and thoughts about my problem.
Yes, this is exactly what I’m talking about. and It is my big issue right now.
In this toy example, X is 2D (bivariate). but in my real problem, it is 100 dimension…

I truly appreciate your time.

Hamed

In your real problem, how deep into the tail are you talking? Can you just sample from the real p(x) and ask stan to give you billions of samples? If that takes less than a day or something, then I think problem solved, it will take you more than a day to do something smarter ;-)

Thank you for your message, Daniel. :)

(How deep? it will be around 10^-9)
But the goal is not using billions of samples in Stan.
I’m sure there is a smarter way to solve such a problem which i don’t know about it unfortunately.

Regards,
Hamed

First, reading up on importance sampling: https://en.wikipedia.org/wiki/Importance_sampling

Your big issue is that you don’t know the normalization constant. This is in part because you’ve taken the original measure p(x)dx and multiplied by a “tilt” q(x) so you have p(x)q(x) dx

If you had a normalization constant for p(x) (the 2d normal distribution) then you could choose instead of p(x)q(x) a tilt on this p(x), you could instead simply sample from a multi-normal distribution surrounding the region of failure. Just make it a big ellipse around your line. Then you have two normal distributions p(x) the real one, and let’s call it L(x) the multi-normal in the region of failure. Both of these have normalization constants, and so by sampling from L(x) and then multiplying each sample by p(x)/L(x) you can calculate the expectation you need.

On the other hand, if in the real problem p(x) is not analytically normalizable, it seems you are more out of luck.

Your next bet might be to do something like have stan sample from p(x), then fit a flexible distributional form to this large sample, like say a gaussian mixture model… and then use the gaussian mixture model in place of your real p(x) in your importance sampling method. However, in hundreds of dimensions, I doubt this will work well.

Highly appreciate for sharing the key points.

Actually i implemented the importance sampling technique as a post-processing method after capturing the failure sample through my Stan code and it is working quite well for 2D problem.

But for higher dimensions, it doesn’t work well at all, because of the variance increasing.

I didn’t try the Gaussian Mixture Model. I’m not sure whether it is working for high dimension or not…
but i can give it a try.

Thanks a lot.
Hamed

I would expect that the GMM technique would be acceptable if your quantity of interest was more concentrated in the core of the distribution, like you wanted to know what is the probability to be within a certain box around the median, or whatever. Your big issue with your problem is that you are looking deep into the tails, and so your inference will be very sensitive to the tails of your model.

You might for example instead of GMM choose a t mixture with variable degrees of freedom, so at least you can have a chance of adjusting your model’s tails to fit the tails.

Good luck!

1 Like

Any other help would be greatly appreciated.

Thank you very much, folks.
Hamed

Could you just have a single variable x such that

x ~ normal(mu[1] + mu[2], sqrt(sigma[1,1]^2 + sigma[2,2]^2 + 2* sigma[1,2]);

and then:

g = 5 - x/sqrt(2);

and

g ~ normal(0,0.5)

The log of the jacobian of the transformation would be a constant, so you would not need to add anything and can ignore the warnings. While using target+= normal_lpdf(...) makes the jacobian warnings go away, it doesn’t remove the need for the jacobian.

Hi Aaron,

Thank you very much for your comment on my code.
Do you mean the way i coded and subsequently the sampling plot is not correct? because of that jacobian issue, right?

If this is the case, i have to change it to the following way regarding your point: (please correct me if I’m wrong)

data{
vector[2] mu;
matrix[2, 2] Sigma;
}
parameters {
  vector[2] x;
}

transformed parameters {
real g = 5 - (sum(x)/sqrt(2));
}

model {
//prior
    x ~ normal(mu[1] + mu[2], sqrt(Sigma[1,1]^2 + Sigma[2,2]^2 + 2* Sigma[1,2]));
//likelihood  
  g ~ normal(0,0.5);
}

generated quantities {
  int I = 0;
  {
    real gg = 5 - (sum(x)/sqrt(2));
    if (gg <= 0) I = 1;
  }
}

Thank you.
Hamed

I think the jacobian is a non issue in your previous case, but you might want to check the manual. There are some complications that come up when the inverse transform is singular. There is a section on the absolute jacobian deterimnant of the unit vector inverse transform that is relevant.

I was thinking something more like below. Though I’m not sure how helpful it is. It seems like the bivariate case could be solved analytically, so there must be some additional complications in the multivariate case?

data{
vector[2] mu;
matrix[2, 2] Sigma;
}
parameters {
  real x;
}

transformed parameters {
real g = 5 - x/sqrt(2));
}

model {
//prior
    x ~ normal(mu[1] + mu[2], sqrt(Sigma[1,1] + Sigma[2,2] + 2* Sigma[1,2]));
//likelihood  
  g ~ normal(0,0.5);
}

generated quantities {
  int I = 0;
  {
    real gg = 5 - x/sqrt(2));
    if (gg <= 0) I = 1;
  }
}

Thank you so much for your point, Aaron.

To be honest with you, I read the section of “transformation of variables” in the manual one more time, but I’m still getting confused how to handle that with the Stan syntax…
But, i think in my case, as you mention i don’t think it is an issue, because the jacobian is constant.

On the other hand, the way you coded it in Stan, i can not distinguish between x[1] and x[2] and plot this bivariate x.

However, My original question was that how can we calculate the failure probability (probability of passing the performance function (g) which is the red line in the figure) for high dimensions?

Regarding the fact that, we don’t know the normalizing constant of the posterior, Is there anyway we could do that?

I highly appreciate your time, Aaron.

Thanks a lot,
Hamed

If you are looking at the bivairate normal case and then summing the two normal distributions, you can just say that sum(x,y) ~ Normal(…) with the mean and variance described above - no need to sample both and then sum.

Once you determine the log absolute determinant
of the transformation, you add it to the posterior using target +=

What was wrong with Daniel’s suggestion?

What is the original standard normal space? In Stan you handle the transformations through the jacobian adjustments, which I do not think are necessary in this case.

Sorry if this is not helpful - I may be a confused from your description of what you are trying to do.

Your technique is actually pretty good and i like it. But based on that, i cannot plot the components of X like what you are seeing in my figure.

Yes, this is what i was thinking about and i’ll give it a try.

This is my problem actually:

To solve it, Importance sampling works but just for low dimensions. (2D)

The space is bivariate standard normal which is my prior on X.

Thank you very much, Aaron. I highly appreciate your consideration.
Hamed