HMC sampling from a certain domain by reducing likelihood sigma gradually


#1

Dear all,

I got stock somewhere in my work for about two weeks more or less and I could not find a reasonable answer for my problem. I would gratefully appreciate it if you could help me out in this regard.

I am using HMC method for sampling from a certain domain in parameter space (let’s assume 2D parameter space for simplicity). I want to have more sample from that domain and then using importance sampling to obtain the interested statistical inference. To this end, I am considering the bayesian approach for sampling. I am working in U space (bivariate standard normal prior) and have my likelihood model.

I found that by reducing the sigma of my likelihood model gradually, I can have more sample points on my interested region in parameter space and it helps me a lot!. But at the end after doing importance sampling, my interested inference does not converge to the correct value. If I use the constant sigma for the likelihood, it eventually converge to the correct inference.

I am wondering if there is any remedy for the first case in terms of correction factor or so on.

Any help would be greatly appreciated.
Thank you!
HN


#2

It’s not clear what you’re trying to do. If you change the model as you go, you’re not going to have a stationary distribution over a model.

Why do you want to do importance sampling? Is there a reason you can’t just fit the model with HMC directly?


#3

Thank you very much for your reply. To be more specific, Here is the problem:
Let’s say I have a bivariate space with standard normal prior for X1 and X2 and here is my likelihood model.
g(x1, x2) = 0.1*(x1 - x2).^2 - ((1/sqrt(2))*(x1 + x2)) + 2.5;
Likelihoodmodel = NormalDist(g(x1, x2),mean=0, sigma)

So my prior will be the bivariate standard normal and my likelihood will be the above 1-D model and I apply the HMC for sampling from this model.

I’m using importance sampling since I’m shifting the mean of my distribution from (X1=0, X2=0) to the center of g(x1, x2) to capture more samples from that region.
Finally, My inference will be P(g(x1, x2)<0) which means number of samples passing the limit state function (g).

If I use constant sigma for likelihood model, it takes more model call (g(x1, x2)) to converge to the exact value for P(g(x1, x2)<0). During sampling if we use the sequence of sigma (using exponential decay starting from sigma=1 to sigma=0.2), it is more efficient and use less model call but has some biasness and does not converge to exact value of P(g(x1, x2)<0). I am wondering if there is any remedy for that. It will help me a lot! If it works.

I’m showing the problem through the plot as well, the nonlinear black line will be the limit state function, g(x1, x2) and the green circle showing the initial point for the HMC sampling (X1=0, X2=0). I connected the few starting points with line to track the sample points.

Thank you very much.
H.N


#4

I don’t understand why you’re using importance sampling. Why not just code the density you care about in Stan and sample directly?

What variables are known here? Presumably x1 and x2 are vectors and sigma is a scalar. Which of these three are observed and which do you want to sample?

If the x1 and x2 are not observed, you need to account for the Jacobian of g(x1,x2) and then worry about the fact that it reduces dimensionality and thus won’t identify x1 and x2. So you need to do something about that either in terms of reparameterizing or including additional priors for soft identification.


#5

The importance sampling method takes basis in the utilization of prior information about the domain contribution to the probability integral.
Let us first assume that we know which point in the sample space X contributes the most to the failure probability. Then by centering the simulations on this point, the important point, we would obtain a higher success rate in the simulations and the variance of the estimated failure probability would be reduced.
The following figure is just for clarification of above comment relevant to the importance sampling.
I am not sure how to code it directly (without importance sampling) into the Stan.

x1 and x2 are vectors and also sigma is known and scalar. I want to sample from x1 and x2 and both are observed variables.

I think Jacobin doesn’t cause any problem here but I’m not sure about the reduced dimensionality…
but it identify the X1 and X2 since i tried it on different example based on sampling from the following posterior: NormalDist(g(x1, x2),mean=0, sigma) x bivariateNormal(mean=[0 0], Variance=[1 0;0 1])


#6

Sorry about that. I wasn’t asking about importance sampling, I was asking why you’d want to use it with Stan. Stan’s designed to just let you write down a differentiable log density and it will sample efficiently.

You need to include the log Jacobian determinant in the log density because it depends on parameters. Otherwise you won’t get the right answer.