Hi all,
I’m attempting to run HMC on a modelling problem in which a source point is moved around on a sphere, producing the posterior distribution of its position (or the maximum likelihood position, if using optimize). In general the likelihood looks something like this:
for (i in 1:N)
  likelihood += log(signal + astro + atmos)
It’s a sum over all the data points on the sphere, where the signal term is a von Mises density in the angular distance between the source and each point. If the source is close to a data point, that angular distance is small and the point contributes a significant amount to the likelihood. The astro and atmos terms are baked into the data points and describe how likely each point is to have come from a diffuse background.
For this example there are 1300 total points on the sphere: 100 of them come from the source (so the signal term carries a weight of 100), and 300 of them are astrophysical in total (so the astro background term carries a weight of 300 - 100 = 200), which leaves the atmos term with a weight of 1300 - 300 = 1000.
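Written out, each event $i$ contributes

$$\log\big(100\,S_i + (300 - 100)\,b_i^{\mathrm{astro}} + (N - 300)\,b_i^{\mathrm{atmos}}\big),$$

where $S_i$ is the (very concentrated) von Mises-Fisher density of the angular distance between the source and event $i$, and $b_i^{\mathrm{astro}}$ and $b_i^{\mathrm{atmos}}$ are the per-event background densities stored with the data.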
It currently compiles and runs, albeit slowly, so I’ve attempted to add parallelization to speed it up. The issue is that it very commonly fails to converge, even after very long runs, and it also diverges quite frequently. I believe this is due to the extremely peaked von Mises distribution: the sampler can get stuck in the high peaks around single data points.
I’m wondering what I can do to make it run faster, or at least converge properly, on this quite simple example with very few data points (later I’d like it to work for ~10000 points total, which does run but converges even worse), and eventually to support multiple sources, so that each source has its own dec and ra parameter.
I can already do a simple grid scan in C++ to find the optimum point, and that works properly, but it isn’t sufficient once there is more than one source, which is where Markov chain Monte Carlo is needed to cope with the high dimensionality of many sources.
functions {
  // von Mises-Fisher density (high-concentration approximation) in the
  // angular distance between an event and the source
  real spdf(real decevent, real raevent, real decsource, real rasource) {
    real vmk = 4550;
    real cosangdiff = cos(decevent) * cos(decsource)
                      + sin(decevent) * sin(decsource) * cos(raevent - rasource);
    return vmk * exp(vmk * (cosangdiff - 1)) / (2 * pi());
  }

  // partial sum over a slice of event indices for reduce_sum; the events
  // array is passed in full and looked up through the sliced indices
  real partial_sum_lpmf(array[] int slice_idx, int start, int end,
                        array[,] real events_slice, real dec, real ra, int N) {
    real total = 0;
    for (i in start:end) {
      int idx = slice_idx[i - start + 1]; // slice_idx holds elements start:end
      real signal = 100 * spdf(events_slice[idx, 1], events_slice[idx, 2], dec, ra);
      real astro = (300 - 100) * events_slice[idx, 3];
      real atmos = (N - 300) * events_slice[idx, 4];
      total += log(signal + astro + atmos);
    }
    return total;
  }
}
data {
  int<lower=0> N;
  array[N, 4] real events; // per event: dec, ra, astro density, atmos density
}
parameters {
  unit_vector[3] direction; // source position on the sphere
}
transformed parameters {
  // polar angle in [0, pi] and azimuth shifted from atan2's (-pi, pi] into (0, 2*pi]
  real<lower=0, upper=pi()> dec = acos(direction[3]);
  real<lower=0, upper=2*pi()> ra = atan2(direction[2], direction[1]) + pi();
}
model {
  int grainsize = 1;
  array[N] int event_indices = linspaced_int_array(N, 1, N);
  target += reduce_sum(partial_sum_lpmf, event_indices, grainsize, events, dec, ra, N);
}
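For reference, one variant I’ve been considering (untested sketch, with a hypothetical partial_sum_log_lpmf replacing the function above) evaluates the per-event mixture on the log scale with log_sum_exp, in case underflow of the exp in spdf at high vmk is contributing to the problem:

  // Untested sketch: same mixture as partial_sum_lpmf, but each term is kept
  // on the log scale and combined with log_sum_exp so the signal term cannot
  // underflow to zero at high vmk.
  real partial_sum_log_lpmf(array[] int slice_idx, int start, int end,
                            array[,] real events_slice, real dec, real ra, int N) {
    real vmk = 4550;
    real total = 0;
    for (i in start:end) {
      int idx = slice_idx[i - start + 1];
      real cosangdiff = cos(events_slice[idx, 1]) * cos(dec)
                        + sin(events_slice[idx, 1]) * sin(dec)
                          * cos(events_slice[idx, 2] - ra);
      // log of: 100 * vmk * exp(vmk * (cosangdiff - 1)) / (2 * pi())
      real log_signal = log(100) + log(vmk) + vmk * (cosangdiff - 1) - log(2 * pi());
      total += log_sum_exp({ log_signal,
                             log((300 - 100) * events_slice[idx, 3]),
                             log((N - 300) * events_slice[idx, 4]) });
    }
    return total;
  }

The three log terms are exactly the logs of signal, astro, and atmos from the current model; whether this helps the sampler geometry, rather than just the arithmetic, I don’t know.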
Any input/help on reparameterization (which is what the diagnose() tool suggests could help) would be appreciated. I’m currently running a version of this with vmk down at 45, which peaks much less and should diverge less, but I’d like it to work with quite high vmk if possible.
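(A minimal way to switch between those concentrations without editing spdf would be to pass the concentration in as data rather than hard-coding it; the vmk field below is my addition here and would also need to be threaded through reduce_sum into spdf:

  data {
    int<lower=0> N;
    array[N, 4] real events;
    real<lower=0> vmk; // von Mises concentration: 45 for test runs, 4550 eventually
  }
)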
Thank you.