Hi, sorry to dig up this old thread. I’ve been working with these ideas to do some Poisson time-series modeling of gamma-ray data. In general everything works fine, and I’ve got a reduce_sum version that really speeds things up when there are a lot of observations:

```
real partial_log_like_bw_multi_scale(int[] counts_slice, int start, int end,
                                     vector time, vector exposure,
                                     row_vector omega1, row_vector omega2,
                                     vector beta1, vector beta2,
                                     real dt, real bkg,
                                     real scale1, real scale2,
                                     real amplitude, int k) {
  int N = size(counts_slice);
  vector[N] time_slice = time[start:end] - dt;
  vector[N] expected_counts_log;
  vector[N] expected_counts;
  // outer products: row i is time_slice[i] * omega
  matrix[N, k] tw1 = time_slice * omega1;
  matrix[N, k] tw2 = time_slice * omega2;
  // each frequency set gets its own scale
  expected_counts_log = (scale1 * cos(tw1) + scale2 * cos(tw2)) * beta1
                      + (scale1 * sin(tw1) + scale2 * sin(tw2)) * beta2;
  expected_counts = exposure[start:end] .* (exp(expected_counts_log) * amplitude + bkg);
  return poisson_lpmf(counts_slice | expected_counts);
}
```

One can ignore some of the specific things here for the problem I’m working on (dt is a time delay in the signal, and amplitude/bkg are just values for scaling around the observations). Also, the exp(…) is there to keep the signal positive.
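For anyone who wants to sanity-check the math outside of Stan, here is a small pure-Python sketch of what the partial likelihood computes for one slice, under my reading of how scale1/scale2 enter the basis expansion (all names here are mine, not part of the model):

```python
import math

def expected_counts(times, exposure, omega1, omega2,
                    beta1, beta2, dt, bkg, scale1, scale2, amplitude):
    """Per-bin Poisson rate: exposure * (amplitude * exp(f(t)) + bkg),
    where f(t) is a two-scale cosine/sine basis expansion."""
    mu = []
    for t, ex in zip(times, exposure):
        ts = t - dt
        f = 0.0
        for j in range(len(beta1)):
            f += (scale1 * math.cos(ts * omega1[j])
                  + scale2 * math.cos(ts * omega2[j])) * beta1[j]
            f += (scale1 * math.sin(ts * omega1[j])
                  + scale2 * math.sin(ts * omega2[j])) * beta2[j]
        mu.append(ex * (amplitude * math.exp(f) + bkg))
    return mu

def poisson_log_like(counts, mu):
    """Sum of Poisson log-pmfs: the quantity reduce_sum accumulates."""
    return sum(c * math.log(m) - m - math.lgamma(c + 1)
               for c, m in zip(counts, mu))

# tiny example slice: three time bins, two frequencies per set
mu = expected_counts([0.0, 0.5, 1.0], [1.0, 1.0, 1.0],
                     [1.0, 2.0], [3.0, 4.0],
                     [0.1, 0.2], [0.3, 0.1],
                     0.0, 0.5, 1.0, 1.0, 1.0)
```

Note the rate is positive by construction (exp(f) > 0 and bkg > 0), so the Poisson lpmf is always well defined.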

This can be fed to reduce_sum, slicing over the counts (with time and exposure passed along as shared arguments), and the performance improves. Here I am doing the non-stationary version of the kernel, and instead of sampling omega from a fixed distribution, I’m fitting for it, which really stabilizes things.

Basically there is a bandwidth (inverse length-scale) parameter that is sampled, and it sets the distribution of omega:

```
parameters {
  vector[k] beta1;            // amplitude along the cos basis
  vector[k] beta2;            // amplitude along the sin basis
  row_vector[k] omega_var[2]; // raw (non-centered) frequencies
  vector[2] log_scale;
  ordered[2] log_bw;          // one bandwidth per frequency set
}
transformed parameters {
  vector[2] scale = exp(log_scale) * inv_sqrt(k);
  vector[2] bw = exp(log_bw);
  row_vector[k] omega[2];
  // non-centered parameterization: omega ~ normal(0, bw)
  omega[1] = omega_var[1] * bw[1];
  omega[2] = omega_var[2] * bw[2];
}
model {
  // priors
  beta1 ~ std_normal();
  beta2 ~ std_normal();
  log_scale ~ std_normal();
  log_bw ~ normal(0, 0.5);
  omega_var[1] ~ std_normal();
  omega_var[2] ~ std_normal();

  target += reduce_sum(partial_log_like_bw_multi_scale, counts[1], grainsize[1],
                       time[1], exposure[1],
                       omega[1], omega[2], beta1, beta2,
                       0., bkg, scale[1], scale[2], 1., k);
}
```
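To make the non-centering explicit: multiplying a standard-normal draw by bw gives omega ~ normal(0, bw), so a larger bandwidth puts prior mass on higher frequencies (shorter length scales). A quick pure-Python check of the implied prior (names are mine, just for illustration):

```python
import random
import statistics

random.seed(0)

def omega_prior_draws(bw, k=10000):
    """Non-centered draws: omega = omega_var * bw, omega_var ~ std_normal()."""
    return [random.gauss(0.0, 1.0) * bw for _ in range(k)]

draws = omega_prior_draws(bw=2.0)
sd = statistics.stdev(draws)  # should land close to bw
```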

You can see I’ve non-centered omega, since it forms a one-level normal hierarchy. As I said, this is pretty fast and works well for what I’m trying to do… but there are non-concentrated divergences everywhere and I can’t figure out what is causing them. Originally the problem was tension between omega and bw, but that is dealt with. Raising the target acceptance rate via adapt_delta helps some… but there must be another issue. Perhaps the priors for the length scale are too wide, but I’ve not been able to correct for that.
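For reference, the funnel-style coupling that motivated the non-centering shows up directly in the prior: under a centered parameterization the magnitude of omega is correlated with bw, while the raw parameter the sampler actually explores is independent of it. A small pure-Python illustration (hypothetical names, not from the model):

```python
import math
import random

random.seed(1)
n = 5000

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

log_bw = [random.gauss(0.0, 0.5) for _ in range(n)]
raw = [random.gauss(0.0, 1.0) for _ in range(n)]             # non-centered omega_var
centered = [math.exp(lb) * z for lb, z in zip(log_bw, raw)]  # centered omega

# |omega| tracks the bandwidth under the centered parameterization...
c_cent = corr(log_bw, [abs(w) for w in centered])
# ...but the raw parameter is independent of it
c_raw = corr(log_bw, [abs(z) for z in raw])
```

This doesn’t explain the remaining divergences, of course, but it is why the omega/bw tension went away after reparameterizing.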

Nevertheless, I wanted to share this parallel version in case anyone is interested. There may well be further ways to speed it up.