Hi,
Upfront warning: I am definitely still learning about Stan and MCMC generally! My apologies in advance if I have missed something obvious here.
I am trying to fit a long memory Gegenbauer ARMA time series model with 3 parameters to some synthetic data for which I already know the parameter values. If you are not familiar with Gegenbauer ARMA (GARMA) models, they can be defined as:
\phi(B)(1-2uB+B^{2})^{d}Y_{t}=\theta(B)\epsilon_{t}
In my case \theta(B)\equiv 1 and there is only one autoregressive parameter \phi. We define the Gegenbauer frequency f using the relation u=\cos(2\pi f).
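So, with \theta(B)\equiv 1 and the single autoregressive term, the specific model I am fitting is:
(1-\phi B)(1-2uB+B^{2})^{d}Y_{t}=\epsilon_{t}
where u=\cos(2\pi f) as above.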
Long memory models generally have their own Wikipedia page at … but as yet there is no Wikipedia page covering Gegenbauer models. (Probably I will need to correct that one day soon!)
I have built up my Stan program slowly, starting with a fractionally differenced white noise process and graduating from there to an ARFIMA(1,d,0) process (an ARIMA process with a fractional differencing parameter and one autoregressive parameter). Both worked very well and gave good answers for the parameters I already knew.
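(In the same operator notation, that ARFIMA(1,d,0) model is (1-\phi B)(1-B)^{d}Y_{t}=\epsilon_{t}, i.e. the Gegenbauer filter replaced by the ordinary fractional differencing filter.)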
For the Gegenbauer models, I have checked that the parameters can be estimated with reasonable accuracy via a number of frequentist methods, including Whittle estimation, so I am fairly confident of the true values for the process. For reference, the true values I used for the 3 parameters are d=0.22, f=0.40, \phi=0.80.
What I am finding is that the traceplot indicates that, of 4 chains, 2 are not mixing well for the f and \phi parameters. The pairs plot shows (I think) that 2 chains are producing samples centered on the correct values while the other 2 are converging to values which are clearly not correct. For the \phi and f parameters, the pairs plot shows a multi-modal distribution in any case.
However, if I switch to using only 2 chains, those two chains mix fairly well and the results are fairly close to what I would expect to see (d is a little high at 0.30).
I have to assume there is a good reason for using 4 chains rather than 2, and in any case I thought that increasing the number of chains just let you explore the probability mass more effectively and thoroughly than having fewer chains. So I am a bit puzzled as to why this might happen, and more concerned about what to do about it.
I have been trying to read as much as I can about this problem, but I haven't been able to resolve it by increasing the number of warmup samples, or by increasing the max_treedepth or adapt_delta parameters, etc.
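(For concreteness, this is the shape of the rstan call I have been using to tweak those settings. The file and variable names here are placeholders; the real ones are in the GitHub repo.)

library(rstan)

fit <- stan(file = "garma.stan",                 # placeholder name
            data = list(N = length(y), y = y),
            chains = 4,
            warmup = 2000, iter = 4000,          # more warmup than the default
            control = list(adapt_delta = 0.99,   # default is 0.8
                           max_treedepth = 15))  # default is 10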
All the files required to reproduce the problem (.stan and .R data files, etc.) are available in my GitHub account.
If anyone is able to provide any help with this problem I would be very appreciative!
Thanks,
Richard
data {
  int<lower=1> N;    // num observations
  real y[N];         // observed outputs
}
parameters {
  real<lower=-1,upper=1> phi;   // autoregression coeff
  real<lower=0,upper=0.5> d;    // long memory parameter
  real<lower=0,upper=0.5> f;    // Gegenbauer frequency
}
model {
  vector[N] nu;      // one-step-ahead means
  vector[N] err;     // one-step-ahead errors
  real g[N+1];       // Gegenbauer coefficients; g[j+1] holds C_j
  real u = cos(2.0 * pi() * f);

  // Derive the Gegenbauer coefficients via their standard recursion.
  g[1] = 1.0;
  g[2] = 2.0 * u * d;
  for (j in 3:(N+1)) {
    real rj1 = j - 1.0;
    g[j] = (2.0 * u * (rj1 + d - 1.0) * g[j-1]
            - (rj1 + 2.0 * d - 2.0) * g[j-2]) / rj1;
  }

  // Build up the one-step predictions and their errors.
  nu[1] = 0.0;
  err[1] = y[1] - nu[1];
  for (t in 2:N) {
    real sum_g = 0.0;
    for (j in 1:(t-1))
      sum_g += g[j+1] * err[t-j];
    nu[t] = phi * y[t-1] + sum_g;
    err[t] = y[t] - nu[t];
  }

  // Flat priors; these are implied by the parameter bounds anyway.
  phi ~ uniform(-1.0, 1.0);
  d ~ uniform(0.0, 0.5);
  f ~ uniform(0.0, 0.5);

  err ~ normal(0.0, 1.0);   // unit innovation variance
}