How to I do non-centered parameterize time series parameters?

Hi all,

I’m a new Stan user learning the paradigm for expressing models as Stan programs.
I am modeling a time series data,and I wonder how can I do it using non-centered parameters?
such as
beta[t]=rhobeta[t-1]+sigmaepsilon[t]
where epsilon[t]~normal(0,1) ?
Thanks a lot for any input

With a loop over t, that is the non-centered parameterization for an AR(1), and it often yield more efficient sampling than the centered version.

Thank you very much.
Can I finish it with a vector version?

No, you have to loop.

thanks

And if I non-centered the parameters, does I have to use target+ instead of sample statement even for the priors such as beta[t]~normal(mu,sigma)?

If you are using this non-centered construction, then the beta vector is constructed in the transformed parameters block and does not have a prior. The prior is

target += normal_lpdf(epsilon | 0, 1);

the epsilon[t]~normal(0,1) is sample model.
besides this, I used extra prior for beta[t] when using centered structure.
beta~normal(0,2);//prior
alpha~normal(0,2);//prior
beta[2:T]~normal(rhobeta[1:T-1],sigma1)
alpha[2:T]~normal(rho
alpha[1:T-1],sigma2)
x~normal(alpha+beta,sigma3)//x is observed data.
But this was very slow.
So I decided to non-center it.
How can I change
beta~normal(0,2);//prior
alpha~normal(0,2);//prior
to priors of epsilon?

My experience is that AR unobserved parameters and normal noise can
be multi-modal in a way that nearly converges but because it’s so high-dimensional
chains can get trapped s.t. if you look at the trace plots they’ll look like
they are mixing but a lot of parameters will be slightly off. Poisson noise
doesn’t seem to have that problem. I think this version shows that problem sometimes:

data {
  int T;  // Number of time points.
  int K;  // Number of coefficients.
  int M;
  vector[T] y;  // Measurement at each T
  vector[M] kappa; // scaling.
}

parameters {
  vector[K] beta;
  vector<lower=0>[2] sigma;
  vector[T] zeta;
}

transformed parameters {
  vector[T] mu;
  mu[1] = zeta[1];
  for (t in 2:T) {
    mu[t] = zeta[t]*sigma[1] + inv_logit(beta[1]) * mu[t-1];
//    print("t: ", t, ", mu[t]: ", mu[t], ", zeta[t]: ", zeta[t],
//          ", sigma[1]: ", sigma[1], ", beta[1]: ", beta[1],
//          ", mu[t-1]: ", mu[t-1]
//    );
  }
}

model {
  target += normal_lpdf(beta | 0.0, 1.0);
  target += gamma_lpdf(sigma | 4.0, 10);
  target += normal_lpdf(zeta | 0.0, 1);
  for (t in 1:T) {
    target += normal_lpdf(y[t] | mu[t], sigma[2]);
  }
}

Here’s a run script for it:

T <- 400
J <- 2
x <- matrix(data=0, nrow=J, ncol=T)
y <- matrix(data=0, nrow=J, ncol=T)
beta <- runif(n=J, min=0, max=.95)

for (j in 1:J) {
  for (t in 2:T) {
    x[j,t] <- beta[j]*x[j,t-1] + rnorm(1,0,.7)
    y[j,t] <- x[j,t] + rnorm(1,0,.3)
  }
}

library(rstan)
m_ar_nc <- rstan::stan_model(file='non-centered-ar+noise.stan')

ml <- list()
for (j in 1:J) {
  ml[[j]] <- rstan::sampling(object=m_ar_nc,
    data=list(T=T, K=1, y=y[j,], M=1, kappa=array(1)),
    control=list(adapt_delta=.90),
    chains=5, iter=2000, warmup=1500)
}

o <- list()
for (j in 1:J) {
  o[[j]] <- ml[[j]] %>% rstan::extract() %>% data.frame %>%
    dplyr::mutate(model=as.character(j), iteration=1:nrow(.))
}
ob <- data.table::rbindlist(o)

pl <- ggplot(
  data=ob,
  aes(x=iteration, y=plogis(beta), colour=model)
) + geom_line() + geom_abline(intercept=beta, slope=0) +
    theme_minimal() + facet_wrap( ~ model)
print(pl)

Some of these plots are useful:


sigmas <- rstan::extract(ml[[1]])$sigma %>% data.frame(check.names=FALSE) %>%
  dplyr::mutate(iteration=1:2500, chain=sort(rep(1:5,500))) %>%
  tidyr::gather(sigma, value, -iteration, -chain) %>%
  dplyr::mutate(parameter=paste('sigma', sigma))

pl_sigmas <- ggplot(data=sigmas,
  aes(x=iteration, y=value^2, colour=factor(sigma), shape=factor(chain))) +
geom_point() + theme_minimal()

betas <- rstan::extract(ml[[1]])$beta %>% data.frame(check.names=FALSE) %>%
  dplyr::mutate(iteration=1:2500, chain=sort(rep(1:5,500))) %>%
  tidyr::gather(beta, value, -iteration, -chain) %>%
  dplyr::mutate(parameter=paste('beta', beta))

pl_betas <- ggplot(data=betas,
  aes(x=iteration, y=plogis(value), colour=factor(beta), shape=factor(chain))) +
geom_point() + theme_minimal()


zetas <- rstan::extract(ml[[1]])$zeta %>%
  data.frame(check.names=FALSE) %>%
  dplyr::mutate(iteration=rep(1:500,5), chain=sort(rep(1:5,500))) %>%
  tidyr::gather(zeta, value, -iteration, -chain)

pl_zetas <- ggplot(data=zetas,
  aes(x=as.numeric(zeta), y=value, colour=factor(chain))
) + geom_point(size=0.001) + theme_minimal()


mus <- rstan::extract(ml[[1]])$mu %>%
  data.frame(check.names=FALSE) %>%
  dplyr::mutate(iteration=rep(1:500,5), chain=sort(rep(1:5,500))) %>%
  tidyr::gather(mu, value, -iteration, -chain)

pl_mus <- ggplot(data=mus,
  aes(x=as.numeric(mu), y=value, colour=factor(chain))
) + geom_point(size=0.001) + theme_minimal()

Incidentally if anyone knows how to avoid that particular multi-modality problem I’d love to hear about it!

Thanks. Very appreciate.

No problem. Also I just wanted to confirm that this is the model that fails to converge in a relatively subtle way for the simulation data in the R script I sent. I would be cautious about that issue when trying it on real data. Any single-chain analysis will hide this.

At least in this case you can vectorize:

for (t in 2:T) {
    mu[t] = zeta[t]*sigma[1] + inv_logit(beta[1]) * mu[t-1];

as

mu[2:T] = zeta[2:T] * sigma[1] + inv_logit(beta[1]) * mu[1:(T-1)];

And in case anyone doesn’t realize what’s going on under the hood, target += foo_lpdf forms are less efficient than the ~ versions because they don’t drop constant terms.

Good point, if we don’t have an AR case study yet I have a set of these I’m
hoping to put together.

Not in an auto-regressive context.

transformed parameters {
  vector<lower=negative_infinity()>[T] mu;
  mu[1] = zeta[1];
  mu[2:T] = zeta[2:T] * sigma[1] + inv_logit(beta[1]) * mu[1:(T-1)];

yields

"validate transformed params: mu[3] is nan, but must be greater than or equal to -inf"