Spectral Density Analysis of Time-Series: Resulting Posterior is Independent of Data

San0ix · May 9, 2022, 10:02am

Hello everyone, I hope this problem isn’t too specific – as a beginner in Stan, I really cannot tell how or why my issue arises, so I don’t know how to simplify it.
I’m currently working on applying Variational Inference methods on non-parametric Bayesian spectral density estimation for time-series. Following this paper by Choudhuri N, Ghosal S, Roy A., I have defined the following model in Stan, making the latent parameter k constant since I could not marginalize it out.

To summarize the model briefly:
Let f^\star be the spectral density of a time-series. Then, normalize f^\star, such that it maps from [0, 1): f(\omega) := f^\star(\pi \cdot \omega). Then, normalize this further, such that the total mass is 1: q := f / \tau, where \tau is the total mass of the spectral density (on [0, \pi]).
Now, set a Bernstein-polynomial prior on the normalized spectral density q with k polynomials.
Represent the Dirichlet-Process G in the Bernstein-polynomial prior using its stick-breaking representation: G = \sum_{l = 1}^\infty p_l \delta_{Z_l}, where p_l are the stick-breaking lengths and Z_l the locations. Truncate this representation after L terms.
Finally, compute the periodogram U at \nu evenly spaced frequencies between 0 and \pi. Approximately, the distribution of the periodogram at frequency \omega^{\star}_l will be exponential with mean f^\star(\omega^{\star}_l). Use this as the likelihood.

data {
  int<lower=1> nu;          // number of smoothed periodogram realizations
  vector[nu] omega;         // (normalized) frequencies at which the periodogram was evaluated
  vector[nu] U;             // periodogram realizations
  int<lower=1> L;           // truncation constant for the stick-breaking representation of the DP
  real<lower=0> M;          // mass of the DP
  int<lower=1> k;           // number of bernstein polynomials used in approximation of the spectral density
  real<lower=0> lambda_0;   // hyper-parameter for the mass of the spectral density
}
parameters {
  vector<lower=0, upper=1>[L+1] Z;  // the stick-breaking locations, distributed according to G_0
  vector<lower=0, upper=1>[L] V;    // stick-breaking fractions,     distributed according to Gamma(1, M)
  real<lower=0> tau_inv;            // inverse of the total mass of the spectral density
}
transformed parameters {
  // reciprocal of tau
  real tau = 1 / tau_inv;

  // compute the stick-breaking lengths p from the stick-breaking fractions V
  vector[L+1] log_p;
  vector[L+1] p;
  real log_prev_remainder = 0;
  log_p[1] = log(V[1]);
  for (l in 2:L) {
    log_prev_remainder = log_prev_remainder + log(1 - V[l-1]);
    log_p[l] = log_prev_remainder + log(V[l]);
  }
  p = exp(log_p);

  p[L+1] = 1;
  for (l in 1:L) {
    p[L+1] -= p[l];
  }

  // compute the components w_{j,k}, which correspond to the DP (after stick-breaking),
  // evaluated on the interval ((j-1) / k, j / k] in the bernstein-polynomial prior for the spectral density
  vector[k] w;
  for (j in 1:k) {
    w[j] = 0;
    for (l in 1:(L+1)) {
      int I;
      I = (Z[l] <= (j * 1.0 / k)) - (Z[l] <= ((j - 1) * 1.0 / k));
      w[j] += p[l] * I;
    }
  }

  // compute the approximate spectral density according to the bernsteil polynomial prior
  vector[nu] f;
  vector[nu] f_inv;
  for (n in 1:nu) {
    f[n] = 0;
    for (j in 1:k) {
      f[n] += w[j] * exp(beta_lpdf(omega[n] | j, k - j + 1));
    }
    f[n] *= tau;
    f_inv[n] = 1 / f[n];
  }

}
model {
  // priors
  Z         ~ uniform(0, 1);                // Z ~ G_0
  V         ~ gamma(1, M);                  // V ~ Gamma(1, M)
  tau_inv   ~ exponential(lambda_0);        // tau^-1 ~ Exp(lamdbda_0)

  // whittle likelihood
  U ~ exponential(f_inv);
}

However, I found that when fitting the model through Stan’s ADVI, I always appear to get the same kind of posterior for the spectral density, regardless of the data at hand.

For comparison, I have used two time-series: the sunspots dataset (by year), and a simulated AR(1) process with a = 0.95. For each, I have generated 20 models using ADVI, each with a posterior sample-size of 10,000. This results in 20 different posterior-mean estimators for the spectral density. As final estimators, I have plotted the posterior-mean for the model with the highest ELBO, as well as the mean of the posterior-means and a weighted mean (using the ELBO’s as weights).
For reference, I have compared these to a MCMC estimate from another R-package (beyondWhittle), as well as the true posterior, if available.

You can find plots of the results here: Imgur: The magic of the Internet
The left y-scale is used for the results from Stan’s ADVI, the left one for the reference posterior(s).

I should also note that using Stan’s sample-function generates essentially non-sensical results, which are also the same, regardless of dataset. You can see one such result in the gallery above.

nhuurre · May 9, 2022, 7:21pm

This representation creates L blobs of probability mass and allocates each to one of the k buckets depending on the latent random variable Z \sim U[0,1]. The likelihood is discontinuous in Z and that’s problematic for Stan. I don’t know if this is what’s causing the issue but it’s the most obvious problem.

Stan has a simplex type that implements a stick-breaking process. I think you can use that instead

data {
  int<lower=1> nu;          // number of smoothed periodogram realizations
  vector[nu] omega;         // (normalized) frequencies at which the periodogram was evaluated
  vector[nu] U;             // periodogram realizations
  real<lower=0> M;          // mass of the DP
  int<lower=1> k;           // number of bernstein polynomials used in approximation of the spectral density
  real<lower=0> lambda_0;   // hyper-parameter for the mass of the spectral density
}
parameters {
  simplex[k] w; // SIMPLEX TYPE
  real<lower=0> tau_inv;            // inverse of the total mass of the spectral density
}
transformed parameters {
  // reciprocal of tau
  real tau = 1 / tau_inv;

  // compute the approximate spectral density according to the bernsteil polynomial prior
  vector[nu] f;
  vector[nu] f_inv;
  for (n in 1:nu) {
    f[n] = 0;
    for (j in 1:k) {
      f[n] += w[j] * exp(beta_lpdf(omega[n] | j, k - j + 1));
    }
    f[n] *= tau;
    f_inv[n] = 1 / f[n];
  }

}
model {
  // priors
  w ~ dirichlet(rep_vector(M,k)); // DIRICHLET PRIOR
  tau_inv   ~ exponential(lambda_0);        // tau^-1 ~ Exp(lamdbda_0)

  // whittle likelihood
  U ~ exponential(f_inv);
}

San0ix · May 10, 2022, 12:37pm

Thank you for the suggestion! :)
I wasn’t aware Stan has a simplex type – even though this seems to remove some of the flexibility of the model, the stability and performance of the optimization seems much better this way.
However, this does not appear to have fixed the underlying issue; While the general shape of the posterior has changed a little now, it still does not change (enough) given different datasets.

nhuurre · May 11, 2022, 6:35pm

So the posterior is not responsive to data. Weird, huh. Maybe the likelihood is too weak compared to the prior?
The paper says

the prior on \tau^{-1} is exponential with mean 10^{-20}.

and that mean seems fairly extreme to me?

I ran a quick simulation-based calibration experiment. (Probably not quite right, you could code a better generator.)

library(rstan)
library(SBC)
options(mc.cores = parallel::detectCores())

sim <- function(omega, M, k, lambda_0) {
  nu <- length(omega)
  tau_inv <- rexp(1, rate = lambda_0)
  tau <- 1 / tau_inv
  
  w <- rgamma(k, rate = 1, shape = M)
  w <- w / sum(w)

  log_lik <- 0
  U <- rep(0, nu)
  for (n in 1:nu) {
    f <- 0
    for (j in 1:k) {
      f <- f + w[j]*dbeta(omega[n], j, k - j + 1)
    }
    inv_f <- 1 / (tau * f)
    U[n] <- rexp(1, rate = inv_f)
    log_lik <- log_lik + dexp(U[n], rate = inv_f, log = TRUE)
  }

  list(
    variables = list(
      tau_inv = tau_inv,
      log_lik = log_lik
    ),
    generated = list(
      nu = nu,
      omega = omega,
      U = U,
      M = M,
      k = k,
      lambda_0 = lambda_0
    )
  )
}

generator <- SBC_generator_function(sim, omega = 1:288 / 289, M = 1, k = 10, lambda_0 = 1e20)
datasets <- generate_datasets(generator, n_sims = 20)

model <- stan_model("specden.stan")
backend <- SBC_backend_rstan_sample(model)

results <- compute_SBC(datasets, backend)

plot_ecdf_diff(results)
plot_sim_estimated(results, alpha = 0.5)

Stan model specden.stan is the one from my previous post with

generated quantities {
  real log_lik = exponential_lpdf(U | f_inv);
}

Looks maybe ok, kind of hard to say, probably needs larger n_sims to say for sure.

If it does work under SBC but not for real data then it could be some kind of conflict between prior and data, I guess.

San0ix · May 12, 2022, 7:11am

Oh yea, I think you are totally right!
I’ve been running the fitting with a an exponential prior with mean 10^{-10} on \tau^{-1}, since the mean of 10^{-20} that was suggested in the paper was causing numerical issues.
I have now re-run VI with 10^{-1}, 10^{-3}, 10^{-5} for the sunspots dataset (according to the MCMC method I’ve been using as a reference, the posterior median for \tau is approximately 3.6). You can see the results in this gallery. It seems pretty clear that the prior has a very strong on the resulting posterior.

Thank you for your help! :)

Topic		Replies	Views
Speed up adaptation with Variational Approximation? Modeling	22	2106	May 16, 2017
Bayesian finite mixture model in stan Modeling	15	2278	September 19, 2017
Simple model that converges in JAGS but not in Stan Modeling	18	1686	June 9, 2019
Bayesian inference in Stan Modeling	40	5952	July 18, 2017
Two parameter distribution over the simplex Modeling	23	2974	April 13, 2018

Spectral Density Analysis of Time-Series: Resulting Posterior is Independent of Data

Related topics