Hello everyone, I hope this problem isn’t too specific – as a beginner in Stan, I really cannot tell how or why my issue arises, so I don’t know how to simplify it.
I’m currently working on applying Variational Inference methods on non-parametric Bayesian spectral density estimation for time-series. Following this paper by Choudhuri N, Ghosal S, Roy A., I have defined the following model in Stan, making the latent parameter k constant since I could not marginalize it out.
To summarize the model briefly:
Let f^\star be the spectral density of a time-series. Then, normalize f^\star, such that it maps from [0, 1): f(\omega) := f^\star(\pi \cdot \omega). Then, normalize this further, such that the total mass is 1: q := f / \tau, where \tau is the total mass of the spectral density (on [0, \pi]).
Now, set a Bernstein-polynomial prior on the normalized spectral density q with k polynomials.
Represent the Dirichlet-Process G in the Bernstein-polynomial prior using its stick-breaking representation: G = \sum_{l = 1}^\infty p_l \delta_{Z_l}, where p_l are the stick-breaking lengths and Z_l the locations. Truncate this representation after L terms.
Finally, compute the periodogram U at \nu evenly spaced frequencies between 0 and \pi. Approximately, the distribution of the periodogram at frequency \omega^{\star}_l will be exponential with mean f^\star(\omega^{\star}_l). Use this as the likelihood.
data {
int<lower=1> nu; // number of smoothed periodogram realizations
vector[nu] omega; // (normalized) frequencies at which the periodogram was evaluated
vector[nu] U; // periodogram realizations
int<lower=1> L; // truncation constant for the stick-breaking representation of the DP
real<lower=0> M; // mass of the DP
int<lower=1> k; // number of bernstein polynomials used in approximation of the spectral density
real<lower=0> lambda_0; // hyper-parameter for the mass of the spectral density
}
parameters {
vector<lower=0, upper=1>[L+1] Z; // the stick-breaking locations, distributed according to G_0
vector<lower=0, upper=1>[L] V; // stick-breaking fractions, distributed according to Gamma(1, M)
real<lower=0> tau_inv; // inverse of the total mass of the spectral density
}
transformed parameters {
// reciprocal of tau
real tau = 1 / tau_inv;
// compute the stick-breaking lengths p from the stick-breaking fractions V
vector[L+1] log_p;
vector[L+1] p;
real log_prev_remainder = 0;
log_p[1] = log(V[1]);
for (l in 2:L) {
log_prev_remainder = log_prev_remainder + log(1 - V[l-1]);
log_p[l] = log_prev_remainder + log(V[l]);
}
p = exp(log_p);
p[L+1] = 1;
for (l in 1:L) {
p[L+1] -= p[l];
}
// compute the components w_{j,k}, which correspond to the DP (after stick-breaking),
// evaluated on the interval ((j-1) / k, j / k] in the bernstein-polynomial prior for the spectral density
vector[k] w;
for (j in 1:k) {
w[j] = 0;
for (l in 1:(L+1)) {
int I;
I = (Z[l] <= (j * 1.0 / k)) - (Z[l] <= ((j - 1) * 1.0 / k));
w[j] += p[l] * I;
}
}
// compute the approximate spectral density according to the bernsteil polynomial prior
vector[nu] f;
vector[nu] f_inv;
for (n in 1:nu) {
f[n] = 0;
for (j in 1:k) {
f[n] += w[j] * exp(beta_lpdf(omega[n] | j, k - j + 1));
}
f[n] *= tau;
f_inv[n] = 1 / f[n];
}
}
model {
// priors
Z ~ uniform(0, 1); // Z ~ G_0
V ~ gamma(1, M); // V ~ Gamma(1, M)
tau_inv ~ exponential(lambda_0); // tau^-1 ~ Exp(lamdbda_0)
// whittle likelihood
U ~ exponential(f_inv);
}
However, I found that when fitting the model through Stan’s ADVI, I always appear to get the same kind of posterior for the spectral density, regardless of the data at hand.
For comparison, I have used two time-series: the sunspots dataset (by year), and a simulated AR(1) process with a = 0.95. For each, I have generated 20 models using ADVI, each with a posterior sample-size of 10,000. This results in 20 different posterior-mean estimators for the spectral density. As final estimators, I have plotted the posterior-mean for the model with the highest ELBO, as well as the mean of the posterior-means and a weighted mean (using the ELBO’s as weights).
For reference, I have compared these to a MCMC estimate from another R-package (beyondWhittle), as well as the true posterior, if available.
You can find plots of the results here: Imgur: The magic of the Internet
The left y-scale is used for the results from Stan’s ADVI, the left one for the reference posterior(s).
I should also note that using Stan’s sample
-function generates essentially non-sensical results, which are also the same, regardless of dataset. You can see one such result in the gallery above.