So I wanted to expand a bit on the “split and marginalize” option I vaguely described. The idea required a bit of additional work, but I think it ended up working pretty nice. In the math here, we’ll ignore amplitude as it is easy to add at the very end. I hope I didn’t make any big mistake in the math, but the model seems to work…
So we have a bounded periodic function f and I’ll assume the period is 2 \pi, i.e. that f(x) = f(x + 2k \pi), -1 \leq f(x) \leq 1 for all k \in \mathrm{Z}, x \in \mathrm{R}. We observe a scaled and shifted realization: y \sim Normal(f(\nu x + p), \sigma) and we want to determine \nu, p.
We’ll pick two arbitrary numbers x_1, x_2 as data. The only requirement is that those points are quite far from each other, but are sorrounded by actually observed values. We’ll parametrize our model in terms of the value of the function at those points:
The main advantage is that z_1, z_2 \in (-1, 1) will almost always be well constrained by data, so no multimodality problems here. But how do we get from z to \nu, p?
We’ll start by looking at preimages of z within a single period. We’ll assume that each point has exactly s distinct preimages in each period (some care would be needed to handle cases where some values have more preimages than others, but it is IMHO possible). For the \sin function s = 2 (we’ll never have exactly z_i = \pm 1).
We then introduce additional parameter k that says how many full periods of the function are between x_1 and x_2. So we have a set of pairs of preimages and we can use those to find all solutions for \nu, p given z.
**EDIT: ** There was previously a sign error in this equation.
So beyond z_1, z_2 we need 3 discrete parameters: i,j,k. If we can put an upper bound on k (which we IMHO always can as e.g. period shorter then distance between two consecutive points can be ruled out), we can marginalize all the discrete parameters out! This way, we sum out all the modes. The necessary background is at 7.2 Change point models | Stan User’s Guide
Concrete example
For f(x) = \sin(x) and i \in {1,2} we have
**EDIT: ** Here I previously invoked symmetry which does not seem to actually hold generally.
And this is the resulting Stan model (also including amplitude):
data{
int n ;
vector[n] y ;
vector[n] x ;
int<lower=0> max_k;
real x_fixed[2];
}
transformed data {
real log_prior[max_k + 1] = rep_array(-log(max_k + 1.0), max_k + 1); //prior prob 1/(max_k + 1)
}
parameters{
real<lower=0> noise ;
real<lower=0> amplitude;
real<lower=-1, upper=1> z[2];
}
transformed parameters{
real frequency[2, 2, max_k + 1];
real phase[2, 2, max_k + 1];
real log_lik[2, 2, max_k + 1];
{
real asinz[2] = asin(z);
real w[2,2];
w[1,]= { asinz[1], pi() - asinz[1]};
w[2,]= { asinz[2], pi() - asinz[2]};
for(i in 1:2) {
for(j in 1:2) {
for(k in 1:(max_k + 1)) {
frequency[i, j, k] = (w[2, i] + 2 * (k-1) * pi() - w[1, j]) / (x_fixed[2] - x_fixed[1]);
phase[i, j , k] = w[1, j] - frequency[i, j, k] * x_fixed[1];
{
vector[n] f = amplitude * sin(frequency[i, j, k] * x + phase[i, j, k]);
log_lik[i, j, k] = normal_lpdf(y | f, noise)
+ log_prior[k]
-log(2.0) //Uniform prior over the 2 options
;
}
}
}
}
}
}
model{
noise ~ weibull(2,1) ; //peaked at ~.8, zero-at-zero, ~2% mass >2
amplitude ~ weibull(2,1) ; //ditto
target += log_sum_exp(to_array_1d(log_lik));
}
generated quantities {
// Discrete sampling is inefficient, directly computing weights
// And taking weighted expectation values would work better
int index = categorical_logit_rng(to_vector(to_array_1d(log_lik)));
real freq_chosen = to_array_1d(frequency)[index];
real phase_chosen = to_array_1d(phase)[index];
//Force phase to lie between -pi + pi the stupid way
while(phase_chosen < -pi()) {
phase_chosen += 2 * pi();
}
while(phase_chosen > pi()) {
phase_chosen -= 2 * pi();
}
}
And this is code I used to test this thing:
library(tidyverse)
library(cmdstanr)
options(mc.cores = 4)
m <- cmdstan_model("periodic.stan")
n <- 30
frequency <- 13
phase <- 0.3
noise <- 0.5
amplitude <- 1.4
x <- runif(n, max = 2 * pi)
f <- amplitude * sin(frequency * x + phase)
y <- rnorm(n, mean = f, sd = noise)
x_fixed = c(pi / 4, 7/4 * pi)
#x_fixed = c(3.3, 7/4 * pi)
res <- m$sample(list(n = n, x = x, y = y, x_fixed = x_fixed, max_k = 10), refresh = 500)
res$summary(c("noise", "z", "freq_chosen", "phase_chosen", "amplitude", "index"))
bayesplot::mcmc_pairs(res$draws(), pars = c("z[1]","z[2]", "amplitude", "noise"))
true <- tibble(x = seq(0, 2 * pi, length.out = 101)) %>% mutate(y = amplitude * sin(frequency * x + phase))
observed <- tibble(x = x, y = y)
d <- posterior::as_draws_matrix(res$draws(c("freq_chosen", "phase_chosen", "amplitude")))
z <- tibble(x = seq(0, 2 * pi, length.out = 100)) %>% crossing(as_tibble(d) %>% mutate(id = 1:dim(d)[1]))
z %>% mutate(y = amplitude * sin(x * freq_chosen + phase_chosen)) %>%
filter(id %in% sample(unique(id), 100)) %>%
ggplot(aes(x = x, y = y, group = id)) +
geom_vline(xintercept = x_fixed[1], color = "orange", size = 2) +
geom_vline(xintercept = x_fixed[2], color = "orange", size = 2) +
geom_line(alpha = 0.5) +
geom_line(data = true, group = 1, color = "blue", size = 2) +
geom_point(data = observed, group = 1, color = "red", size = 2)
This method seems (from quikc tests) very data efficient (because sine is a very constrained function)
And will work even when the data do not really constrain k, so the posterior for phase and frequency should be multimodal:
set.seed(24855)
n <- 8
frequency <- 4
phase <- -0.5
noise <- 0.5
amplitude <- 1.4
Histogram of the fitted frequency
- we see that multiple modes are present:
Happy to hear if it passes/breaks in your tests as well (mine were pretty shallow).