Ideas for modelling a periodic timeseries

Can you try this? I ripped the Kalman stuff from @JuhoKokkala’s kalman-stan-randomwalk/randomwalk_kalman.stan (github.com/juhokokkala/kalman-stan-randomwalk). I tried an arcsine distribution but this works better.
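
(My reading of the transform below, so treat it as an assumption: is_std_norm_lpdf appears to put a standard normal density on t = \tan(z/2), which maps z \in (-\pi, \pi) onto the whole real line. By the half-angle identity,

t = \tan(z/2) = \frac{\sin z}{1 + \cos z}, \qquad \frac{dt}{dz} = \frac{1}{1 + \cos z}

so the implied log density of z is -\frac{1}{2}\tan^2(z/2) plus the log of that Jacobian term, up to a constant.)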

functions {
  real is_std_norm_lpdf(vector x) {
    // real log_constant = -0.5 * log(2 * pi());
    int N = num_elements(x);
    vector[N] cos1p = cos(x) + 1;
    return sum(log(cos1p) - 0.5 * square(sin(x) ./ cos1p));
  }

  vector[] kalman(vector y, real sigma_y, real sigma_z, real q, real m0, real P0) {
    int N = num_elements(y);
    vector[N] out[2]; // out[1]: predicted mean of x_t | y_1:t-1; out[2]: predictive sd of y_t | y_1:t-1
    vector[N] m;      // filtered mean of x_t | y_1:t
    vector[N] P;      // filtered variance of x_t | y_1:t
    real R = square(sigma_y) + square(sigma_z); // measurement variance, y_t | x_t
    real S;           // measurement variance, y_t | y_1:t-1
    vector[N] P_pred; // predicted variance, x_t | y_1:t-1
    real K;           // Kalman gain; depends on t but not stored

    out[1, 1] = m0;
    P_pred[1] = P0;

    for (n in 1:N) {
      // Prediction step
      if (n > 1) {
        out[1, n] = m[n - 1];
        P_pred[n] = P[n - 1] + square(q);
      }

      // Update step
      S = P_pred[n] + R;
      K = P_pred[n] / S;
      m[n] = out[1, n] + K * (y[n] - out[1, n]); // the measurement is just noise added to signal, so mu == m_pred
      P[n] = P_pred[n] - K * S * K;

      out[2, n] = sqrt(S);
    }

    return out;
  }
}
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=0> sigma;
  vector<lower=-pi(), upper=pi()>[N] z;
}
transformed parameters {
  // variance of arcsine distribution is 0.125
  vector[N] out[2] = kalman(z, 0.125, sigma, 1, z[1], sigma);
}
model {
  sigma ~ exponential(1);
  z ~ is_std_norm();
  
  y ~ normal(sin(out[1]), sigma);
}
generated quantities {
  vector[N] f;
  
  for (n in 1:N) 
    f[n] = normal_rng(sin(out[1, n]), sigma);
}
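
A minimal usage sketch for the model above, assuming it is saved as periodic_kalman.stan (a hypothetical file name) and that y holds the observed series:

library(cmdstanr)

# Compile and fit the Stan program listed above; the file name is
# hypothetical and y is assumed to be a numeric vector of observations.
mod <- cmdstan_model("periodic_kalman.stan")
fit <- mod$sample(data = list(N = length(y), y = y))
fit$summary("sigma")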

For sin(x+sin(x)) data (blue is the 97% interval on the latent function):

and for sin(x):

Contrast with a naive AR1:

data {
  int<lower=0> n;
  vector[n] y;
}
parameters {
  real<lower=0> noise;
  real<lower=0> amp; // note: amp is given a prior but is not otherwise used in this snippet
  vector[n] f;
  real<lower=0> f_s;
}
model {
  noise ~ weibull(2, 1);
  amp ~ weibull(2, 1);
  f_s ~ weibull(2, 1);
  f[2:n] - f[1:(n - 1)] ~ normal(0, f_s); // random-walk increments: the naive AR1 prior on the latent f
  y ~ normal(f, noise);
}
generated quantities {
  vector[n] yrep;
  for (i_n in 1:n) {
    yrep[i_n] = normal_rng(f[i_n], noise);
  }
}


2 Likes

Nice! I was testing it on the noisy setting and not with lots of data. It seems to work pretty well, especially if you want speed and unbiased estimates, though probably with a bit more variance than other methods.

AR1 model: 1.7 s, ESS in the mid 500s

My model: 1.8 s, ESS in the 1000-2000 range

with

# generate data ----
# (asym_sine() is a helper defined earlier in the thread; not shown here)
library(tidyverse)
set.seed(1)
dat = (
  tibble(
    time = seq(0, 10, by = 1/10)
    , f_true = asym_sine(time, hz = 1, phase = pi/2)
    , y = f_true + rnorm(length(time), 0, .9)
    , y_std = ( y - mean(y) ) / ( 2 * sd(y) )
  )
)

With more data

1 Like

So this works across a range of SNRs?

Say 10%-10000%?

So I wanted to expand a bit on the “split and marginalize” option I vaguely described. The idea required a bit of additional work, but I think it ended up working pretty nicely. In the math here, we’ll ignore amplitude as it is easy to add at the very end. I hope I didn’t make any big mistakes in the math, but the model seems to work…

So we have a bounded periodic function f and I’ll assume the period is 2 \pi, i.e. that f(x) = f(x + 2k \pi), -1 \leq f(x) \leq 1 for all k \in \mathbb{Z}, x \in \mathbb{R}. We observe a scaled and shifted realization: y \sim Normal(f(\nu x + p), \sigma), and we want to determine \nu, p.

We’ll pick two arbitrary numbers x_1, x_2 as data. The only requirement is that those points are quite far from each other but are surrounded by actually observed values. We’ll parametrize our model in terms of the value of the function at those points:

-1 \lt z_i \lt 1 \\ z_1 = f(\nu x_1 + p) \\ z_2 = f(\nu x_2 + p)

The main advantage is that z_1, z_2 \in (-1, 1) will almost always be well constrained by data, so no multimodality problems here. But how do we get from z to \nu, p?

We’ll start by looking at preimages of z within a single period. We’ll assume that each point has exactly s distinct preimages in each period (some care would be needed to handle cases where some values have more preimages than others, but it is IMHO possible). For the \sin function s = 2 (we’ll never have exactly z_i = \pm 1).

0 \leq w_{i,j} \lt 2 \pi \\ \{w_{1,1}, ..., w_{1, s}\} = f^{-1}(z_1) \\ \{w_{2,1}, ..., w_{2, s}\} = f^{-1}(z_2)

We then introduce an additional parameter k that says how many full periods of the function lie between x_1 and x_2. So we have a set of pairs of preimages, and we can use those to find all solutions for \nu, p given z.

**EDIT:** There was previously a sign error in this equation.

\nu_{i,j, k} = \frac{w_{2,i} + 2k \pi - w_{1,j}}{x_2 - x_1} \\ p_{i,j,k} = w_{1,j} - \nu_{i,j,k} x_1

So beyond z_1, z_2 we need 3 discrete parameters: i, j, k. If we can put an upper bound on k (which we IMHO always can, as e.g. a period shorter than the distance between two consecutive points can be ruled out), we can marginalize all the discrete parameters out! This way, we sum out all the modes. The necessary background is at 7.2 Change point models | Stan User’s Guide
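
Spelled out, the marginalized likelihood that the Stan model further below implements via log_sum_exp is (in my notation)

p(y \mid z_1, z_2, \sigma) = \sum_{i=1}^{s} \sum_{j=1}^{s} \sum_{k=0}^{k_{\max}} p(i, j, k) \, p(y \mid \nu_{i,j,k}, p_{i,j,k}, \sigma)

with p(i, j, k) uniform over all combinations.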

Concrete example

For f(x) = \sin(x) and i \in \{1,2\} we have

w_{i,1} = \arcsin(z_i) \\ w_{i,2} = \pi - \arcsin(z_i)

**EDIT:** Here I previously invoked a symmetry which does not seem to actually hold in general.
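
As a quick numeric check of these formulas (my own example, not from the original post): take x_1 = 0, x_2 = 1 and z_1 = 0, z_2 = 1/2, so that w_{1,\cdot} = \{0, \pi\} and w_{2,\cdot} = \{\pi/6, 5\pi/6\}. Then, for instance,

\nu_{1,1,0} = \frac{\pi/6 + 0 - 0}{1 - 0} = \frac{\pi}{6}, \qquad p_{1,1,0} = 0 - \frac{\pi}{6} \cdot 0 = 0

and indeed \sin(\nu x_1 + p) = 0 = z_1 and \sin(\nu x_2 + p) = 1/2 = z_2; the other (i, j, k) combinations enumerate the remaining frequency/phase pairs consistent with the same z.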

And this is the resulting Stan model (also including amplitude):

data {
  int n;
  vector[n] y;
  vector[n] x;
  int<lower=0> max_k;
  real x_fixed[2];
}

transformed data {
  real log_prior[max_k + 1] = rep_array(-log(max_k + 1.0), max_k + 1); // prior prob 1 / (max_k + 1)
}

parameters {
  real<lower=0> noise;
  real<lower=0> amplitude;
  real<lower=-1, upper=1> z[2];
}

transformed parameters {
  real frequency[2, 2, max_k + 1];
  real phase[2, 2, max_k + 1];
  real log_lik[2, 2, max_k + 1];
  {
    real asinz[2] = asin(z);
    real w[2, 2];
    w[1] = { asinz[1], pi() - asinz[1] };
    w[2] = { asinz[2], pi() - asinz[2] };
    for (i in 1:2) {
      for (j in 1:2) {
        for (k in 1:(max_k + 1)) {
          frequency[i, j, k] = (w[2, i] + 2 * (k - 1) * pi() - w[1, j]) / (x_fixed[2] - x_fixed[1]);
          phase[i, j, k] = w[1, j] - frequency[i, j, k] * x_fixed[1];
          {
            vector[n] f = amplitude * sin(frequency[i, j, k] * x + phase[i, j, k]);
            log_lik[i, j, k] = normal_lpdf(y | f, noise)
              + log_prior[k]
              - log(2.0) // uniform prior over the 2 options
              ;
          }
        }
      }
    }
  }
}

model {
  noise ~ weibull(2, 1);     // peaked at ~.8, zero-at-zero, ~2% mass > 2
  amplitude ~ weibull(2, 1); // ditto
  target += log_sum_exp(to_array_1d(log_lik));
}

generated quantities {
  // Discrete sampling is inefficient; directly computing weights
  // and taking weighted expectations would work better
  int index = categorical_logit_rng(to_vector(to_array_1d(log_lik)));
  real freq_chosen = to_array_1d(frequency)[index];
  real phase_chosen = to_array_1d(phase)[index];
  // Force phase to lie between -pi and pi the stupid way
  while (phase_chosen < -pi()) {
    phase_chosen += 2 * pi();
  }
  while (phase_chosen > pi()) {
    phase_chosen -= 2 * pi();
  }
}
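
Following up on the comment in generated quantities, a sketch (untested; it assumes the fitting code below, where the fit object is called res) of computing the mode weights directly from the log_lik draws in R and taking per-draw weighted expectations instead of discrete sampling:

library(posterior)

# draws x (2 * 2 * (max_k + 1)) matrices; log_lik and frequency share the
# same flattened column ordering, so elementwise products align correctly
ll <- as_draws_matrix(res$draws("log_lik"))
fr <- as_draws_matrix(res$draws("frequency"))

# softmax of log_lik within each draw = posterior weight of each (i, j, k) mode
log_sum_exp <- function(v) { m <- max(v); m + log(sum(exp(v - m))) }
w <- exp(sweep(ll, 1, apply(ll, 1, log_sum_exp)))

# per-draw expected frequency (beware: averaging across well-separated
# modes may not be meaningful; inspect the weights first)
freq_expect <- rowSums(w * fr)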

And this is the code I used to test this thing:

library(tidyverse)
library(cmdstanr)
options(mc.cores = 4)

m <- cmdstan_model("periodic.stan")

n <- 30
frequency <- 13
phase <- 0.3
noise <- 0.5
amplitude <- 1.4

x <- runif(n, max = 2 * pi)
f <- amplitude * sin(frequency * x + phase)
y <- rnorm(n, mean = f, sd = noise)
x_fixed = c(pi / 4, 7/4 * pi)
#x_fixed = c(3.3, 7/4 * pi)

res <- m$sample(list(n = n, x = x, y = y, x_fixed = x_fixed, max_k = 10), refresh = 500)
res$summary(c("noise", "z", "freq_chosen", "phase_chosen", "amplitude", "index"))

bayesplot::mcmc_pairs(res$draws(), pars = c("z[1]","z[2]", "amplitude", "noise"))

true <- tibble(x = seq(0, 2 * pi, length.out = 101)) %>% mutate(y = amplitude * sin(frequency * x + phase))
observed <- tibble(x = x, y = y)

d <- posterior::as_draws_matrix(res$draws(c("freq_chosen", "phase_chosen", "amplitude")))
z <-  tibble(x = seq(0, 2 * pi, length.out = 100)) %>% crossing(as_tibble(d) %>% mutate(id = 1:dim(d)[1]))
z %>% mutate(y = amplitude * sin(x * freq_chosen + phase_chosen)) %>%
  filter(id %in% sample(unique(id), 100)) %>%
  ggplot(aes(x = x, y = y, group = id)) + 
  geom_vline(xintercept = x_fixed[1], color = "orange", size = 2) +
  geom_vline(xintercept = x_fixed[2], color = "orange", size = 2) +
  geom_line(alpha = 0.5) +
  geom_line(data = true, group = 1, color = "blue", size = 2) +
  geom_point(data = observed, group = 1, color = "red", size = 2)

This method seems (from quick tests) very data-efficient (because sine is a very constrained function).

And it will work even when the data do not really constrain k, so the posterior for phase and frequency should be multimodal:

set.seed(24855)
n <- 8
frequency <- 4
phase <- -0.5
noise <- 0.5
amplitude <- 1.4

Histogram of the fitted frequency - we see that multiple modes are present:

Happy to hear if it passes/breaks in your tests as well (mine were pretty shallow).

4 Likes

@martinmodrak Ooooh, this is cool. Playing now …

Gotta break to take the kiddos outside, but so far @martinmodrak’s solution is performing really well. As good as a periodic GP in what I’ve explored so far, while fitting in like 1/5 the time.

Here’s an SNR = 0.5 (via noise = 2, amp = 1) fit for the GP and Martin’s model, respectively:

And that’s with the GP knowing the true amplitude (I couldn’t get it to fit in a reasonable time when doing inference on the amplitude).

2 Likes

That’s great! This is a clever model and I don’t care that mine sucks. I threw mine together in a few minutes over lunch.

What happens if the period changes from 2pi but the assumed 2pi is still used? Is it flexible enough to fit that?

1 Like

It doesn’t assume 2*pi per se; Martin was simply using that to convey that the math of a model doing inference on the frequency and phase works out to a scaling and shift of a function with a fixed period and phase.
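
To spell that out (my paraphrase): any function g with period T can be written in terms of a fixed function f with period 2\pi as

g(x) = f(\nu x + p), \qquad \nu = \frac{2\pi}{T}

so doing inference on \nu and p covers arbitrary periods.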

Ah, hm; are the values in x_fixed maybe serving to over-inform the frequency? I am getting weird results when leaving these as-is but varying the true phase and frequency.

Glad you are finding the model helpful! I had fun writing it (really should have spent the time doing other stuff, but the puzzle was too tempting :-) )

Right, I just need f to have a known period and to include k * period in the computation. I am not completely sure the model would work well if f were not really known (e.g. if one used splines/GPs to define the function within a single period).

If yes, then this is an unintended consequence or a bug - what values are giving you trouble?

In any case, I think the model is open to modification. Notably, there are two separate ideas in the model: the core is that I enumerate all possible separate modes of the likelihood function, index them by discrete variable(s), and then marginalize those discrete variables out of the model. Using z as the base parameter is auxiliary; I simply found this the easiest to work out into a functioning model, but I have no doubt one could build the marginalization around other parametrizations, e.g. starting from some base frequency range and identifying all “overtones”, or starting from w (which might be helpful when f^{-1} is not easily available).

1 Like

Sorry, it was a bug on my end; so far it’s still working great, but I’m still working on seeing where I can get it to break :)

2 Likes

So it turns out there was a sign error in my formula for p_{i,j,k}, and the symmetry I invoked for the \sin function holds only for k = 0, which could lead to some strange results. I made corrections to the post and fixed the Stan code therein.

1 Like

Also, I found a situation where the choice of x_fixed (x_1, x_2) influences inferences. I think it is restricted to low-data situations. I think the core problem is that the model assumes that f(freq * x + phase) is a priori uniform (and unimodal) at x_fixed, while at other places multimodality can be tolerated. So, using my code from above:

set.seed(558825)
n <- 10
frequency <- 7
phase <- 0.3
noise <- 0.5
amplitude <- 1.4

x_fixed = c(pi / 4, 7/4 * pi)
...

Gives this posterior:

  variable        mean  median    sd   mad     q5     q95  rhat ess_bulk ess_tail
  <chr>          <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 noise         0.613   0.552  0.235 0.199  0.337  1.07    1.01     916.    1730.
4 freq_chosen   6.74    6.78   1.60  0.129  4.41   9.23    1.01    3693.    2265.
5 phase_chosen  0.950   0.979  0.837 0.516 -0.319  2.19    1.00    3323.    1668.
6 amplitude     0.981   1.01   0.290 0.264  0.437  1.40    1.00    1311.     924.

But if I change to

x_fixed = c(2.4, 7/4 * pi)

I get a somewhat different picture:

 # A tibble: 7 x 10
  variable       mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 noise         0.782  0.754  0.261  0.264  0.412  1.24   1.01     573.     775.
4 freq_chosen   8.35   6.97   4.13   2.00   1.47  17.5    1.01    2609.    2866.
5 phase_chosen  0.741  0.771  1.19   1.03  -1.66   2.49   1.00    3842.    3459.
6 amplitude     0.810  0.821  0.328  0.359  0.264  1.32   1.00     864.    1135.

BTW @mike-lawrence it might make sense to change the title of the topic, as I think it currently does not really reflect the contents :-)

EDIT: Fixed one more bug in the code… I wonder how it all even worked before :-D

1 Like

Just FYI, this has the wrong sign.

2 Likes

Excuse the vulgarity, but I literally just had a “holy f*ck” moment with the latest version of @martinmodrak’s model. SNR = 1, but with a signal frequency (1.5 Hz) and sample rate (5 Hz) that are edging up to Nyquist and certainly hard to see by eye:

Here it is with the true signal:

And with the fit:

3 Likes

It looks like a lot of work has been done on this already, but I didn’t see anyone suggest working in the frequency domain rather than the time domain via a Fourier transform.

The original problem of fitting to a sinusoid leads to infinitely many modes because a sinusoid is a periodic function with infinite support in the time domain. A signal with infinite support in the time domain has finite support in the frequency domain (and vice versa), though note that you are working with a finite sample of an infinite signal. Applying a Fourier transform (i.e., an FFT) to the data should reduce the problem to one with a single mode (in the frequency domain).

A pure sinusoid in the time domain corresponds to a (scaled) Dirac delta function in frequency space, located at the frequency of the sinusoid. If the sinusoid is noisy, this corresponds to a delta function that is convolved with (“smeared by”) a function related to the noise — i.e., the delta function will look like a compact “spike” localized on the frequency of the sinusoid. My intuition says that the presence of noise should make locating the “spike” easier using a Bayesian model than in the noise-free case.
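
To make the “spike” picture concrete, a quick sketch (my addition; it assumes a regularly sampled series y at a known sampling rate, like the seq(0, 10, by = 1/10) grid used earlier in the thread) using R’s built-in periodogram:

# Raw periodogram of a regularly sampled series; the highest peak
# approximates the sinusoid's frequency (in cycles per time unit).
fs <- 10  # assumed sampling rate
sp <- spec.pgram(ts(y, frequency = fs), taper = 0, plot = FALSE)
peak_freq <- sp$freq[which.max(sp$spec)]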

Note that if you want to reconstruct the sinusoid (rather than just estimate its frequency), you will need to model the real and imaginary parts (equivalently, the magnitude and phase) of the Fourier-transformed signal.

I would be surprised if someone has not already published a solution to this problem that you can use or build upon.

2 Likes

Why would this change anything about the multimodality?

Are you sure? I looked at frequency-domain stuff a bit a while ago and at first thought I’d discovered the cause of the multimodality as deriving from an implicit rectangular window function (i.e. implied by not using any windowing of the input signal), but abandoned that idea after realizing it doesn’t account for the asymmetry.

Honestly, I’ve been surprised by how little work I’ve been able to find on this kind of thing too! At first I thought it was a case where things were so trivial and long-ago-solved that folks didn’t even bother producing explanations of the core problem and solutions anymore, but the wider I looked, the more I kept failing to find any good discussions of it. As I posted above, I even reached out to a community where I thought there’d be the most frequent use of solutions to this, the astrostats folks, but even they seemed to be using very ad-hoc methods (thanks to really clean data) and didn’t have much in the way of guidance to deeper treatments. I did get pointed to Bretthorst as someone who has some work in this domain, but despite my math inexpertise I think a bunch of his approximations are frankly in error, as I wasn’t able to replicate his results (though very possibly due to implementation errors on my end).

How would you implement a frequency-domain model?

1 Like

How do your data look?

Very ambiguous? (Low SNR, few measurements)
Slightly ambiguous? (Low SNR or few measurements)
Not ambiguous at all?