Von Mises data with a shift at every cycle

I am trying to model circular data that has a slight shift every cycle. The aim is to model actogram data, the daily activity of animals. In these experiments one of the most obvious phenotypes is the shift in the periodic activity. I should probably treat this as a time series, rather than doing this with circular distributions, but I don’t see why this shouldn’t work .

Fitting circular data with the von Mises distribution worked fine but when I added a shift in each cycle, things started to fall apart. Here is the simulated data I used.

library(circular)
library(tidyverse)
library(rstan)
options(mc.cores =parallel::detectCores())


# simulate circular data with shift 
##############################################################################################
n_cycle <- 8
df <- tibble(cycle=1:n_cycle) %>%
    mutate(y=map(cycle, ~rvonmises(100, 
                                 mu=circular(pi-2*pi/18*.x), 
                                 kappa=2*pi))) %>%
    unnest(y) 


ggplot(df, aes(x=y)) + geom_histogram() + facet_grid(cycle~.)


model_data <- list(
    n = nrow(df), 
    n_cycle = n_cycle,
    cycle = df$cycle, 
    y = df$y
)

model_name <- "von mises with shift.stan"
model <- stan(
    file = model_name,
    data = model_data, iter=2000, chains=10, warmup=1500
)

summary(model)

and here is the Stan model

data{
  int<lower=1> n;
  int<lower=1> n_cycle;
  int<lower=1, upper=n_cycle> cycle[n];  
  real<lower=0, upper=2*pi()> y[n];
}

parameters{
  real<lower=0, upper=2*pi()> mu; 
  real<lower=0> kappa; 
  // shift should be between -pi and pi
  real<lower=-pi()/4, upper=pi()/4> shift; 
}

model{
  shift ~ normal(0,pi()/12); 
  for (i in 1:n){
    if (kappa < 100)
      y[i] ~ von_mises(mu+shift*(cycle[i]-1), kappa);
    else
      y[i] ~ normal(mu+shift*(cycle[i]-1), sqrt(1 / kappa));    
    }
}


The problem is that some chains seem to converge on the proper values, but others don’t. There seems to be a trade off between kappa and the amount of shift. I thought that giving the shift a prior that favours smaller values would fix this, but even a very strong prior on the shift does not work. The pairs plot look like polka dots, some chains seem to find the right answer, and others converge to a very small value of kappa. In the von Moses distribution Kappa approaches 0 as the distribution flattens (kappa =0 is uniform distribution). I thought maybe there wasn’t enough data points, but increasing the amount of data doesn’t change things, and fitting a similar model without the shift with the same amount of data seems to work just fine.

Am I going on about the wrong way? I know that what I am really trying to estimate is the period of the circular data, rather than mu and kappa, but to me this should be equivalent to estimating the amount of “shift”, assuming this shift is constant.

I feel like I am missing something, probably some embarrassing mistake, but can’t figure out what, and I need to know! Any help or insight would be much appreciated.

1 Like

Sorry for not getting to you earlier, your question is well written and relevant. Did you manage to move forward in any way in the meantime?

Generally circular parameters (i.e. mu in your case) can easily cause problems. There was an interesting discussion recently showing how one can work with circular data without explicitly circular parameters: Results from playing with periodic models so maybe that can help… (I admit I don’t understand the method fully).

2 Likes

Thanks for the reply! Unfortunately I am still stuck ಠ_ಠ
I can estimate the shift with approaches like the one you suggested (I haven’t tried but I think I could do this). I thought maybe using circular distributions would be easier/proper, but like you say the circular parameters seem to be a little finicky. At his point this is more of an academic exercise/obsession to figure out why it doesn’t work. I took a bit of a rest from this problem, and will see if I can tackle this with a fresh mind again.

Check out the periodic GP’s I posted about in the thread @martinmodrak linked. For your case, I think you’d additionally want a GP on the phase through time.

I think the formal reason (paraphrasing @betanalpha as I don’t understand this myself very well) is:

Stan is built to sample in spaces isomorphic (topologically) to \mathbb{R}^N. There is no such isomorphism between unit circle and \mathbb{R}. Informally, the problem is that mappings between the unit circle and \mathbb{R} must introduce a “break”, so points with angle just above 0 relative to the break do not map close to points with angle just below 2\pi relative to the break.

This also explains why this version can sometimes work: if your posterior is negligible around the break, the break will not be encountered during sampling.

The linked approach (if I understand it correctly) avoids the problem by realizing that one doesn’t need to treat the angle as a standalone quantity. Instead, one can treat angle and amplitude together - basically by taking cartesian coordinates in \mathbb{R}^2 and converting to polar coordinates - which brings you into the type of space Stan can sample without issues.

Does that make sense?

1 Like

FYI, I’m still working on some tricks, but one thing I’ve found useful so far is getting rough estimates from the data (using, ex, fft) on the phase so the model’s prior can be centered on zero, keeping things away from the wrap-around areas. It also helps to initialize more narrowly around the phase MLE, but I’m still deciding on how much this biases subsequent inference.

1 Like

Stan is built to sample in spaces isomorphic (topologically) to \mathbb{R}^N . There is no such isomorphism between unit circle and \mathbb{R}. Informally, the problem is that mappings between the unit circle and \mathbb{R} must introduce a “break”, so points with angle just above 0 relative to the break do not map close to points with angle just below 2\pi relative to the break.

Yes this makes sense!

My original simulated data looked like this

n_cycle <- 8
df <- tibble(cycle=1:n_cycle) %>%
    mutate(y=map(cycle, ~rvonmises(100, 
                                 mu=circular(pi-2*pi/18*.x), 
                                 kappa=2*pi))) %>%
    unnest(y) 

original.pdf (7.0 KB)

I tried the same model with a smaller shift .

n_cycle <- 8
df <- tibble(cycle=1:n_cycle) %>%
    mutate(y=map(cycle, ~rvonmises(100, 
                                 mu=circular(pi-2*pi/60*.x), 
                                 kappa=2*pi))) %>%
    unnest(y) 

smaller shift.pdf (7.0 KB)

or with a bigger shift but fewer cycles

n_cycle <- 8
df <- tibble(cycle=1:n_cycle) %>%
    mutate(y=map(cycle, ~rvonmises(100, 
                                 mu=circular(pi-2*pi/18*.x), 
                                 kappa=2*pi))) %>%
    unnest(y) 

fewer cycles.pdf (5.8 KB)

so that at any point the distribution does not cross a break point. With this data the model works without issues, and the shift is estimated properly even without a prior for the shift parameter. Like you say the problem seems to arise at the break points, and I don’t see anyway around this problem.

But estimating things that cross this break point is exactly the point of using circular distributions, so I will give the GP a try. Heading over there @mike-lawrence.

1 Like

I am working through your models, but to be honest GPs are voodoo to me right now. I tried your non-periodic and periodic kernel GPs with my simulated data (without shift for now). I know this (making a points from a distribution) is a very round about way to create this data, but I thought It would be interesting to see if I can do this with GP.

# simulate circular data with shift 
##############################################################################################
n_cycle <- 4
df_dist <- tibble(cycle=1:n_cycle) %>%
    mutate(y=map(cycle, ~rvonmises(200, 
                                 # mu=circular(pi-2*pi/60*.x), 
                                 mu=circular(pi), 
                                 kappa=2*pi) %>% as.double())) %>%
    unnest(y) 


ggplot(df_dist, aes(x=y)) + geom_histogram() + facet_grid(cycle~.)

round_any = function(x, accuracy, f=round){f(x/ accuracy) * accuracy}

df <- df_dist %>% 
    mutate(x_cycle = round_any(y,.5)) %>%
    group_by(cycle, x_cycle) %>%
    summarise(y = n()) %>%
    complete(x_cycle=seq(0, 2*pi, .5), fill=list(y=0)) %>%
    ungroup() %>%
    mutate(x = x_cycle + 2*pi*(cycle-1))

ggplot(df) +
    geom_col(aes(x=x_cycle, y=y)) + facet_grid(cycle~.)


ggplot(df, aes(x=x, y=y)) + 
    geom_point()


# from stan forum 
# https://discourse.mc-stan.org/t/results-from-playing-with-periodic-models/18882

model_data = list(
    n = nrow(df), 
    y = df$y/100,
    x = df$x,
    min_diff_x = min(diff(df$x)),
    diff_range_x = diff(range(df$x))
)

I adjusted the amplitude manually for now. (Have you worked out a model that also estimates the amplitude? I couldn’t really follow the phamp part.). You mention that the fit was very similar with or without a periodic kernel, but in this case only the periodic kernel reproduces the periodic activity.

Hm, you seem to have shifted (excuse the pun) your representation of the data inputs to Stan between your first post and your latest post. It’s always best to try to work out how to express the model in a manner that addresses the data in it’s raw form rather than attempt to model the data after it’s been summarized/transformed.

Is the raw data of the kind that you’re observing the time of an event (thus typically summarized using histograms as in your first post), or is the raw data a set of time-value pairs as in the latest code you’ve posted?

Yes I hacked the data, transforming it to observations along time, and I was concerned about the appropriateness of doing this, and I am getting shifty (ಠ‿ಠ ) now that you mention this.

My raw data is actually a set of time-value pairs, to be more specific, the aggregate activity counts over a 1min. So in my raw data I have count values of activity (a motion sensor) for every 1min over a weeks.

The distribution of activity along time seems fairly large (at least a few hours) compared to the time resolution of my measurements (1min) and I assumed, without giving it much thought, that it could be taken as time of an event and thus summarise using a histogram or a distribution. I still think either representation would be ok but don’t really know how to proceed with GP.

p.s. I am analyzing the circadian activity of animals so, so my initial intuition was to try to model this data with a combination of circular distribution, as there is an obvious periodic trend in the data. Animals tend to have their activity restricted to some time of the day. Circular distributions seemed like a good place to start, but I am working now with data that shows a small shift in this periodic activity everyday, so that the activity peak shifts a little every day.

Can you post a plot of say a day of one animal’s raw data? I’m thinking that it might make sense to model the sum activity at each minute as a something like poisson or negative binomial, possible with zero-inflation, then model the influence of time (inc. the phase shift) as affecting the parameters of that distribution.

Do you have multiple individually-identifiable animals that each contribute multiple data points? If so, you might consider a hierarchical model to account for variability across animals.

Also, if the data are actually a contiguous time series, I wouldn’t carve it up into different days but instead keep it as one series and model a GP on the phase.

1 Like

Sure thing ! This is an example of the data

I’m thinking that it might make sense to model the sum activity at each minute as a something like poisson or negative binomial, possible with zero-inflation, then model the influence of time (inc. the phase shift) as affecting the parameters of that distribution.

Yes I was thinking to do something like this with a trend model, including a periodic component, like in the models that have seasonality. I think you are suggesting the same thing, but with GP for the underlying influence of time, when you say not carve the data into different days. I will give the trend model a try.

Do you have multiple individually-identifiable animals that each contribute multiple data points? If so, you might consider a hierarchical model to account for variability across animals.

Yes I have multiple animals and conditions, which I plan to incorporate into a hierarchical model once I have the basic model figured out, thanks for the suggestion!

1 Like

Just had a thought: yo probably don’t want to start off with zero-inflation, which is really for scenarios where, for example, you didn’t have the time component of each observed datum, so couldn’t “see” that the zeros are associated with whole periods with low rates.

Just wanted to toss in a couple of thoughts here:

First, because I haven’t seen it mentioned explicitly in this thread, it appears that your problem has to do with a multimodal posterior in the presence of the shift parameter. I don’t fully understand why this multimodality problem shows up (especially since the scale parameter is small enough that the Von Mises likelihood should be quite similar to an appropriate normal likelihood, which reduces to simple linear regression!). For whatever reason, the Von Mises likelihood seems to have deep troughs that isolate regions of parameter space that fit some (but not all) of the cycles. Simple tweaks to the model (like replacing the ‘shift’ parameter with a ‘stretch’ parameter that re-scales the period) don’t help. (For example, with a stretch parameter included via the transformed parameter y_trans = y*stretch + 2*pi()*(cycle - 1)*stretch;).

Second, in your toy data the peaks are very well identified, and the model is well-specified by construction. Under these conditions, you can use just the first cycle to get a preliminary estimate of mu, and the first two cycles to get a preliminary estimate of shift, then if you initialize mu and shift in the full model from the posteriors based on the first two cycles you will successfully avoid the undesirable modes. However, your mileage may vary on real, messy data (particularly if the shift isn’t constant in the “true” generative model). Note that you can avoid the need to pass the inits manually by declaring parameters mu and shift with offsets and multipliers corresponding to the mean and sd (or median and mad) of their posteriors from the initial analysis.

Third, the structure of your toy model presupposes that you can unambiguously identify what ‘cycle’ (i.e. period) each observation comes from. In the presence of an unknown shift parameter, it is not immediately obvious that you will always be able to identify which period an arbitrary observation corresponds to, particularly if the scale parameter is large. If the scale parameter is sufficiently small that there are clear, protracted breaks in each cycle with zero activity, then it might be palatable to treat the problem as a standard linear regression rather than with a circular distribution. (y[i] ~ normal(mu + shift*(cycle[i] - 1), 1/sqrt(kappa)), where mu is the intercept, shift is a slope, and cycle - 1 is a covariate). If there aren’t clear breaks with zero activity, then it’s unclear how you can unambiguously assign each data point to a cycle in the presence of an unknown shift, and you might want to consider how to account for that in the model (I’m not sure how this would work). Note also that in point 2 above, the idea for initialization relies on being able to unambiguously assign a cycle to the vast majority of observations that “truly” belong to cycles 1 and 2.

I tried to describe why this is an issue earlier in this thread: Von Mises data with a shift at every cycle