Time of evening predictor in a categorical model

I’m working on a categorical model (y = (1, 2, 3, 4)) where each category represents a distinct stage of sleep for a 5 minute period. A key predictor of sleep stage is the time of evening, where deep sleep might be more prevalent before midnight, and REM sleep more prevalent in the early hours of the morning.

Time of evening (x) is modelled using two trigonometric functions
\mu_1 \propto \beta_1 \sin(2 \pi x / 24) + \beta_2 \cos(2 \pi x / 24),
for each state (1, 2, 3). The 4th state is derived from the remaining 3 using the softmax function. Trigonometric functions allow hours in the evening to be represented as close to each other, without a discontinuity at midnight.

As per the standard advice, I simulated some data to understand how to best fit and recover trigonometric functions that predict a categorical outcome.

library(tidyverse)
library(cmdstanr)
library(posterior)
library(lubridate)
library(tidybayes)
library(scales)
library(bayesplot)

options(mc.cores = parallel::detectCores())

time_frame <- data.frame(timestamp_of_evening = seq(as_datetime("2022-04-01 20:00:00"), as_datetime("2022-04-02 06:00:00"), by = "min")) %>%
  mutate(time_of_evening = hour(timestamp_of_evening) + minute(timestamp_of_evening)/60)

amplitude_1_sin <- -1.0
amplitude_1_cos <- -1.0

amplitude_2_sin <- 1.0
amplitude_2_cos <- 1.0

amplitude_3_sin <- 0.25
amplitude_3_cos <- 0.25

f_1 <- amplitude_1_sin*sin(2*pi*time_frame$time_of_evening/24) + amplitude_1_cos*cos(2*pi*time_frame$time_of_evening/24)
f_2 <- amplitude_2_sin*sin(2*pi*time_frame$time_of_evening/24) + amplitude_2_cos*cos(2*pi*time_frame$time_of_evening/24)
f_3 <- amplitude_3_sin*sin(2*pi*time_frame$time_of_evening/24) + amplitude_3_cos*cos(2*pi*time_frame$time_of_evening/24)

log_sum_exp <- log(exp(f_1) + exp(f_2) + exp(f_3))

log_softmax_1 <- f_1 - log_sum_exp
log_softmax_2 <- f_2 - log_sum_exp
log_softmax_3 <- f_3 - log_sum_exp
log_softmax_4 <- 0 - log_sum_exp

p_1 <- exp(log_softmax_1)
p_2 <- exp(log_softmax_2)
p_3 <- exp(log_softmax_3)
p_4 <- exp(log_softmax_4)

y <- c()
for (i in 1:nrow(time_frame)){
  y[i] <- sample(c(1, 2, 3, 4), size = 1, replace = TRUE, prob = c(p_1[i], p_2[i], p_3[i], p_4[i]))
}

I’ve fit the simulated data with the following stan model

data {
  int<lower=1> K; // number of states
  int<lower=1> N;
  array[N] int y;
  vector[N] time_of_evening;
}
parameters {
 array[2] vector[K-1] time_of_evening_trig;
 corr_matrix[K-1] time_of_evening_corr;
 vector<lower=0>[K-1] time_of_evening_scale; 
 }
transformed parameters{
  vector[N] mu1;
  vector[N] mu2;
  vector[N] mu3;
  array[N] vector[K] mu;
  
  for (n in 2:N){
    mu1[n] = time_of_evening_trig[1,1]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,1]*cos(2*pi()*time_of_evening[n]/24);
    mu2[n] = time_of_evening_trig[1,2]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,2]*cos(2*pi()*time_of_evening[n]/24);
    mu3[n] = time_of_evening_trig[1,3]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,3]*cos(2*pi()*time_of_evening[n]/24);
    mu[n] = transpose([mu1[n], mu2[n], mu3[n], 0]);
  }
}
model {
  time_of_evening_scale ~ cauchy(0, 2.5);
  time_of_evening_corr ~ lkj_corr(2);
  time_of_evening_trig ~ multi_normal(rep_vector(0, K-1), quad_form_diag(time_of_evening_corr, time_of_evening_scale));

  for (n in 2:N)
      y[n] ~ categorical_logit(mu[n]);
}
datalist <- list(K = 4, N = nrow(time_frame), y = y, time_of_evening = time_frame$time_of_evening)
mod <- cmdstan_model("sin_model.stan")
fit <- mod$sample(data = datalist, chains = 4, iter_warmup = 1000, iter_sampling = 1000, adapt_delta = 0.99)

Even after setting adapt_delta = 0.99 I still get some divergent transitions. I’m not sure how to articulate this point clearly, but I suspect that the periodicity in one state ends up being correlated with the periodicity in another state. I’ve experimented with a multivariate prior, but as I don’t understand the issue well I’m not sure that this has helped.

The issue is that the model doesn’t quite fit the generated data - the shape of the curves are close to what is simulated, but there is a small amount of bias. In particular, the estimates for the parameters relating to the probability of being in state 1 are biased away from the true value (vertical dashed line), which results in a significant departure of the predicted probability of being in state 1 early in the evening, when compared to the simulated curve.


When I sample from the prior distribution only, the sampler performs much more poorly. I wondered at this point if that is because having a periodic predictor per state over-states the amount of information at hand. We know that if the chances of being in state 1 is higher earlier in the night, then we already know that the chances of being in states 2, 3 or 4 is much lower, without having to use time of evening as a predictor.


From the above, I then tried this model

transformed parameters{
  vector[N] mu1;
  vector[N] mu2;
  vector[N] mu3;
  array[N] vector[K] mu;
  
  for (n in 2:N){
    mu1[n] = time_of_evening_trig[1,1]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,1]*cos(2*pi()*time_of_evening[n]/24);
    mu2[n] = time_of_evening_trig[1,2]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,2]*cos(2*pi()*time_of_evening[n]/24);
    mu3[n] = time_of_evening_trig[1,3]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,3]*cos(2*pi()*time_of_evening[n]/24);
    mu[n] = transpose([mu1[n], 0, 0, 0]);
  }
}
model {
  time_of_evening_scale ~ cauchy(0, 2.5);
  time_of_evening_corr ~ lkj_corr(2);
  time_of_evening_trig ~ multi_normal(rep_vector(0, K-1), quad_form_diag(time_of_evening_corr, time_of_evening_scale));

  for (n in 2:N)
      y[n] ~ categorical_logit(mu[n]);
}

Where only the first state has a time of evening predictor. The resulting posterior predictions are shown below, where the fit for state 1 is still biased, and the fit for the other states is well off.

It seems to me that the way this model is parameterised is causing an issue with sampling. I’m not sure how to reparameterise it (or maybe use a different representation for time of evening will help), or if my diagnosis of the problem is correct. Any help would be much appreciated. Thanks

Quick update - if I take out the predictor for the third state, and use univariate priors the model doesn’t seem to have any sampling issues. The fit is as good as using a predictor for three states, but still slightly biased.

transformed parameters{
  vector[N] mu1;
  vector[N] mu2;
  vector[N] mu3;
  array[N] vector[K] mu;
  
  for (n in 2:N){
    mu1[n] = time_of_evening_trig[1,1]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,1]*cos(2*pi()*time_of_evening[n]/24);
    mu2[n] = time_of_evening_trig[1,2]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,2]*cos(2*pi()*time_of_evening[n]/24);
    //mu3[n] = time_of_evening_trig[1,3]*sin(2*pi()*time_of_evening[n]/24) + time_of_evening_trig[2,3]*cos(2*pi()*time_of_evening[n]/24);
    mu[n] = transpose([mu1[n], mu2[n], 0, 0]);
  }
}
model {
  time_of_evening_trig[1] ~ std_normal();
  time_of_evening_trig[2] ~ std_normal();

  for (n in 2:N)
      y[n] ~ categorical_logit(mu[n]);
}