This is my approach after thinking about it and remembering that @jonah had something really clever in his pest control example in the tutorial in StanCon Helsinki (2018).
One problem might be that the time intervals are not evenly spaced. I’m not sure, but I think a continuous AR(1) process is actually a Ornstein-Uhlenbeck process. Also, you might want to check out the R ctsem
by @Charles_Driver. I’ve never used it before, but I guess it’s exactly for stuff like this.
Anyways, this code runs the model for one of the ID (in this example ID == 3
). You would have to extend it to more groups. Also, this code runs the AR(1) prior on the specific time points – since these are unevenly spaced, I also included two lines, which let’s you run the AR(1) process to model the correlation between days (it also models missing days automatically)…
library(tidyverse)
library(readr)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
data <- read_csv("data1.csv",
col_types = cols(X1 = col_skip())) %>%
arrange(utc) %>%
filter(ID == 3) %>%
mutate(#day = lubridate::floor_date(utc, unit = "day"), # un-comment if AR(1) by days
#time = as.numeric(difftime(day, min(day), units = "days")) + 1) # un-comment if AR(1) by days
time = 1:n()) # comment out if AR(1) by days
stan_data <- list()
stan_data <- within(stan_data,{
N <- nrow(data)
X <- model.matrix(~ 0 + var1 + var2 + var3 + var4 + var5, data = data)
K <- ncol(X)
t <- max(data$time)
y <- data$y
time_idx <- data$time
})
posterior <- stan(file = "stan_model.stan", data = stan_data)
print(posterior)
Where stan_model.stan
is:
data{
int<lower=0> N;
int<lower=0> K;
int y[N];
matrix[N,K] X;
int t;
int<lower=1,upper=t> time_idx[N];
}
transformed data{
matrix[N,K] Q = qr_thin_Q(X)*sqrt(N);
matrix[K,K] R_inv = inverse(qr_thin_R(X)/sqrt(N));
}
parameters{
real alpha;
vector[K] beta_tilde;
real<lower=0> sigma_time;
real<lower=0,upper=1> rho_raw;
vector[t] time_raw;
}
transformed parameters{
vector[N] mu = alpha + Q*beta_tilde;
// AR(1) process priors
// (inspired by Jonah Gabry's Helsinki Stan tutorial)
real rho = 2.0 * rho_raw - 1.0;
vector[t] time = sigma_time * time_raw;
time[1] /= sqrt(1 - rho^2);
for (i in 2:t) {
time[i] += rho * time[i-1];
}
mu += time[time_idx];
}
model{
y ~ bernoulli_logit(mu);
alpha ~ normal(-5, 5);
beta_tilde ~ normal(0, 2.5);
rho_raw ~ beta(10, 5);
time_raw ~ std_normal();
sigma_time ~ std_normal();
}
generated quantities{
vector[K] beta = R_inv*beta_tilde;
}
Model for time points:
First is just the time
variable from the model code, second is inv_logit(mu)
.
Model for days:
First is just the time
variable from the model code, second is inv_logit(mu)
.
If all of this makes sense, I don’t know. But maybe this’ll give you a place to start.